Researching better search functionality for the CUNY J-School network

Search is currently the dominant information retrieval paradigm, and WordPress’ internal search functionality is one step removed from atrocious. With that in mind, I’d like to significantly improve how search works on the J-School’s WordPress network. These are the notes I’m putting together as a part of my planning process.

A search for my name currently looks something like this:

Ideally, the search functionality should support these requirements:

  • Query across all of the content objects associated with the J-School’s primary website. These objects include posts, pages, events, blogs, databases, members, groups, and (coming soon) job opportunities. Eventually it would be nice to search attachments as well.
  • Expand a query to include content from any of the 216 (and counting) websites within the network. Filter results to a specific site, or by author, publication date, categories, or tags.
  • Highlight results based on matched keywords. If possible, show the sections of text matching the query.
  • Log queries and (optionally) provide analytics on search trends.
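
To make the highlighting requirement concrete, here is a minimal sketch in plain Python of the behaviour I have in mind: find the first keyword match, cut a window of text around it, and wrap every match in that window. The function name `build_excerpt`, the `radius` parameter, and the `<b>` markers are all hypothetical, chosen only for illustration; a real search engine would presumably supply its own excerpt builder.

```python
import re


def build_excerpt(text, query, radius=40, before="<b>", after="</b>"):
    """Return a snippet around the first keyword match, with matches wrapped.

    Returns None when the query is empty or nothing matches.
    """
    # Escape each keyword so characters like "-" are matched literally.
    words = [re.escape(word) for word in query.split()]
    if not words:
        return None
    pattern = re.compile(r"\b(" + "|".join(words) + r")\b", re.IGNORECASE)

    match = pattern.search(text)
    if match is None:
        return None

    # Take `radius` characters of context on each side of the first match.
    start = max(0, match.start() - radius)
    end = min(len(text), match.end() + radius)
    snippet = text[start:end]

    # Wrap every keyword occurrence that falls inside the window.
    highlighted = pattern.sub(lambda m: before + m.group(0) + after, snippet)

    # Mark truncation on either side with an ellipsis.
    prefix = "..." if start > 0 else ""
    suffix = "..." if end < len(text) else ""
    return prefix + highlighted + suffix


if __name__ == "__main__":
    print(build_excerpt("The J-School network hosts many blogs.", "network blogs"))
```

The window is cut by character count, so it can start or end mid-word; snapping to word boundaries would be an easy refinement.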

As far as I can tell, the options on the table are Sphinx, Solr, and search as a service from IndexTank. Sphinx appears to be the lowest-hanging fruit; Solr takes a couple of weeks to set up and configure, and IndexTank costs money for anything over 500 queries/day.

For Sphinx, there’s a WordPress plugin that makes it easier to integrate the two. The author has reasonably detailed documentation for installing Sphinx via the admin, if you choose to do that.

Another sysadmin has written a three-part series on extending WordPress search with Sphinx.

Extending the search sources to include custom fields is apparently as simple as adding them to the SELECT query.
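
As a rough sketch of what "adding to the select query" could mean, the helper below builds the query string that would go into a Sphinx source's `sql_query` setting, with one `LEFT JOIN` against `wp_postmeta` per custom field. The function name `build_sql_query` and the example meta key `subtitle` are hypothetical; the table and column names are the stock WordPress ones, and the default `wp_` prefix is an assumption about the install.

```python
import re


def build_sql_query(meta_keys, prefix="wp_"):
    """Build a single-line SELECT that exposes custom fields as extra columns.

    The first column is the document ID; Sphinx treats the remaining
    undeclared text columns as full-text fields.
    """
    columns = ["p.ID", "p.post_title", "p.post_content"]
    joins = []
    for i, key in enumerate(meta_keys):
        # Only allow plain identifiers, since the key is spliced into SQL.
        if not re.fullmatch(r"[A-Za-z_][A-Za-z0-9_]*", key):
            raise ValueError(f"unsafe meta key: {key!r}")
        alias = f"m{i}"
        columns.append(f"{alias}.meta_value AS {key}")
        joins.append(
            f"LEFT JOIN {prefix}postmeta {alias} "
            f"ON {alias}.post_id = p.ID AND {alias}.meta_key = '{key}'"
        )
    parts = [
        f"SELECT {', '.join(columns)}",
        f"FROM {prefix}posts p",
        *joins,
        "WHERE p.post_status = 'publish'",
    ]
    # Joined onto one line so it can be pasted after "sql_query = " as-is.
    return " ".join(parts)


if __name__ == "__main__":
    print(build_sql_query(["subtitle"]))
```

One caveat: if a post stores several rows under the same meta key, the join returns that post more than once, so the query would need a `GROUP BY` or similar before it is safe to index.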

The best way to dynamically add new blogs to the index for WordPress multisite is to edit the .conf file, although I’ll need to develop a way to add a unique index for every piece of content.
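
One way this could work, sketched in Python: generate a `source` and `index` block per blog and regenerate the .conf file whenever a blog is added. I am reading "unique index" here as a document ID that is unique across the whole network, since post IDs repeat from blog to blog. The stride of 1,000,000, the parent source name `wp_base` (which would hold the database connection settings), and the index path are all assumptions made for illustration.

```python
ID_STRIDE = 1_000_000  # assumption: no single blog ever exceeds this many post IDs


def posts_table(blog_id, prefix="wp_"):
    """Multisite keeps the main site in wp_posts and blog N in wp_N_posts."""
    return f"{prefix}posts" if blog_id == 1 else f"{prefix}{blog_id}_posts"


def global_doc_id(blog_id, post_id):
    """Combine blog ID and post ID into one network-wide document ID."""
    if not 0 < post_id < ID_STRIDE:
        raise ValueError("post_id does not fit inside the stride")
    return blog_id * ID_STRIDE + post_id


def source_block(blog_id, parent="wp_base"):
    """Render a source block that inherits connection settings from `parent`."""
    query = (
        f"SELECT ID + {blog_id * ID_STRIDE} AS doc_id, {blog_id} AS blog_id, "
        f"post_title, post_content FROM {posts_table(blog_id)} "
        "WHERE post_status = 'publish'"
    )
    return (
        f"source blog_{blog_id} : {parent}\n"
        "{\n"
        f"    sql_query = {query}\n"
        "    sql_attr_uint = blog_id\n"
        "}\n"
    )


def index_block(blog_id, data_dir="/var/lib/sphinx"):
    """Render the matching index block; data_dir is a hypothetical path."""
    return (
        f"index blog_{blog_id}\n"
        "{\n"
        f"    source = blog_{blog_id}\n"
        f"    path = {data_dir}/blog_{blog_id}\n"
        "}\n"
    )


def build_config(blog_ids):
    """Concatenate the per-blog blocks for every blog in the network."""
    return "\n".join(source_block(b) + "\n" + index_block(b) for b in blog_ids)


if __name__ == "__main__":
    print(build_config([1, 7]))
```

Storing `blog_id` as an attribute would also cover the requirement of filtering results to a specific site. Mapping a result back to its post is then `divmod(doc_id, ID_STRIDE)`.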

I intend to get Sphinx working in the development environment first, document the steps it took, and then implement it in production.