Search is currently the dominant information retrieval paradigm, and WordPress’ internal search functionality is one step removed from atrocious. With that in mind, I’d like to significantly improve how search works on the J-School’s WordPress network. These are the notes I’m putting together as a part of my planning process.
A search for my name currently looks something like this:

[Screenshot: current search results for my name]
Ideally, the search functionality should support these requirements:
- Query across all of the content objects associated with the J-School’s primary website. These objects include posts, pages, events, blogs, databases, members, groups, and (coming soon) job opportunities. Eventually it would be nice to search attachments as well.
- Expand a query to include content from any of the 216 (and counting) websites within the network. Filter results by site, author, publication date, category, or tag.
- Highlight matched keywords in the results. If possible, show excerpts of the text that match the query.
- Log queries and (optionally) provide analytics on search trends.
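To get a feel for the highlighting requirement, here is a minimal Python sketch that pulls a short excerpt around the first keyword match and wraps every matched keyword in a tag. The function name and its `width`/`tag` parameters are hypothetical. Sphinx's client API has its own excerpt builder (`BuildExcerpts`), so this is only meant to show the idea, not how the final implementation would work.

```python
import html
import re


def highlight_excerpt(text: str, query: str, width: int = 40, tag: str = "strong") -> str:
    """Return an HTML-escaped excerpt of `text` centred on the first keyword
    match, with every keyword occurrence wrapped in <tag>...</tag>.
    Hypothetical helper, for illustration only."""
    # Longest keywords first, so "database" wins over "data" in the alternation.
    keywords = sorted(set(query.split()), key=len, reverse=True)
    if not keywords:
        return html.escape(text[: 2 * width])
    pattern = re.compile("|".join(re.escape(k) for k in keywords), re.IGNORECASE)

    first = pattern.search(text)
    if first is None:
        # No match: fall back to the start of the text, unhighlighted.
        return html.escape(text[: 2 * width])

    # Keep `width` characters of context on either side of the first match.
    start = max(0, first.start() - width)
    end = min(len(text), first.end() + width)
    snippet = text[start:end]

    # Escape the plain text between matches and wrap each match in the tag.
    parts, last = [], 0
    for m in pattern.finditer(snippet):
        parts.append(html.escape(snippet[last:m.start()]))
        parts.append(f"<{tag}>{html.escape(m.group(0))}</{tag}>")
        last = m.end()
    parts.append(html.escape(snippet[last:]))

    # Mark truncation with ellipses.
    prefix = "…" if start > 0 else ""
    suffix = "…" if end < len(text) else ""
    return prefix + "".join(parts) + suffix
```

For example, `highlight_excerpt("Welcome to the J-School news site", "school")` returns `'Welcome to the J-<strong>School</strong> news site'`. Escaping the text before wrapping matches keeps stray `<` or `&` characters in post content from breaking the results page.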
As far as I can tell, the options on the table are Sphinx, Solr, and search-as-a-service from IndexTank. Sphinx appears to be the lowest-hanging fruit; Solr takes a couple of weeks to set up and configure, and IndexTank costs money for anything over 500 queries/day.
Extending the search sources to custom fields (stored in WordPress's postmeta table) is apparently as simple as adding them to the select query that feeds the index.
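Since there are 216+ sites, I probably won't want to write each site's select query by hand. Here's a rough Python sketch that generates one per site, assuming the default `wp_` database prefix and WordPress multisite table naming (`wp_posts` for the main site, `wp_<blog_id>_posts` for the others). The function name, the `MAX_SITES` multiplier used to keep document IDs unique across sites, and the `event_location` meta key are all hypothetical. The attribute columns would still need matching `sql_attr_*` declarations in `sphinx.conf`.

```python
import re

# Meta keys are interpolated into SQL, so only plain identifiers are allowed.
SAFE_META_KEY = re.compile(r"[A-Za-z0-9_\-]+")
# Assumption: site IDs stay below this, so folded document IDs never collide.
MAX_SITES = 10_000


def build_sql_query(blog_id: int, meta_keys: list[str], db_prefix: str = "wp_") -> str:
    """Hypothetical helper: build the SELECT for one site's Sphinx source,
    indexing published posts plus the given custom fields."""
    if not 1 <= blog_id < MAX_SITES:
        raise ValueError(f"blog_id out of range: {blog_id}")
    # The main site (ID 1) uses the base prefix; other sites get wp_<id>_.
    prefix = db_prefix if blog_id == 1 else f"{db_prefix}{blog_id}_"
    posts, postmeta = f"{prefix}posts", f"{prefix}postmeta"

    columns = [
        # Sphinx expects a unique integer document ID as the first column;
        # fold the site ID in so post IDs from different sites don't collide.
        f"p.ID * {MAX_SITES} + {blog_id} AS doc_id",
        f"{blog_id} AS blog_id",
        "p.post_author",
        "UNIX_TIMESTAMP(p.post_date) AS post_date",
        "p.post_title",
        "p.post_content",
    ]
    for i, key in enumerate(meta_keys):
        if not SAFE_META_KEY.fullmatch(key):
            raise ValueError(f"unsafe meta key: {key!r}")
        # A correlated subquery (rather than a JOIN) keeps one row per post,
        # even when a post stores several values under the same key.
        columns.append(
            f"(SELECT GROUP_CONCAT(m.meta_value SEPARATOR ' ') FROM {postmeta} m "
            f"WHERE m.post_id = p.ID AND m.meta_key = '{key}') AS meta_{i}"
        )

    return "SELECT " + ", ".join(columns) + f" FROM {posts} p WHERE p.post_status = 'publish'"
```

Calling `build_sql_query(12, ["event_location"])` would produce the query for site 12, reading from `wp_12_posts` and `wp_12_postmeta`. Filtering by site then falls out of the `blog_id` attribute.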
I intend to get Sphinx working in the development environment first, document the steps it took, and then roll it out to production.