Search architecture
last modified February 6, 2007 by ianb
This page describes a possible search architecture.
Note: this is not planned for 1.0.
The Problem
Right now we have ZCatalog-based searching of Plone content. This works acceptably for content that is in Plone, but poorly for content that is not.
An intermediate solution is to put more content into the ZCatalog, even if it isn't in Plone/Zope. ZCatalog can catalog content that actually lives outside of Zope (using something like a stub object).
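As a rough illustration of the stub-object idea, the sketch below indexes a record that lives outside Zope through a lightweight stand-in whose attributes mirror the catalog's indexes and which carries a URL back to the real content. To keep it runnable without Zope, a tiny in-memory `ToyCatalog` stands in for ZCatalog. `ExternalStub`, `ToyCatalog`, and the example data are hypothetical; only the index names `Title` and `SearchableText` follow common Plone usage. In a real site the stub would instead be handed to ZCatalog's `catalog_object`.

```python
from dataclasses import dataclass


@dataclass
class ExternalStub:
    """Hypothetical stand-in for content stored outside Zope (e.g. a blog post).

    A catalog only reads attributes, so the stub just exposes the names the
    catalog indexes, plus a URL pointing back to where the content really lives.
    """
    uid: str
    Title: str
    SearchableText: str
    url: str


class ToyCatalog:
    """Hypothetical in-memory catalog that indexes objects by attribute name."""

    def __init__(self, indexes):
        self.indexes = list(indexes)
        self.records = {}  # uid -> {index name: value, "url": ...}

    def catalog_object(self, obj, uid):
        # Pull each indexed attribute off the object, as a catalog would.
        record = {name: getattr(obj, name, "") for name in self.indexes}
        record["url"] = getattr(obj, "url", "")
        self.records[uid] = record

    def search_text(self, word):
        # Naive whole-word match on the full-text field; returns source URLs.
        word = word.lower()
        return [
            rec["url"]
            for rec in self.records.values()
            if word in rec.get("SearchableText", "").lower().split()
        ]


catalog = ToyCatalog(indexes=["Title", "SearchableText"])
post = ExternalStub(
    uid="blog-42",
    Title="Community gardens",
    SearchableText="Notes on starting a community garden",
    url="http://blogs.example.org/42",
)
catalog.catalog_object(post, uid=post.uid)
print(catalog.search_text("garden"))  # ['http://blogs.example.org/42']
```

The point of the stub is that nothing about the external content has to move into Zope; only the indexed values and a pointer back are stored.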
A longer-term solution is to move search out of Zope altogether. ZCatalog is never going to produce search results as relevant as a well-tuned search engine, and there are also scalability concerns. Ultimately we'd like to be able to offer things like search services for blogs we don't host, so that people can get consistent and comprehensive search results across a variety of services. Lastly, we should keep SEO (search engine optimization) in mind in general: having our own search systems that are especially smart about our backend models leads us to ignore search optimizations we should be paying attention to. Search results are key to our constituency, since these are (primarily) public projects.
A Solution
We aren't going to do great work in search ourselves; that is obviously beyond our means and scope. Luckily, other people are doing good work. Two projects stand out: Lucene and Xapian.
Lucene is the more mature project and is considered among the best open source search engines available. Xapian also works well and is easier to access from languages like Python.
You can access Lucene through PyLucene. I don't think we'll want to do this: PyLucene is hard to build and imposes odd constraints (for example, you have to use a special kind of thread). If we did use it, we'd wrap it in a stand-alone service.
Another option is Solr, a usable search service built on Lucene that exposes HTTP APIs we can access (Lucene itself is just a library). Xapian comes with something similar called Omega, but it seems rather spotty. This post shows how to use Xapian to create a search service from Python, which might be more expedient if we used Xapian.
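To give a sense of how thin the Python side of a Solr-backed service could be, here is a sketch using only the standard library. It assumes Solr's classic HTTP interface: documents are added by POSTing `<add><doc><field name="...">` XML to the `/update` handler (followed by a `<commit/>` message), and queries go to the `/select` handler with `wt=json` so results come back under `response.docs`. The base URL and the field names (`id`, `title`) are assumptions that depend on the Solr schema. Network calls are left out so that message building and response parsing can be checked against a canned response.

```python
import json
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

SOLR_BASE = "http://localhost:8983/solr"  # assumed location of the Solr instance


def build_add_xml(docs):
    """Build an <add> message for Solr's XML update handler."""
    add = ET.Element("add")
    for doc in docs:
        doc_el = ET.SubElement(add, "doc")
        for name, value in doc.items():
            field = ET.SubElement(doc_el, "field", name=name)
            field.text = str(value)
    return ET.tostring(add, encoding="unicode")


def build_select_url(query, rows=10):
    """URL for a full-text query against the /select handler, asking for JSON."""
    params = urlencode({"q": query, "rows": rows, "wt": "json"})
    return f"{SOLR_BASE}/select?{params}"


def parse_results(body):
    """Extract (total hits, list of document ids) from a JSON select response."""
    response = json.loads(body)["response"]
    return response["numFound"], [doc["id"] for doc in response["docs"]]


# Usage without a live server:
print(build_add_xml([{"id": "blog-42", "title": "Community gardens"}]))
print(build_select_url("community garden"))
canned = '{"response": {"numFound": 1, "start": 0, "docs": [{"id": "blog-42"}]}}'
print(parse_results(canned))  # (1, ['blog-42'])
```

Actually sending the update would be an HTTP POST of that XML to `SOLR_BASE + "/update"` with a `text/xml` content type (for instance via `urllib.request`), and fetching results a GET of the select URL. Because the whole interface is HTTP, the same client would work from outside Zope just as well as from inside it.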
We would then need a ZCatalog implementation that is essentially just a delegate to the search service. This would not replace all the catalogs in Zope, only the ones we use for full-text searching. Several implementations of this sort of setup exist (so I hear), but beyond that I don't know much.
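The delegate could be quite small. The sketch below shows one possible shape, under stated assumptions: `SearchService` is a hypothetical interface to the external engine (Solr, Xapian, or anything else), and `DelegatingCatalog` is a hypothetical facade loosely mirroring the keyword-query style of Plone's `portal_catalog`, not the actual ZCatalog API. Queries on the full-text index (assumed here to be `SearchableText`) are forwarded to the service; everything else falls through to the existing local catalog, in line with replacing only the catalogs used for full-text search.

```python
from typing import Callable, Dict, List, Protocol


class SearchService(Protocol):
    """Hypothetical interface to an external engine (Solr, Xapian, ...)."""

    def search(self, text: str) -> List[str]:
        ...


class DelegatingCatalog:
    """Hypothetical catalog facade: full-text queries go to the external service."""

    FULLTEXT_INDEX = "SearchableText"  # assumed name of the full-text index

    def __init__(self, service: SearchService,
                 local_search: Callable[[Dict[str, str]], List[str]]):
        self.service = service
        self.local_search = local_search  # the existing in-Zope catalog query

    def __call__(self, **query: str) -> List[str]:
        text = query.pop(self.FULLTEXT_INDEX, None)
        if text is None:
            # No full-text component: the local catalog handles it as before.
            return self.local_search(query)
        hits = self.service.search(text)
        if not query:
            return hits
        # Mixed query: keep only external hits that also match the local
        # criteria, preserving the engine's relevance order.
        local = set(self.local_search(query))
        return [uid for uid in hits if uid in local]


class FakeService:
    """Stand-in engine returning a fixed, pretend relevance ranking."""

    def search(self, text):
        return ["doc-2", "doc-1"]


def fake_local(query):
    # Stand-in for the local catalog: filters on a single metadata index.
    return ["doc-1", "doc-3"] if query.get("portal_type") == "Page" else []


catalog = DelegatingCatalog(FakeService(), fake_local)
print(catalog(SearchableText="garden"))  # ['doc-2', 'doc-1']
print(catalog(SearchableText="garden", portal_type="Page"))  # ['doc-1']
print(catalog(portal_type="Page"))  # ['doc-1', 'doc-3']
```

Keeping the ranking from the external engine and using the local catalog only as a filter is one design choice among several; it lets the engine decide relevance while Zope still enforces things like content type.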