Leveraging the Plone external indexing and searching story - Take II
from deo on Apr 06, 2008 10:33 PM
Hello folks,
I finally found the time to join and write down some
of the ideas floating around in my head, so here goes:
Leveraging the Plone external indexing and searching story
==========================================================
:Author: Dorneles Treméa
:Contact: dorneles@...
:Date: 2008-04-06
:Version: 0.1
.. contents::
Abstract
--------
The main goal of this proposal is to improve the default indexing
and searching functionality of Plone (which is currently
tightly coupled to ``portal_catalog``) so that we can use external
search servers, like `Solr`_ and `Xapian`_ (just to cite two
known examples), while keeping compatibility with the current
code base. We should be able to:
- make the indexing/searching more scalable
- make the indexed data easily accessible by other Plone sites
  and/or even applications written in other languages (Java/C#/...)
- improve resource allocation: one shared cache, instead of one
  cache per Zope instance
Detailed Description
--------------------
Motivation
**********
Plone's indexing and searching mechanisms are **tightly** coupled to
the ``portal_catalog`` implementation, which uses the ``ZODB`` as
the storage layer.
This has some drawbacks:
1) when a search is made using the ``portal_catalog``, the catalog
   data objects need to be loaded into memory, which may cause some
   existing active objects to be deactivated; that's one of the
   reasons why it's recommended to mount the ``portal_catalog`` in a
   separate ``ZODB`` file
2) to scale you need to use ``ZEO``, and each Zope client
   keeps its own memory cache, which *wastes resources*
3) due to its nature, the indexing process can be a **heavy/long**
   operation, causing *read* and/or *write* conflicts; this is a
   problem that `PloneQueueCatalog`_ tried to address in the past
4) data stored inside the ``ZODB`` is *harder* to
   share with external applications, especially non-Python ones
In recent years some good search engine solutions have started to appear
on the market. Two promising ones are `Lucene`_ and `Xapian`_.
Plone should work with these external tools in a way that avoids the
drawbacks above, and not only with those two options but also with
anything else that fulfills a set of requirements.
Focus Areas
***********
These particular requirements need to be defined and Plone needs to be
improved to allow the flexibility required. Some of the areas affected
by this proposal are:
- Search
- LiveSearch
- Advanced Search
- Navigation Portlet
- Topics/SmartFolders
All of them need to be improved to use a central indexing/searching
mechanism, which in turn will be pluggable (and so, extensible).
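To make the idea of a central, pluggable mechanism more concrete, here is a
minimal sketch in plain Python. Every name in it (``SearchBackend``,
``InMemoryBackend``, ``SearchRegistry``) is hypothetical and invented for
illustration; none of them exist in Plone or in the products cited in this
proposal, and a real implementation would more likely use Zope interfaces
and utilities. The point is only that Search, LiveSearch and friends would
talk to one entry point, and the backend behind it could be swapped.

```python
from typing import Dict, List, Protocol


class SearchBackend(Protocol):
    """Hypothetical contract that every pluggable backend would satisfy."""

    def search(self, **query: str) -> List[str]:
        """Return the paths of the documents matching ``query``."""
        ...


class InMemoryBackend:
    """Toy stand-in for ``portal_catalog`` or for an external server."""

    def __init__(self) -> None:
        self._docs: Dict[str, Dict[str, str]] = {}

    def add(self, path: str, **fields: str) -> None:
        self._docs[path] = fields

    def search(self, **query: str) -> List[str]:
        # A document matches when every queried field has the given value.
        return sorted(
            path
            for path, fields in self._docs.items()
            if all(fields.get(key) == value for key, value in query.items())
        )


class SearchRegistry:
    """Central entry point: UI code calls this, never a backend directly."""

    def __init__(self) -> None:
        self._backends: Dict[str, SearchBackend] = {}
        self._active = ""

    def register(self, name: str, backend: SearchBackend) -> None:
        self._backends[name] = backend
        if not self._active:
            # The first registered backend becomes the default one.
            self._active = name

    def activate(self, name: str) -> None:
        if name not in self._backends:
            raise KeyError(name)
        self._active = name

    def search(self, **query: str) -> List[str]:
        return self._backends[self._active].search(**query)


# Two independent backends holding different data, for demonstration only.
catalog = InMemoryBackend()
catalog.add("/site/front-page", portal_type="Document")
external = InMemoryBackend()
external.add("/site/news", portal_type="News Item")

registry = SearchRegistry()
registry.register("catalog", catalog)
registry.register("external", external)
```

Calling ``registry.search(portal_type="Document")`` goes to the ``catalog``
backend by default; after ``registry.activate("external")`` the same call is
answered by the other backend, without any change in the calling code.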
There are already some initial projects (especially for `Solr`_, an
enterprise search server based on `Lucene`_) trying to partly address
the issues raised by this proposal:
- in the indexing area: `enfold.indexing`_/`collective.indexing`_ (and
  more recently `z3c.indexing.dispatch`_), both are generic and are in
  an advanced stage
- in the searching area: `enfold.solr`_/`collective.solr`_, both are
  `Solr`_ specific and are in an intermediate stage
- in the integration area: `SolrIntegration`_, which is Solr specific
  and is in an initial stage
Deliverables
************
A good number of interfaces need to be defined, lots of tests need
to be written and fine-grained integration work needs to be done.
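As an illustration of the kind of interface and test this implies for the
indexing side, here is a small sketch in plain Python. The names
(``Indexer``, ``RecordingIndexer``, ``IndexQueue``) are hypothetical and are
not taken from any of the cited products; the queue shows just one possible
way to soften the heavy indexing problem, by collapsing repeated operations
on the same object before they reach the backend.

```python
from abc import ABC, abstractmethod
from typing import Dict, List, Optional, Tuple


class Indexer(ABC):
    """Hypothetical indexing contract a backend would implement."""

    @abstractmethod
    def index(self, uid: str, data: Dict[str, str]) -> None:
        """Add or update the entry for ``uid``."""

    @abstractmethod
    def unindex(self, uid: str) -> None:
        """Remove the entry for ``uid``."""


class RecordingIndexer(Indexer):
    """Test double that only records the calls it receives."""

    def __init__(self) -> None:
        self.calls: List[Tuple[str, str]] = []

    def index(self, uid: str, data: Dict[str, str]) -> None:
        self.calls.append(("index", uid))

    def unindex(self, uid: str) -> None:
        self.calls.append(("unindex", uid))


class IndexQueue:
    """Collects operations and forwards only the last one per object."""

    def __init__(self, indexer: Indexer) -> None:
        self._indexer = indexer
        # ``None`` marks a pending removal, a dict a pending (re)index.
        self._pending: Dict[str, Optional[Dict[str, str]]] = {}

    def index(self, uid: str, data: Dict[str, str]) -> None:
        self._pending[uid] = dict(data)

    def unindex(self, uid: str) -> None:
        self._pending[uid] = None

    def flush(self) -> int:
        """Send the pending operations and return how many were sent."""
        processed = 0
        for uid, data in self._pending.items():
            if data is None:
                self._indexer.unindex(uid)
            else:
                self._indexer.index(uid, data)
            processed += 1
        self._pending.clear()
        return processed


recorder = RecordingIndexer()
queue = IndexQueue(recorder)
queue.index("a", {"Title": "Draft"})
queue.index("a", {"Title": "Final"})  # replaces the earlier request for "a"
queue.index("b", {"Title": "Temp"})
queue.unindex("b")                    # "b" ends up as a single removal
sent = queue.flush()
```

Four requests were queued, but only one operation per object reaches the
backend when ``flush()`` runs. A real integration would presumably trigger
the flush at a transaction boundary, which is an assumption of this sketch
and not something defined by the proposal.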
Goals
*****
The primary goal is to have a feature complete implementation, working
out-of-the-box both with the standard ``portal_catalog`` and also with
a `Solr`_ server. If time permits, I'll also work on the `Xapian`_
integration.
About Me
********
Hello! I'm Dorneles ``deo`` Treméa, a Brazilian guy living in
Garibaldi (at the extreme south of the country) with his lovely
wife and two wonderful daughters.
I've been in touch with Plone since before the 1.0 version was
released (yeah, at some point back in 2002...) so you probably know
me from the Plone mailing lists or IRC channels or even personally!
For the past 3 years I was working with the folks from Jarn (formerly
known as Plone Solutions) where I made great friends and had a lot of
fun working directly with Alexander Limi, Geir Baekholt, Helge Tesdal,
Stefan Holek, Florian Schulz, Denis Mishunov, Martijn Pieter and
Wichert Akkerman. At the end of 2007 I joined Enfold Systems to help
Alan Runyan and his gang with the challenges of integrating Plone with
heterogeneous environments.
I also currently hold the Administrative Director position at the
`Brazilian Python Association (APyB)`_ and I'm the CEO of X3ng, one
of the pioneering Plone companies in Brazil.
Talking about the GSoC, I was one of the three original Plone mentors
in 2006, but unfortunately my student didn't successfully complete
his project. This year my postgraduate proposal was accepted by the
university and I decided to apply as a student, so here I am... ;-)
It would be great to be mentored by anyone with interest in this
particular area, including the authors of all cited products. In
truth, I would like to be multi-mentored to make sure the results
match the expectations of the whole Plone Community!
.. _Lucene: http://lucene.apache.org/
.. _Xapian: http://www.xapian.org/
.. _Solr: http://lucene.apache.org/solr
.. _PloneQueueCatalog:
   http://dev.plone.org/collective/browser/PloneQueueCatalog
.. _enfold.indexing:
   https://svn.enfoldsystems.com/browse/public/enfold.solr/trunk/enfold.indexing
.. _collective.indexing:
   http://dev.plone.org/collective/browser/collective.indexing
.. _z3c.indexing.dispatch: http://svn.zope.org/z3c.indexing.dispatch
.. _enfold.solr:
   https://svn.enfoldsystems.com/browse/public/enfold.solr/trunk/enfold.solr
.. _collective.solr:
   http://dev.plone.org/collective/browser/collective.solr
.. _SolrIntegration:
   https://svn.enfoldsystems.com/browse/public/enfold.solr/trunk/SolrIntegration
.. _Brazilian Python Association (APyB):
   http://associacao.pythonbrasil.org/
--
Dorneles Treméa
X3ng Web Technology
http://nosleepforyou.blogspot.com