
Posts Tagged ‘xapian’

HOWTO Build your own binaries of PHP Xapian bindings for Debian

July 6, 2011 9 comments

Due to a licensing issue, the PHP bindings for Xapian were removed from Debian Squeeze. See http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=513796 for more information about this.

However, it is not hard to build your own package from source.

Here is how:

sudo apt-get build-dep xapian-bindings
sudo apt-get install php5-dev php5-cli
apt-get source xapian-bindings
cd xapian-bindings-1.2.*
rm debian/control
env PHP_VERSIONS=5 debian/rules maint
debuild -e PHP_VERSIONS=5 -us -uc
cd ..
sudo dpkg -i php5-xapian_*.deb

Be careful: the extracted source directory (xapian-bindings-1.2.*) has to be absolutely clean, so if you tried a first time and it failed, remove the whole directory before trying again.
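A retry after a failed build could start from something like the sketch below. The helper name fresh_source_tree is hypothetical (it is not part of any Debian tool), and the glob assumes the 1.2 series directory name used in the steps above.

```shell
# Hypothetical helper: wipe any previously extracted tree under the given
# work directory so the next attempt starts from a clean source tree.
fresh_source_tree() {
    workdir="${1:-.}"
    # Remove every leftover xapian-bindings-1.2.* directory from earlier attempts.
    for d in "$workdir"/xapian-bindings-1.2.*/; do
        [ -d "$d" ] && rm -rf -- "$d"
    done
    return 0
}
```

After calling fresh_source_tree in the directory where you ran the build, run apt-get source xapian-bindings again and repeat the remaining steps.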

Of course, the same procedure holds true for Ubuntu as well.

2012-12-23 edit: if you’re using PHP 5.4, you need to modify the debuild line. See the updated article on Xapian’s wiki.

To enable the extension, don’t forget to create a xapian.ini file (or 20-xapian.ini on recent Ubuntu releases with PHP 5.4) containing “extension = xapian.so” in /etc/php5/apache2/conf.d/, then restart your web server.
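As a minimal sketch, the ini file can be written with a small shell function. The name write_xapian_ini is hypothetical; the directory and file names are the ones mentioned above and may differ on your system.

```shell
# Hypothetical helper: write the one-line ini file that loads the extension.
# $1 = PHP conf.d directory, $2 = ini file name (defaults to xapian.ini).
write_xapian_ini() {
    conf_dir="$1"
    ini_name="${2:-xapian.ini}"
    printf 'extension = xapian.so\n' > "$conf_dir/$ini_name"
}
```

From a root shell, write_xapian_ini /etc/php5/apache2/conf.d creates xapian.ini; pass 20-xapian.ini as a second argument for the PHP 5.4 layout. The web server still has to be restarted afterwards.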


Joint meeting of PHP Perú and Drupal Perú, this Saturday 12/12/2009

December 10, 2009 Leave a comment

This Saturday 12/12 at 3pm, the PHP Perú and Drupal Perú communities will meet at our offices to talk about Xapian, SimpleTest and the Elements theme for Drupal 6.

More info at http://groups.drupal.org/node/35790

In the morning, we will be at the Unacinux event in Callao: http://csl.unac.edu.pe/cronograma.php, to talk about successful free software in Peruvian education (Drupal, OpenERP, KnowledgeTree, PMB, OpenC2C, …)

Categories: eventos, Spanish

Xapian: the tricky multi-term removal process

March 29, 2009 4 comments

Update 2012-01-17: this article is quite old now and might be completely irrelevant. It is only provided as a hint which might help you write a procedure in PHP to manage indexing. As Olly Betts (main developer of Xapian) commented below, the error message doesn’t come directly from Xapian either, but it might be coming from something built on top of it. No harm is meant to Xapian: it is a very lightweight solution, it fits our need for an indexing component in our PHP application without adding complicated Java requirements, and it has been working for us for several years now. No critical use, but it has never been down either.

Chamilo now implements the Xapian search engine in its professional version. The results are quite good, but to implement a very specific need for one customer, we had to make something a bit complicated: we associated terms in the Xapian database to a specific table of terms in Chamilo.

We don’t use transactions much (as we really should); instead, we have relied on keeping the two databases in sync by having code that always updates both together.

Of course, one of our team was taken aback by a client request and decided to “clean some terms directly from the Chamilo database”… Murphy’s law’s applications are always around…

Anyway, I had to implement a little (very ugly for now) interface to add/remove/edit terms from the Xapian database without affecting the Chamilo database. That’s when I realized that, when you remove terms from a XapianDocument object, you have to do the following process:

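// Terms to strip from the document (plain ASCII quotes are required).
$list_of_terms_to_remove = array('term1', 'term2', 'term3');

// XapianIndexer: indexing wrapper class (not part of the Xapian bindings themselves).
$xi = new XapianIndexer();
$doc = $xi->get_document((int)$doc_id);

// Only touch the document if it was actually found.
if ($doc instanceof XapianDocument) {
    foreach ($list_of_terms_to_remove as $rem_term) {
        $doc->remove_term($rem_term);
        // Write the document back after each removal (see below).
        $xi->getDb()->replace_document((int)$doc_id, $doc);
    }
}

$xi->getDb()->flush();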

Now… it doesn’t look like it, but the replace_document() call is actually quite important. If you don’t put it *inside* the loop, you will get an evil error saying a term cannot be removed from a non-existent document! Want to avoid that? Use replace_document() after each removal. It’s that easy.

A few things about CakePHP, PostgreSQL and Xapian

January 18, 2008 20 comments

While on part-time holidays from Dokeos, I’m looking into some of the stuff I will need to be aware of when working on Dokeos for the next year. I think you can call that technology watch.

Anyway, I’m looking into CakePHP, PostgreSQL and most of all Xapian (although I did not yet reach that point).

CakePHP is the closest thing to a “PHP on Rails” (or so I thought at the time of writing this article; it turns out there are much closer matches, see the comments on this post). You can develop ugly but functional web applications really fast, then start working on the things that really matter in terms of graphics. I’ve barely done a few pages but I have read a lot. Although the next version (1.2), which integrates l10n and i18n, is only in beta at the time of writing, it does look promising, and I bet that in a year or so it might be quite easy to build a simple e-learning system with it. However, we are heading in the same direction too with the preparation of Dokeos 2.0, which will bring object-oriented code that will (hopefully) speed up our development.

I need a file upload feature (to index text files) and user and permission management for a small application I’m working on (to prepare for indexing in Dokeos), and I have found enough websites to get me going… (see references below)

About PostgreSQL now, I just realised there is no known way of retrieving PostgreSQL arrays directly from PHP. You have to fetch them as strings and then “parse” the string to get an array. That sounds quite bad to me. Apparently, there is a very recent version of Perl’s DBI that allows that, but I haven’t been able to confirm that just yet. One solution might be to develop a stored procedure in PostgreSQL to do that, but I don’t really fancy the idea of delving into DBMS-specific internals (although arrays are one specific internal of PostgreSQL).

Xapian seems nice. I’m getting more precise in my questions on the Xapian mailing list, because I now know what I want and need, but I haven’t been able to put it into practice just yet. That’s going to have to wait until next week, probably. Xapian is comparable in many ways to Lucene, a Java-based indexing and search engine. I’m working on this small app with my father, who’s spearheading the Lucene alternative while I’m trying to get my hands deep into Xapian+PHP.

The good thing in all this is that all these tools are easily (all things being equal) combined to provide a powerful application for document indexing, storage and retrieval. CakePHP has a fully-functional PostgreSQL mode (which apparently cannot be said about its ADODB mode). Xapian works well with PHP and has a Debian package (php5-xapian) which eases its integration. Let’s see where that leads us…

Exciting!

References

File upload with CakePHP (although not exactly what I’m looking for)

HasAndBelongsToMany relationship (what I’m looking for for the user-group relationship)

Xapian

December 9, 2007 Leave a comment

Xapian is one of the indexation/search engines I’m planning on integrating to Dokeos instead of the current indexation/search engine that we have at the moment: MnoGoSearch. There are many reasons for that change:

  • The MnoGoSearch server is not GPL under Windows, which makes it impossible for us to ship it as part of Dokeos
  • MnoGoSearch has crappy support mailing lists
  • MnoGoSearch’s packages in various distributions are not maintained
  • Xapian has *many* interfaces in many languages, all regularly maintained. The PHP interface also allows for indexing, which means we can index on the fly and do not have to rely on a cron-ed process to spider a website or a database later on
  • MnoGoSearch’s documentation is based on Russian documentation and translated to English by Russians who, obviously, don’t master English well enough to produce good documentation

Anyway, I read today that OLPC (One Laptop Per Child) XO is using Xapian internally. This doesn’t mean anything directly or in the interest of Dokeos, but it is interesting to learn that other big known projects are using it as well, plus here it’s got to do with education and it’s one project I’m interested in as well.

Searching and Indexing Engines

This article was first written in March 2005 for the BeezNest technical website (http://glasnost.beeznest.org/articles/215).

Probably the best reference about search engines: http://www.searchtools.com/tools/

Of special interest in the Open Source market:

Middlewares

DB
