Lupy is a full-text search engine written in Python, a port of the Jakarta Lucene search engine. The project is retired, but for me it is the one that:
- Has reasonable documentation
- Is stable and finished
- Does not require external software
A description can be found on the old project page, and you can download Lupy from a Gentoo mirror. The /examples directory contains some nice example code. Here I'll show the basics of indexing and searching data originally stored in the database.
Place this code in a view:
# import the high-level Index wrapper
from lupy.indexer import Index
# create an index named "foobar"; create=True overwrites an existing one
index = Index('foobar', create=True)
# get all data
pages = Page.objects.all()
for p in pages:
    # index every page
    index.index(text=p.text, __title=p.title, _slug=p.slug)
# optimize once, after all pages have been added
index.optimize()
For a model:
from django.db import models

class Page(models.Model):
    title = models.CharField(maxlength=255) # page real title (for title tag and h1 in templates)
    slug = models.SlugField(maxlength=255, unique=True) # the wiki URL "title"
    description = models.CharField(maxlength=255) # short description (meta description, some link generation)
    text = models.TextField() # the page text
    changes = models.CharField(maxlength=255) # description of changes, no blanks!
    creation_date = models.DateTimeField(auto_now_add = True)
    modification_date = models.DateTimeField(auto_now = True)
    modification_user = models.CharField(maxlength=30)
    modification_ip = models.CharField(maxlength=20, blank=True)
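The keyword names passed to index.index() carry leading underscores. As far as I recall from the Lupy docstrings, the prefix selects the field type: no prefix gives tokenized text that is searchable but not stored, a single underscore gives a stored, untokenized keyword, and a double underscore gives tokenized text that is also stored and can be read back from a hit. The sketch below only mimics that naming rule in plain Python; classify_field is a hypothetical helper and not part of Lupy, so treat it as an illustration of my reading rather than the library's code.

```python
def classify_field(name):
    """Return (clean_name, stored, tokenized) for a keyword argument name.

    Mimics the underscore-prefix rule as I understand it; not Lupy's own code.
    """
    if name.startswith('__'):
        return name[2:], True, True    # stored text: searchable and retrievable
    if name.startswith('_'):
        return name[1:], True, False   # keyword: stored as-is, not tokenized
    return name, False, True           # plain text: searchable, not stored


for arg in ('text', '__title', '_slug'):
    print('%s -> %r' % (arg, classify_field(arg)))
```

Under that reading, title and slug can be fetched from a search hit, while text can only be searched.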
Searching the index looks like this:
from lupy.indexer import Index
# open the existing index without overwriting it
index = Index('foobar', create=False)
# search term - python
hits = index.find('python')
for h in hits:
    # slug is a Lupy search index field which we added when indexing.
    print 'Found in ', h.get('slug')
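Both snippets hard-code the create flag: True when building the index, False when opening it. One way to choose the flag automatically is to check whether the index directory already exists. The sketch below assumes the index lives in a directory of the same name, and needs_create is a hypothetical helper name; it does not import Lupy, so it can be run on its own.

```python
import os
import tempfile


def needs_create(index_dir):
    """True when the index directory does not exist yet.

    The result is meant to be passed as the create flag, e.g.
    Index(index_dir, create=needs_create(index_dir)).
    """
    return not os.path.isdir(index_dir)


# Demonstrate with a temporary directory instead of a real index.
base = tempfile.mkdtemp()
index_dir = os.path.join(base, 'foobar')
print(needs_create(index_dir))   # directory is absent at this point
os.mkdir(index_dir)
print(needs_create(index_dir))   # directory now exists
```

A relative name such as 'foobar' is resolved against the working directory of the server process, so an absolute path may be the safer choice.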
I've added this code to the wiki "add page" view, after the data is validated and saved:
if settings.WIKI_SEARCH_WITH_LUPY:
    from lupy.indexer import Index
    from os.path import isdir
    # open the existing index, or create it on first use
    if isdir('diamandaSearchCache'):
        index = Index('diamandaSearchCache', create=False)
    else:
        index = Index('diamandaSearchCache', create=True)
    # decode the UTF-8 byte strings to unicode before indexing
    index.index(
        text=page_data['text'].decode("utf-8"),
        __title=page_data['title'].decode("utf-8"),
        __description=page_data['description'].decode("utf-8"),
        _slug=page_data['slug'])
    index.optimize()
Here diamandaSearchCache is the name of the Lupy index directory, and WIKI_SEARCH_WITH_LUPY is a True/False setting in settings.py that indicates whether Lupy is used. This code indexes the newly added page.
Searching is more advanced: a boolean OR search over each word in the phrase. The code is part of the wiki search view:
if data.has_key('lupy'):
    from lupy.index.term import Term
    from lupy.search.indexsearcher import IndexSearcher
    from lupy.search.term import TermQuery
    from lupy.search.boolean import BooleanQuery
    index = IndexSearcher('diamandaSearchCache')
    query = data['string'].split(' ')
    q = BooleanQuery()
    # how many words are in the phrase?
    if len(query) > 1:
        # several words: add each one as an optional clause (OR)
        for a in query:
            t = Term('text', a.decode("utf-8"))
            tq = TermQuery(t)
            q.add(tq, False, False)
    else:
        # one word: add it as a required clause
        t = Term('text', query[0].decode("utf-8"))
        tq = TermQuery(t)
        q.add(tq, True, False)
    hits = index.search(q)
    pages = []
    for h in hits:
        pages.append({
            'title': h.get('title'),
            'description': h.get('description'),
            'slug': h.get('slug')})
    # to get from best to worst order
    pages.reverse()
    return render_to_response('wiki/' + settings.ENGINE + '/search.html', {
        'pages': pages, 'lupy': lupy, 'string': data['string'],
        'google': google, 'lupyuse': True,
        'theme': settings.THEME, 'engine': settings.ENGINE})
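To make the OR semantics concrete without installing Lupy, here is a toy inverted index in plain Python. It is only an illustration of the idea: the names build_index and search_any are made up, and the ranking (number of distinct query words matched) is a simplification, not Lupy's scoring. It also calls split() without an argument so that repeated spaces do not produce empty terms, and it lowercases words on both the indexing and the query side.

```python
from collections import defaultdict


def build_index(docs):
    """Map each lowercased word to the set of slugs whose text contains it."""
    index = defaultdict(set)
    for slug, text in docs.items():
        for word in text.lower().split():
            index[word].add(slug)
    return index


def search_any(index, phrase):
    """Return slugs matching at least one query word, best match first."""
    counts = defaultdict(int)
    for word in set(phrase.lower().split()):
        for slug in index.get(word, ()):
            counts[slug] += 1
    # Sort by number of matched words (descending), then by slug for stable output.
    return sorted(counts, key=lambda s: (-counts[s], s))


docs = {
    'python-intro': 'python is a programming language',
    'django-intro': 'django is a python web framework',
    'lucene': 'lucene is a java search engine',
}
index = build_index(docs)
print(search_any(index, 'python  django'))
```

A page that matches any one of the words is returned, and pages that match more words come first.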
The search form has a submit button named lupy:
<input type="submit" value="{% trans "boolean OR search" %}" name="lupy" />
The name indicates which search (LIKE, Google or Lupy) the user chose. Results are shown in the template:
{% if lupyuse %}
    {% for page in pages %}
        <img src="/site_media/wiki/img/2.png" alt="" /> <a href="/wiki/page/{{ page.slug }}/">{{ page.title }}</a> - {{ page.description }}<br />
    {% endfor %}
{% endif %}
- Added: 14.07.2008 by riklaunim