Since version about 3.5.9 SQLite contains full-text search module called FTS3 (older releases may have FTS1, FTS2). Using this module we can easily add fast full text search to a Django application (or any other using SQLite). During my tests I got FTS3 only on Python 2.6. On Python 2.5 and using pysqlite may be hard to get FTS3 (try recompiling with -DSQLITE_ENABLE_FTS3=1 flag).

Creating virtual table

Here is an example table:

CREATE VIRTUAL TABLE my_search using FTS3(slug, body);
WHERE FTS3(columns...) defines table structure. If you don't have FTS3 - this query will fail. Every column in FTS3 is Text, so you can easily add text data to it. In this example slug identify an entry in normal table (like unique id), and the body will contain text against which we will search.

FTS1 and FTS2 can be used exactly in the same way as FTS3. More on sqlite.org.

Adding data to a virtual table

We can start with importing data from existing table. With this script (executed from Django project folder) we can import the data:

# -*- coding: utf-8 -*-
import sys
import urllib2
from os import environ

environ['DJANGO_SETTINGS_MODULE'] = 'settings'

from settings import *
from django.contrib.sessions.models import *
from django.db import connection, transaction

from MYAPP.models import *


cursor = connection.cursor()

j = MY_SOME_MODEL.objects.all()
iterr = 1
for i in j:
	print iterr
	txt = i.some_txt + ' ' + i.more_txt + ' ' + i.city_of_something
	# txt should be stripped from HTML, stop words etc. to get smaller size of the database
	cursor.execute("INSERT INTO my_search (slug, body) VALUES (%s, %s)", (i.slug, txt))
	iterr += 1

transaction.commit_unless_managed()

Indexing new entries we can handle in Django with signals. For example in models.py add:

from django.db.models import signals

#....

def update_index(sender, instance, created, **kwargs):
		cursor = connection.cursor()
		
		txt = instance.some_txt + ' ' + instance.more_txt + ' ' + instance.city_of_something
		# txt should be stripped from HTML, stop words etc. to get smaller size of the database
		txt = clean_to_search(txt)
		if created:
			# add if object is created, not updated
			cursor.execute("INSERT INTO my_search (slug, body) VALUES (%s, %s)", (instance.slug, txt))
			transaction.commit_unless_managed()
signals.post_save.connect(update_index, sender=MY_SOME_MODEL)

Full-text in SQLite

The query looks like this:

SELECT slug FROM my_search WHERE body MATCH 'search term';
In Django we can execute such query like this:
cursor = connection.cursor()
cursor.execute("SELECT slug FROM my_search WHERE body MATCH %s", (query,))
results = cursor.fetchall()
When we have the slug we can display the search results (in this example - select entries from the main table using selected slugs).

I'm testing this search solution on my JobMaster - job offer searcher hosted on megiteam.pl. As for now no problems occurred with this (except switch from Python 2.5 to Python 2.6). It doesn't leak memory (Nginx+FastCGI), and it doesn't seem to be slow (it index entries faster than Whoosh, but no number on it). It's simple, easy to setup, and no whoosh. solr, xapian needed, so it's cool way to add full text search to SQLite powered websites.

blog comments powered by Disqus

Categories