Advanced fulltext searching with Solr
Check out the new site at https://rkblog.dev.
29 July 2008
Solr is a search server that uses Lucene to perform full-text search on indexed data. The most interesting part is that Solr exposes a REST-style HTTP API (XML or JSON) for adding, modifying, deleting and searching data, so it is quite easy to use from any language. However, Solr and Lucene are written in Java, so some Java servlet hosting is "required" :) If you need something bigger than, for example, Sphinx, this project is worth checking out (Solr is used by Digg and SourceForge).
Search smarter with Apache Solr, Part 1: Essential features and the Solr schema
Search smarter with Apache Solr, Part 2: Solr for the enterprise
PDFs on Slideshare
A simple Python script for Solr would look like this:
from http.client import HTTPConnection
from urllib.parse import quote
from xml.sax.saxutils import escape

# host:port of the Solr server; use 'localhost:8983' if 0.0.0.0 is not routable on your system
SOLR = '0.0.0.0:8983'


def request(method, path, body=None):
    """Send one HTTP request to Solr and print the response."""
    headers = {'Content-Type': 'text/xml; charset=UTF-8'} if body else {}
    con = HTTPConnection(SOLR)
    con.request(method, path, body, headers)  # Content-Length is set automatically
    r = con.getresponse()
    if r.status != 200:
        print(r.status)
    print(r.read().decode('utf-8'))
    con.close()


def add(item_id, title, description, text):
    """Add a document to the index."""
    # escape() keeps characters such as & and < from breaking the XML
    data = '''<add><doc>
<field name="id">%d</field>
<field name="title">%s</field>
<field name="description">%s</field>
<field name="text">%s</field>
</doc></add>''' % (item_id, escape(title), escape(description), escape(text))
    request('POST', '/solr/update/', data.encode('utf-8'))


def commit():
    """Commit pending changes so they become searchable."""
    request('POST', '/solr/update/', b'<commit/>')


def search(key):
    """Search the title field and print the raw response."""
    # quote() URL-encodes the search term
    request('GET', '/solr/select?q=title:%s&start=0&rows=10&fl=id,title,description' % quote(key))


add(5, 'Django search engine', 'blaaaa', 'text')
add(6, 'Google search', 'blaaa', 'text')
add(7, 'Solr fulltext search engine', 'blaaaa', 'text')
add(8, 'Car engine', 'blaaaa', 'text')
commit()
search('searching')
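The script above only prints the raw response. As a rough sketch of how the results could be turned into Python data, the snippet below asks Solr for JSON through the `wt=json` parameter and parses the body with the standard `json` module. The helper names `build_select_path` and `parse_results` and the `SAMPLE` response are made up for illustration; the sample is hand-written in the shape Solr uses for JSON responses, not output captured from a real server.

```python
import json
from urllib.parse import urlencode


def build_select_path(key, rows=10):
    """Build a /solr/select path that asks for a JSON response."""
    params = {
        'q': 'title:%s' % key,
        'start': 0,
        'rows': rows,
        'fl': 'id,title,description',
        'wt': 'json',  # Solr's response-writer parameter
    }
    return '/solr/select?' + urlencode(params)


def parse_results(body):
    """Return (total hits, list of (id, title) pairs) from a JSON response body."""
    response = json.loads(body)['response']
    docs = [(doc['id'], doc.get('title')) for doc in response['docs']]
    return response['numFound'], docs


# Hand-written sample in the shape of a wt=json response (an assumption, not real output).
SAMPLE = '''{"responseHeader": {"status": 0, "QTime": 1},
 "response": {"numFound": 2, "start": 0, "docs": [
   {"id": "5", "title": "Django search engine", "description": "blaaaa"},
   {"id": "7", "title": "Solr fulltext search engine", "description": "blaaaa"}]}}'''

print(build_select_path('searching'))
print(parse_results(SAMPLE))
```

The path returned by `build_select_path` can be sent with the same GET request the script already uses, and the response body passed to `parse_results`.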
The documents added above rely on these field definitions in Solr's schema.xml:
<fields>
  <field name="id" type="string" indexed="true" stored="true" required="true" />
  <field name="title" type="text" indexed="true" stored="true"/>
  <field name="description" type="text" indexed="true" stored="true"/>
  <field name="text" type="text" indexed="true" stored="true"/>
</fields>
<uniqueKey>id</uniqueKey>
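Since `id` is the uniqueKey, posting a document whose id already exists replaces the stored one, which is how "modify" works, while deleting uses a `<delete>` message sent to the same update URL. Here is a minimal sketch of building both messages with `xml.etree.ElementTree`, assuming the four fields from the schema above. The helper names `build_add_xml` and `build_delete_xml` are hypothetical, and the snippet only builds the XML strings; they would still have to be POSTed to `/solr/update/` and followed by a commit.

```python
from xml.etree import ElementTree as ET

FIELDS = ('id', 'title', 'description', 'text')  # field names from the schema above


def build_add_xml(doc):
    """Build an <add> message; posting an existing id replaces that document."""
    if 'id' not in doc:
        raise ValueError('the schema marks "id" as required')
    add = ET.Element('add')
    doc_el = ET.SubElement(add, 'doc')
    for name in FIELDS:
        if name in doc:
            field = ET.SubElement(doc_el, 'field', name=name)
            field.text = str(doc[name])  # ElementTree escapes &, < and > on output
    return ET.tostring(add, encoding='unicode')


def build_delete_xml(item_id):
    """Build a <delete> message that removes one document by its uniqueKey."""
    delete = ET.Element('delete')
    ET.SubElement(delete, 'id').text = str(item_id)
    return ET.tostring(delete, encoding='unicode')


print(build_add_xml({'id': 8, 'title': 'Car & truck engine'}))
print(build_delete_xml(8))
```

Building the XML with ElementTree rather than string formatting means special characters in titles or descriptions are escaped automatically.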