Advanced fulltext searching with Solr

Check out the new site at https://rkblog.dev.

29 July 2008

Solr is a search server that uses Lucene to perform full-text searching on indexed data. Its most interesting feature is that data is added, modified, deleted and searched through a REST-style HTTP API using XML or JSON, so it is quite easy to use from any language. However, Solr and Lucene are written in Java, so some Java servlet hosting is "required" :) If you need something bigger than, for example, Sphinx, you can check this project out (Solr is used by Digg and SourceForge).

Info

Read about Solr + PHP in Polish

Read about Solr + Python in Polish

Enterprise search with PHP and Apache Solr
Search smarter with Apache Solr, Part 1: Essential features and the Solr schema
Search smarter with Apache Solr, Part 2: Solr for the enterprise
PDFs on Slideshare
A simple Python script for Solr would look like this:

from http.client import HTTPConnection
from urllib.parse import quote
from xml.sax.saxutils import escape

# Solr's default port; adjust the host to where your servlet container runs
SOLR_HOST = 'localhost:8983'


def show(r):
    """
    Print the response body, preceded by the status code on errors
    """
    if r.status != 200:
        print(r.status)
    print(r.read().decode('utf-8'))


def update(data):
    """
    POST an XML update message to Solr
    """
    con = HTTPConnection(SOLR_HOST)
    # request() sets the Content-Length header itself
    con.request('POST', '/solr/update/', data.encode('utf-8'),
                {'Content-Type': 'text/xml; charset=UTF-8'})
    show(con.getresponse())
    con.close()


def add(item_id, title, description, text):
    """
    Add a document to the index
    """
    # escape() keeps &, < and > in the values from breaking the XML
    update('''<add><doc>
<field name="id">%d</field>
<field name="title">%s</field>
<field name="description">%s</field>
<field name="text">%s</field>
</doc></add>''' % (item_id, escape(title), escape(description), escape(text)))


def commit():
    """
    Commit changes
    """
    update('<commit/>')


def search(key):
    """
    Perform a search
    """
    con = HTTPConnection(SOLR_HOST)
    # quote() URL-encodes the search term
    con.request('GET', '/solr/select?q=title:%s&start=0&rows=10&fl=id,title,description' % quote(key))
    show(con.getresponse())
    con.close()


add(5, 'Django search engine', 'blaaaa', 'text')
add(6, 'Google search', 'blaaa', 'text')
add(7, 'Solr fulltext search engine', 'blaaaa', 'text')
add(8, 'Car engine', 'blaaaa', 'text')
commit()
search('searching')
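
The introduction mentions deleting data, which the script does not cover. Below is a minimal sketch of how the delete messages could be built, assuming the standard Solr XML update syntax (`<delete><id>` and `<delete><query>`). The helper names `delete_by_id_xml` and `delete_by_query_xml` are made up for this example. The resulting string would be POSTed to `/solr/update/` in the same way as the `<add>` message and followed by a commit.

```python
from xml.sax.saxutils import escape


def delete_by_id_xml(item_id):
    """Build the XML message that deletes one document by its unique key."""
    return '<delete><id>%s</id></delete>' % escape(str(item_id))


def delete_by_query_xml(query):
    """Build the XML message that deletes every document matching a query."""
    return '<delete><query>%s</query></delete>' % escape(query)


if __name__ == '__main__':
    # Hypothetical usage: remove document 8, or everything with "engine" in the title
    print(delete_by_id_xml(8))
    print(delete_by_query_xml('title:engine'))
```

Only the message is built here, so the sketch can be tried without a running Solr instance.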

The script assumes a simple schema like this:

<fields>
  <field name="id" type="string" indexed="true" stored="true" required="true"/>
  <field name="title" type="text" indexed="true" stored="true"/>
  <field name="description" type="text" indexed="true" stored="true"/>
  <field name="text" type="text" indexed="true" stored="true"/>
</fields>
<uniqueKey>id</uniqueKey>
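
Since Solr can return JSON as well as XML, the search results can be turned into Python objects instead of being printed raw. The sketch below assumes the `wt=json` request parameter and the usual `response` / `docs` layout of Solr's JSON output. `build_select_path`, `parse_docs` and the sample payload are illustrative and not taken from a real server.

```python
import json
from urllib.parse import urlencode


def build_select_path(key, rows=10):
    """Build the /solr/select path for a title search that returns JSON."""
    params = {
        'q': 'title:%s' % key,
        'start': 0,
        'rows': rows,
        'fl': 'id,title,description',
        'wt': 'json',  # ask Solr for JSON instead of the default XML
    }
    # urlencode() percent-encodes the values, including ":" and ","
    return '/solr/select?' + urlencode(params)


def parse_docs(body):
    """Return the list of matching documents from a Solr JSON response."""
    return json.loads(body)['response']['docs']


if __name__ == '__main__':
    # Hand-written sample payload in the assumed layout
    sample = ('{"response": {"numFound": 1, "start": 0, '
              '"docs": [{"id": "5", "title": "Django search engine"}]}}')
    print(build_select_path('searching'))
    for doc in parse_docs(sample):
        print(doc['id'], doc['title'])
```

The path returned by `build_select_path` could be passed to a GET request against the Solr host, and the response body handed to `parse_docs`.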

RkBlog

Django web framework tutorials, 29 July 2008
