Solr is a search server that uses Lucene to perform full-text search on indexed data. The most interesting part is that Solr exposes a REST-like HTTP API, speaking XML or JSON, to add, modify, delete and search data, so it is quite easy to use from any language. However, Solr and Lucene are written in Java, so a Java servlet container is "required" :) If you need something bigger than, for example, Sphinx, check this project out (Solr is used by Digg and SourceForge).
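To give a rough idea of what "HTTP API" means in practice, here is a minimal sketch that only builds a search URL and reads a JSON response, without contacting a server. The host, port and the `/solr/select` path are assumptions (newer multi-core setups put a core name in the path), the helper names are made up for illustration, and the sample response is hand-written to mimic the shape Solr returns with `wt=json`.

```python
import json
from urllib.parse import urlencode


def build_select_url(query, rows=10, base="http://localhost:8983/solr/select"):
    # base is an assumption; adjust host, port and path to your install.
    params = {"q": query, "start": 0, "rows": rows, "wt": "json"}
    return base + "?" + urlencode(params)


def titles_from_response(body):
    # Matching documents live under response -> docs in the JSON output.
    data = json.loads(body)
    return [doc.get("title") for doc in data["response"]["docs"]]


# Hand-written sample, not output captured from a real server.
sample = ('{"responseHeader": {"status": 0}, "response": {"numFound": 1, '
          '"start": 0, "docs": [{"id": "7", "title": "Solr fulltext search engine"}]}}')

print(build_select_url("title:engine"))
print(titles_from_response(sample))
```

Building the query string with `urlencode` keeps spaces and characters such as `:` or `&` in the search term from breaking the URL.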

Enterprise search with PHP and Apache Solr
Search smarter with Apache Solr, Part 1: Essential features and the Solr schema
Search smarter with Apache Solr, Part 2: Solr for the enterprise
PDFs on Slideshare
A simple Python script for Solr would look like this:
from http.client import HTTPConnection
from urllib.parse import urlencode
from xml.sax.saxutils import escape

# Address of the servlet container running Solr; adjust to your setup.
SOLR_HOST = 'localhost:8983'


def show(response):
    """Print the response body, preceded by the status code on failure."""
    if response.status != 200:
        print(response.status)
    print(response.read().decode('utf-8'))


def post_update(xml):
    """POST an XML update message to Solr and print the response."""
    con = HTTPConnection(SOLR_HOST)
    # request() sets Content-Length from the encoded body.
    con.request('POST', '/solr/update/', xml.encode('utf-8'),
                {'Content-Type': 'text/xml; charset=UTF-8'})
    show(con.getresponse())
    con.close()


def add(item_id, title, description, text):
    """Add a document to the index."""
    # Escape the values so characters like & or < do not break the XML.
    post_update('''<add><doc>
<field name="id">%d</field>
<field name="title">%s</field>
<field name="description">%s</field>
<field name="text">%s</field>
</doc></add>''' % (item_id, escape(title), escape(description), escape(text)))


def commit():
    """Commit pending changes so they become searchable."""
    post_update('<commit/>')


def search(key):
    """Search the title field and print the first ten matches."""
    # urlencode handles spaces and special characters in the query.
    params = urlencode({'q': 'title:%s' % key, 'start': 0, 'rows': 10,
                        'fl': 'id,title,description'})
    con = HTTPConnection(SOLR_HOST)
    con.request('GET', '/solr/select?' + params)
    show(con.getresponse())
    con.close()


add(5, 'Django search engine', 'blaaaa', 'text')
add(6, 'Google search', 'blaaa', 'text')
add(7, 'Solr fulltext search engine', 'blaaaa', 'text')
add(8, 'Car engine', 'blaaaa', 'text')
commit()
search('searching')
The script assumes a simple schema like this:
<fields>
  <field name="id" type="string" indexed="true" stored="true" required="true" />
  <field name="title" type="text" indexed="true" stored="true"/>
  <field name="description" type="text" indexed="true" stored="true"/>
  <field name="text" type="text" indexed="true" stored="true"/>
</fields>
<uniqueKey>id</uniqueKey>
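Deleting goes through the same update handler as adding: you POST a small `<delete>` message, addressed either by the `uniqueKey` value or by a query, and then commit. The sketch below only builds those messages; the function names are hypothetical, and sending them would work just like the add and commit requests in the script.

```python
from xml.sax.saxutils import escape


def delete_by_id_xml(item_id):
    # The id refers to the uniqueKey field declared in the schema.
    return '<delete><id>%s</id></delete>' % escape(str(item_id))


def delete_by_query_xml(query):
    # Removes every document matching the query, e.g. 'title:engine'.
    return '<delete><query>%s</query></delete>' % escape(query)


print(delete_by_id_xml(5))
print(delete_by_query_xml('title:engine'))
```

As with adds, a delete only becomes visible to searches after a `<commit/>` has been sent.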
