Xapit (pronounced “zap it”) is a high-level interface for working with a Xapian database.
Note: This project is early in development and the API is subject to change.
If you haven’t already, first install Xapian and the Xapian Bindings for Ruby. See wiki.github.com/ryanb/xapit/xapian-installation for instructions.
To install as a Rails plugin, run this command.
script/plugin install git://github.com/ryanb/xapit.git
Or, to install as a gem in Rails, first add this to config/environment.rb.
config.gem 'xapit'
And then install the gem and run the generator.
sudo rake gems:install
script/generate xapit
Important: only run the generator script on a gem install, not for the plugin.
Simply call “xapit” in the model and pass a block to define the indexed attributes.
class Article < ActiveRecord::Base
  xapit do |index|
    index.text :name, :content
    index.field :category_id
    index.facet :author_name, "Author"
    index.sortable :id, :category_id
  end
end
First we index “name” and “content” attributes for full text searching. The “category_id” field is indexed for :conditions searching. The “author_name” is indexed as a facet with “Author” being the display name of the facet. See the facets section below for details. Finally the “id” and “category_id” attributes are indexed as sortable attributes so they can be included in the :order option in a search.
Because the indexing happens in Ruby, these attributes do not have to be database columns. They can be simple Ruby methods. For example, the “author_name” attribute mentioned above can be defined like this.
def author_name
  author.name
end
This way you can create a completely custom facet by simply defining your own method. Multiple facet options or field values per record are supported if you return an array.
def author_names
  authors.map(&:name) # => ["John", "Bob"]
end
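The scalar-or-array convention can be illustrated in plain Ruby. This is a minimal sketch, not Xapit's internals: `SketchAuthor`, `SketchArticle` and `facet_values` are hypothetical names, and the structs stand in for ActiveRecord models so the example runs on its own.

```ruby
# Hypothetical stand-ins for ActiveRecord models (not part of Xapit).
SketchAuthor  = Struct.new(:name)
SketchArticle = Struct.new(:authors) do
  # Single-valued attribute: returns one string.
  def author_name
    authors.first.name
  end

  # Multi-valued attribute: returns an array of strings.
  def author_names
    authors.map(&:name)
  end
end

# Hypothetical helper: treat a single value and an array of values uniformly,
# so every record yields a list of facet options.
def facet_values(record, attribute)
  Array(record.public_send(attribute))
end

article = SketchArticle.new([SketchAuthor.new("John"), SketchAuthor.new("Bob")])
facet_values(article, :author_name)   # => ["John"]
facet_values(article, :author_names)  # => ["John", "Bob"]
```

Wrapping the return value with `Array(...)` is one simple way an indexer could accept both kinds of method without the model needing to care which it defines.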
Finally, you can pass any find options to the xapit method to determine what gets indexed or improve performance with eager loading or a different batch size.
xapit(:batch_size => 100, :include => :author, :conditions => { :visible => true })
You can specify a :weight option to give a text attribute more importance. This will cause search terms matching that attribute to have a higher rank. The default weight is 1. Decimal (0.5) weight values are not supported.
index.text :name, :weight => 10
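To build intuition for what a weight does, here is a toy scoring sketch in plain Ruby. It is an assumption-laden illustration only: real ranking is delegated to Xapian and will produce different numbers, and `TOY_WEIGHTS` and `toy_score` are hypothetical names that do not exist in Xapit.

```ruby
# Toy model, for illustration only: every occurrence of the term in an
# attribute adds that attribute's weight to the document's score.
TOY_WEIGHTS = { :name => 10, :content => 1 } # mirrors index.text :name, :weight => 10

def toy_score(document, term)
  TOY_WEIGHTS.sum do |attribute, weight|
    words = document.fetch(attribute, "").downcase.split(/\W+/)
    words.count(term.downcase) * weight
  end
end

in_name    = { :name => "Phone review", :content => "A short article." }
in_content = { :name => "Review", :content => "The phone has a phone case." }

toy_score(in_name, "phone")    # => 10
toy_score(in_content, "phone") # => 2
```

Even though the second document mentions the term twice, the single match in the heavily weighted “name” attribute ranks the first document higher in this toy model.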
To perform the indexing, run the xapit:index rake task.
rake xapit:index
It can also be triggered through Ruby code using this command.
Xapit.remove_database
Xapit.index_all
You may want to trigger this via a cron job on a recurring schedule (e.g. every day) to update the Xapian database. However, it will only take effect after the Rails application is restarted because the Xapian database is stored in memory.
There are two projects in development to help improve this reindexing.
You can then perform a search on the model.
# perform a simple full text search
@articles = Article.search("phone")
# add pagination if you're using will_paginate
@articles = Article.search("phone", :per_page => 10, :page => params[:page])
# search based on indexed fields
@articles = Article.search("phone", :conditions => { :category_id => params[:category_id] })
# search for multiple negative conditions (doesn't match 3, 5, or 8)
@articles = Article.search(:not_conditions => { :category_id => [3, 5, 8] })
# search for range of conditions by number
@articles = Article.search(:conditions => { :released_at => 2.years.ago..Time.now })
# manually sort based on any number of indexed fields, sort defaults to most relevant
@articles = Article.search("phone", :order => [:category_id, :id], :descending => true)
# basic boolean matching is supported
@articles = Article.search("phone OR fax NOT email")
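In a controller you will often build these options from request parameters. The following is a minimal sketch of one way to do that, assuming a plain hash of params. `xapit_search_options` is a hypothetical helper, not part of Xapit, and the option keys are the ones shown in the examples above.

```ruby
# Hypothetical helper: build the options hash for Model.search from request
# params, leaving out the category condition when no category was given.
def xapit_search_options(params, per_page: 10)
  options = { :per_page => per_page, :page => (params[:page] || 1).to_i }
  category = params[:category_id].to_s.strip
  options[:conditions] = { :category_id => category } unless category.empty?
  options
end

xapit_search_options({ :page => "3", :category_id => "5" })
# => { :per_page => 10, :page => 3, :conditions => { :category_id => "5" } }

# In a Rails controller this might then be used as (assumed usage):
#   @articles = Article.search(params[:keywords], xapit_search_options(params))
```

Skipping the condition for a blank category avoids filtering on an empty value when the user has not selected one.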
You can also search all indexed models through Xapit.search.
# search all indexed models @records = Xapit.search("phone")
Simply iterate through the returned set to display the results.
<% for article in @articles %>
  <%= article.name %>
  <%= article.xapit_relevance %>
<% end %>
The “xapit_relevance” holds a percentage (between 0 and 100) determining how relevant the given document is to the user’s search query.
If the search term isn’t found but is similar to another term, that term will show up as a spelling suggestion.
<% if @articles.spelling_suggestion %>
  Did you mean <%= link_to h(@articles.spelling_suggestion), :overwrite_params => { :keywords => @articles.spelling_suggestion } %>?
<% end %>
Facets allow you to further filter the result set based on certain attributes.
<% for facet in @articles.facets %>
  <%= facet.name %>
  <% for option in facet.options %>
    <%= link_to option.name, :overwrite_params => { :facets => option } %>
    (<%= option.count %>)
  <% end %>
<% end %>
The to_param method is defined on option to return an identifier which will be passed through the URL. Use this in the search.
Article.search("phone", :facets => params[:facets])
You can also list the applied facets along with a remove link.
<% for option in @articles.applied_facet_options %>
  <%=h option.name %>
  <%= link_to "remove", :overwrite_params => { :facets => option } %>
<% end %>
When installing Xapit as a Rails plugin, an initializer file is automatically created to set it up. It looks like this.
Xapit.setup(:database_path => "#{Rails.root}/db/xapiandb")
There are many other options you can pass into here. This is a more advanced configuration setting which changes the stemming language, disables spelling, and changes the indexer and parser to a classic variation. The classic ones use Xapian’s built-in term generator and query parser instead of the ones offered by Xapit.
Xapit.setup(
  :database_path => "#{Rails.root}/db/external/xapiandb",
  :spelling => false,
  :stemming => "german",
  :indexer => ClassicIndexer,
  :query_parser => ClassicQueryParser
)
Adapters are used to support multiple ORMs since not everyone uses ActiveRecord. The right adapter is detected automatically, so you should not have to do anything for popular ORMs. However, if your ORM is not supported, it is very easy to make your own adapter. See the AbstractAdapter class for details.
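As a rough illustration of how automatic adapter detection can work in general, here is a self-contained sketch. It does not reproduce Xapit's AbstractAdapter interface: `ToyAdapter`, `HashStoreAdapter`, `PlainModel` and every method name below are hypothetical, so consult the AbstractAdapter source for the real methods to implement.

```ruby
# Hypothetical base class: it remembers every subclass and asks each one
# whether it can handle a given model class.
class ToyAdapter
  def self.adapters
    @adapters ||= []
  end

  # Ruby calls this hook whenever a subclass is defined.
  def self.inherited(subclass)
    super
    ToyAdapter.adapters << subclass
  end

  # Subclasses override this to claim the model classes they support.
  def self.for_class?(model_class)
    raise NotImplementedError
  end

  # Return the first registered adapter that claims the model class, or nil.
  def self.detect(model_class)
    ToyAdapter.adapters.find { |adapter| adapter.for_class?(model_class) }
  end
end

# Hypothetical adapter for a made-up ORM whose models respond to all_records.
class HashStoreAdapter < ToyAdapter
  def self.for_class?(model_class)
    model_class.respond_to?(:all_records)
  end
end

# Hypothetical model class using that made-up ORM.
class PlainModel
  def self.all_records
    []
  end
end

ToyAdapter.detect(PlainModel) # => HashStoreAdapter
ToyAdapter.detect(String)     # => nil
```

With this pattern, supporting a new ORM only requires defining one more subclass, and nothing needs to be registered by hand.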
If you have found a bug to report or a feature to request, please add it to the GitHub issue tracker if it is not there already.
This project can be found on GitHub at the following URL.
If you would like to contribute to this project, please fork the repository and send me a pull request.