Preprocessing class provides methods for tokenizing and stemming of both queries and documents, as well as removing stop-words from them.
There's implementation of Vector Space Model with TF-IDF space and Language Model with 1-gram query likelyhood.
Note: VectorSpaceModel provides methods write_inverted_file read_inverted_file to write and read inverted file from memory, but our search machine appears to be really fast and efficient on chosen dataset so we are not using it.
About
Vector Space Model and Language Model for Information Retrieval system based on collections of text documents.