yuqil / search-engine

A text-based search engine project.

  1. Built a large-scale, text-based search engine on top of a pre-built index (Lucene API). The corpus holds 500,000+ documents from the ClueWeb09 dataset and covers 10% of all Wikipedia pages.
  2. Created parsers that handle structured queries with operators such as 'AND', 'OR', 'NEAR', 'WEIGHT', and 'WINDOW', as well as Bag of Words (BoW) queries.
  3. Implemented retrieval algorithms including ranked and unranked Boolean retrieval, BM25, and Indri.
  4. Implemented query expansion based on pseudo relevance feedback and feature-based search based on SVM, which improved precision by 20% on average.
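
As an illustration of the BM25 retrieval method listed above, here is a minimal sketch of BM25 scoring for a bag-of-words query. It is not taken from this repository: the class `Bm25Sketch` and its method names are hypothetical, the idf term uses the non-negative `log(1 + ...)` variant as an assumption, and `k1 = 1.2`, `b = 0.75` are common defaults rather than values from this project.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical BM25 scorer; an illustrative sketch, not this project's implementation. */
public final class Bm25Sketch {
    private Bm25Sketch() {}

    /** Inverse document frequency (assumed variant that never goes negative). */
    public static double idf(long docCount, long docFreq) {
        return Math.log(1.0 + (docCount - docFreq + 0.5) / (docFreq + 0.5));
    }

    /** BM25 contribution of one query term to one document. */
    public static double termScore(int tf, int docLen, double avgDocLen,
                                   long docFreq, long docCount,
                                   double k1, double b) {
        if (tf <= 0) {
            return 0.0; // the term does not occur in the document
        }
        // Length normalization: longer-than-average documents are penalized.
        double norm = k1 * (1.0 - b + b * docLen / avgDocLen);
        return idf(docCount, docFreq) * (tf * (k1 + 1.0)) / (tf + norm);
    }

    /** Sums the per-term scores over all query terms (bag-of-words query). */
    public static double score(String[] queryTerms, Map<String, Integer> docTermFreqs,
                               int docLen, double avgDocLen,
                               Map<String, Long> docFreqs, long docCount,
                               double k1, double b) {
        double total = 0.0;
        for (String term : queryTerms) {
            int tf = docTermFreqs.getOrDefault(term, 0);
            long df = docFreqs.getOrDefault(term, 0L);
            total += termScore(tf, docLen, avgDocLen, df, docCount, k1, b);
        }
        return total;
    }

    public static void main(String[] args) {
        // Toy statistics, invented purely for this example.
        Map<String, Integer> docTermFreqs = new HashMap<>();
        docTermFreqs.put("search", 3);
        docTermFreqs.put("engine", 1);

        Map<String, Long> docFreqs = new HashMap<>();
        docFreqs.put("search", 50L);
        docFreqs.put("engine", 20L);

        double s = score(new String[] {"search", "engine"}, docTermFreqs,
                         120, 100.0, docFreqs, 1000L, 1.2, 0.75);
        System.out.println("BM25 score: " + s);
    }
}
```

In a Lucene-backed setup, the term frequencies, document lengths, and document frequencies would come from the index rather than from in-memory maps; the maps here only keep the sketch self-contained.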

Languages

Java 99.9%, Makefile 0.1%