CodePeters / ranking-engine

A ranking engine for text search. Given a query and a set of articles, it returns the N most relevant articles.

ranking-engine

Input

The input is a file named "file.txt" containing the articles in JSON format, each of the form:

{"abstract":"The article text here!!!", "keywords":["keyword1", "keyword2.. etc"], "title":"The title here"}

An article may contain additional fields, which are ignored. If any of the fields above is missing, the algorithm leaves it out of its computations.
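The input format above can be read with a short loader like the one below. It is a sketch, not the project's code. It assumes that file.txt holds one JSON object per line, because the README does not say how multiple articles are separated. The names parse_articles and load_articles are hypothetical. Only abstract, keywords and title are kept, and a missing field is simply omitted from the resulting dict.

```python
import json

# Fields the README says the ranking uses; any other field is ignored.
FIELDS = ("abstract", "keywords", "title")


def parse_articles(lines):
    """Parse JSON lines into dicts that hold only the ranked fields.

    Missing fields are left out, so later stages can skip them.
    """
    articles = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        record = json.loads(line)
        articles.append({k: record[k] for k in FIELDS if k in record})
    return articles


def load_articles(path="file.txt"):
    # Assumption (not stated in the README): one JSON object per line.
    with open(path, encoding="utf-8") as fh:
        return parse_articles(fh)
```

For example, parse_articles(['{"title": "A", "year": 2020}']) returns [{'title': 'A'}], because year is not one of the ranked fields.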

Execution

  • First, run preprocess.py, which preprocesses the articles and writes the result to an output file.

  • Then run rank.py. It takes the file generated in the previous step as input, reads a query from standard input, and uses tf-idf to return the N most relevant articles. The ranking also gives different weights to an article's abstract, keywords and title.
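The two steps above can be sketched in Python roughly as follows. This is a minimal illustration and not the project's code. The README does not document what preprocess.py does or which weights rank.py uses. The following are therefore assumptions: the regex tokenizer, the illustrative values in the hypothetical WEIGHTS dict, and the plain idf = log(N / df). The sketch scores an article as the sum over fields f of w_f times the sum over query terms t of tf(t, f) * idf(t).

```python
import math
import re
from collections import Counter

# Illustrative weights only; the real values used by rank.py are not documented.
WEIGHTS = {"title": 3.0, "keywords": 2.0, "abstract": 1.0}


def tokenize(text):
    """Lowercase and split into word tokens (a stand-in for preprocess.py)."""
    return re.findall(r"\w+", text.lower())


def preprocess(article):
    """Turn each present field into a term-frequency Counter."""
    fields = {}
    for name in WEIGHTS:
        if name not in article:
            continue  # missing fields are skipped, as the README describes
        value = article[name]
        text = " ".join(value) if isinstance(value, list) else value
        fields[name] = Counter(tokenize(text))
    return fields


def rank(query, articles, n=10):
    """Return the indices of the n highest-scoring articles for query."""
    docs = [preprocess(a) for a in articles]
    total = len(docs)
    # Document frequency: number of articles (any field) containing each term.
    df = Counter()
    for doc in docs:
        df.update(set().union(*doc.values()))
    scores = []
    for i, doc in enumerate(docs):
        score = 0.0
        for term in tokenize(query):
            if df[term] == 0:
                continue  # term absent from the whole collection
            idf = math.log(total / df[term])
            for name, tf in doc.items():
                # Weighted tf-idf contribution of this field.
                score += WEIGHTS[name] * tf[term] * idf
        scores.append((score, i))
    # Highest score first; ties keep the original article order.
    scores.sort(key=lambda s: (-s[0], s[1]))
    return [i for _, i in scores[:n]]
```

For example, rank("ranking", [{"title": "Neural ranking"}, {"abstract": "pasta"}], n=1) returns [0]. The sketch computes document frequency over whole articles, so each term has a single idf. A per-field idf would be an equally plausible reading of the README.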

License

This project is licensed under the GPLv3 License; see the LICENSE file for details.

About

License: GNU General Public License v3.0


Languages

Language: Python 100.0%