BlackSound1 / Reuters21578-naive-indexer

A naive indexer for Reuters21578, with single-term query processing and a comparison of lossy dictionary compression techniques.

Reuters21578 Naive Indexer

Installation

Download the Reuters21578 corpus from http://www.daviddlewis.com/resources/testcollections/reuters21578/. Unzip it and place the resulting folder at the same level as this project's folder. Name the folder reuters21578.

Install all dependencies listed in requirements.txt, for example with $ pip install -r requirements.txt.

Running

This project is split into three subprojects. Run them with $ python main.py.

Subproject 1

Creates a naive index out of the text of the Reuters21578 corpus.
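As a rough illustration of what a naive index looks like, the sketch below collects (term, document ID) pairs, sorts and de-duplicates them, and groups them into postings lists. It is not the project's actual code: the `tokenize` and `build_naive_index` names, the whitespace tokenizer, and the three toy documents are all assumptions made for the example.

```python
import re
from collections import defaultdict


def tokenize(text):
    # Hypothetical tokenizer: every run of non-whitespace characters is a term.
    return re.findall(r"\S+", text)


def build_naive_index(docs):
    """Map each term to a sorted list of the IDs of documents containing it.

    docs is a dict of document ID -> raw text (toy stand-in for the corpus).
    """
    # Step 1: emit one (term, doc_id) pair per token occurrence.
    pairs = [(term, doc_id) for doc_id, text in docs.items() for term in tokenize(text)]
    # Step 2: drop duplicate pairs and sort by term, then by document ID.
    pairs = sorted(set(pairs))
    # Step 3: group the sorted pairs into postings lists.
    index = defaultdict(list)
    for term, doc_id in pairs:
        index[term].append(doc_id)
    return dict(index)


# Toy documents standing in for Reuters21578 articles.
docs = {1: "oil prices rise", 2: "oil exports fall", 3: "prices fall"}
index = build_naive_index(docs)
print(index)
```

Because the pairs are sorted before grouping, both the dictionary terms and each postings list come out in order without any further work.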

Subproject 3

Reads the index created in subproject 1 and applies lossy compression techniques to its dictionary. Prints a table comparing the size of the index's dictionary before and after each compression step.
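The idea of lossy dictionary compression can be sketched as a chain of term transforms, each of which may drop a term or merge it with another, with the dictionary size recorded after every step. The three steps used here (number removal, case folding, stop word removal), the `apply_step` helper, the tiny stop word list, and the toy index are assumptions for illustration; the project's actual steps and table may differ.

```python
def apply_step(index, transform):
    """Rewrite every term with transform; a result of None drops the term.

    Terms that collapse to the same string have their postings merged.
    """
    merged = {}
    for term, postings in index.items():
        new_term = transform(term)
        if new_term is None:
            continue
        merged.setdefault(new_term, set()).update(postings)
    return {term: sorted(postings) for term, postings in merged.items()}


def no_numbers(term):
    # Drop any term that contains a digit.
    return None if any(ch.isdigit() for ch in term) else term


def case_fold(term):
    return term.lower()


STOP_WORDS = {"the", "of", "to"}  # hypothetical, deliberately tiny list


def no_stop_words(term):
    return None if term in STOP_WORDS else term


# Toy index: term -> sorted document IDs.
index = {
    "Oil": [1, 2], "oil": [2, 3], "the": [1, 2, 3],
    "The": [1], "1987": [2], "prices": [1, 3],
}

sizes = [("unfiltered", len(index))]
steps = [("no numbers", no_numbers), ("case folding", case_fold), ("stop words", no_stop_words)]
for name, step in steps:
    index = apply_step(index, step)
    sizes.append((name, len(index)))

# Print a small comparison table of dictionary sizes.
for name, size in sizes:
    print(f"{name:<14}{size}")
```

The compression is lossy because the original terms cannot be recovered afterwards: once "Oil" and "oil" are merged, a query can no longer distinguish them.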

Subproject 2

Queries the index with several single-term queries.
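A single-term query against an index of this kind amounts to one dictionary lookup that returns the term's postings list. The sketch below assumes the index is held in memory as a dict of term to document IDs; the `query` function name and the toy index are hypothetical and not taken from the project.

```python
def query(index, term):
    """Return the postings list for term, or an empty list if it is not indexed."""
    return index.get(term, [])


# Toy index: term -> sorted document IDs.
index = {"oil": [1, 2], "prices": [1, 3], "fall": [2, 3]}

for term in ["oil", "gold"]:
    print(term, query(index, term))
```

Returning an empty list for an unknown term keeps the caller's code the same whether or not the term occurs in the corpus.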
