hamzakat / python-indexer

Building an inverted index using Python

Python Indexer

Building an inverted index using Python, as a solution to a university programming assignment.

Requirements

  • Python 2
  • Data to index, in HTML format (see docs/cacm.zip for an example)

Usage

  1. Put the data to index in the "data" directory

  2. Run

python indexer.py
  3. Read the generated output files
    • documents.dat (Document IDs)
      • document name -> document ID
    • index.dat (Inverted Index)
      • term -> postings
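
The two mappings above can be illustrated with a minimal sketch. This is not the code in indexer.py: the function name `build_index`, the regex-based tag stripping, the lowercase alphanumeric tokenization, and the in-memory dictionaries are all assumptions made for illustration, and the on-disk layout of the .dat files is not reproduced here.

```python
import re
from collections import defaultdict


def build_index(docs):
    """Build both mappings from a dict of document name -> raw HTML text.

    Hypothetical helper, not taken from indexer.py.
    """
    doc_ids = {}               # document name -> document ID
    index = defaultdict(list)  # term -> postings (list of document IDs)
    for doc_id, name in enumerate(sorted(docs), start=1):
        doc_ids[name] = doc_id
        # Assumption: crude tag removal with a regex, then lowercase
        # alphanumeric tokens; a real indexer may parse HTML properly.
        text = re.sub(r"<[^>]+>", " ", docs[name])
        for term in set(re.findall(r"[a-z0-9]+", text.lower())):
            # IDs are assigned in increasing order, so postings stay sorted.
            index[term].append(doc_id)
    return doc_ids, dict(index)


# Made-up sample documents, standing in for files in the "data" directory.
docs = {
    "a.html": "<html><body>Inverted index</body></html>",
    "b.html": "<p>Index files</p>",
}
doc_ids, index = build_index(docs)
```

Here `doc_ids` plays the role of documents.dat and `index` the role of index.dat. Each document contributes a term at most once, so a postings list records which documents contain the term, not how often it occurs.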
