Building an inverted index in Python, written as a solution to a university programming assignment.
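As a minimal illustration of the data structure (not the project's own code), an inverted index maps each term to a postings list: the sorted IDs of the documents that contain that term. The sketch below assumes a simple lowercase, alphanumeric tokenizer; the names `tokenize` and `build_index` are hypothetical.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens (an assumed, simplistic tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Build an inverted index from a {doc_id: text} mapping.

    Returns a dict mapping each term to a sorted list of document IDs (its postings).
    """
    postings = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            postings[term].add(doc_id)
    # Sort each postings list so lists can later be merged or intersected in one pass.
    return {term: sorted(ids) for term, ids in postings.items()}


docs = {
    1: "Inverted index construction",
    2: "Index compression and query processing",
}
index = build_index(docs)
print(index["index"])  # [1, 2]
```

Using a set per term while indexing avoids duplicate IDs when a term occurs several times in one document; the sets are turned into sorted lists only once, at the end.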

Requirements:
- Python 2
- Data to index, in HTML format (see docs/cacm.zip for example data)
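Because the input documents are HTML, the markup has to be stripped before tokenizing. One standard-library way to do this is a small `HTMLParser` subclass that collects text nodes; this is an assumed approach, not necessarily what indexer.py does, and `TextExtractor` and `html_to_text` are hypothetical names. The sketch uses the Python 3 module name `html.parser`; under Python 2 the same class lives in the `HTMLParser` module.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect the text content of an HTML document, ignoring tags (hypothetical helper)."""

    def __init__(self):
        super().__init__()  # convert_charrefs=True by default, so "&amp;" arrives as "&"
        self.parts = []
        self._skip = 0  # nesting depth inside <script>/<style>, whose content is not indexable text

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.parts.append(data)


def html_to_text(html):
    """Return the visible text of an HTML string, with runs of whitespace collapsed."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()  # flush any buffered text
    # Join text nodes with spaces so "<p>a</p><p>b</p>" yields "a b", then normalize whitespace.
    return " ".join(" ".join(parser.parts).split())


page = "<html><head><style>p {color: red}</style></head><body><h1>Title</h1><p>Some &amp; text</p></body></html>"
print(html_to_text(page))  # Title Some & text
```

The resulting plain text can then be passed to whatever tokenizer the indexer uses.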

Usage:
- Put the data in the "data" directory
- Run `python indexer.py`
- Read the generated output files:
  - documents.dat (document IDs): document name -> document ID
  - index.dat (inverted index): term -> postings
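The exact on-disk format of documents.dat and index.dat is not specified here. The sketch below assumes one tab-separated record per line (`name<TAB>id` and `term<TAB>id id ...`) and shows how the two mappings could be written and read back; `write_outputs`, `read_outputs` and the document names are hypothetical.

```python
import io
import os
import tempfile


def write_outputs(doc_ids, index, out_dir):
    """Write documents.dat and index.dat in an ASSUMED tab-separated format.

    doc_ids: {document name: document ID}
    index:   {term: sorted list of document IDs}
    Names and terms must not contain tabs or newlines.
    """
    with io.open(os.path.join(out_dir, "documents.dat"), "w", encoding="utf-8") as f:
        # One "name<TAB>id" line per document, ordered by ID.
        for name, doc_id in sorted(doc_ids.items(), key=lambda item: item[1]):
            f.write(u"%s\t%d\n" % (name, doc_id))
    with io.open(os.path.join(out_dir, "index.dat"), "w", encoding="utf-8") as f:
        # One "term<TAB>id id ..." line per term, ordered alphabetically.
        for term in sorted(index):
            f.write(u"%s\t%s\n" % (term, " ".join(str(d) for d in index[term])))


def read_outputs(out_dir):
    """Read the two files back into ({name: id}, {term: [ids]})."""
    doc_ids = {}
    with io.open(os.path.join(out_dir, "documents.dat"), encoding="utf-8") as f:
        for line in f:
            name, doc_id = line.rstrip("\n").split("\t")
            doc_ids[name] = int(doc_id)
    index = {}
    with io.open(os.path.join(out_dir, "index.dat"), encoding="utf-8") as f:
        for line in f:
            term, postings = line.rstrip("\n").split("\t")
            index[term] = [int(d) for d in postings.split()]
    return doc_ids, index


out_dir = tempfile.mkdtemp()
write_outputs({"doc1.html": 1, "doc2.html": 2}, {"algebraic": [1], "language": [1, 2]}, out_dir)
doc_ids, index = read_outputs(out_dir)
```

A line-oriented text format keeps the files human-readable and easy to diff; a binary format would be more compact for large collections.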