tanvi2612 / wiki-search

Information Retrieval and Extraction mini-project to build a Wikipedia-like search engine using a wiki dump

Code Specifications


There are two files for indexing and searching respectively:

For indexing, the file is called indexer.py. To run this file you will have to run

python indexer.py $1 $2 $3

Here $1 is the path to the dump file (.xml), $2 is the location where the index files should be stored, and $3 is the name of the file that contains the stats.
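As an illustration of how the three positional arguments might be read, here is a minimal sketch. The function name `parse_args` and the sample file names are assumptions made for this example; they are not taken from indexer.py itself.

```python
from pathlib import Path


def parse_args(argv):
    """Split the three positional arguments ($1, $2, $3) into named paths.

    Hypothetical helper: the real indexer.py may read its arguments differently.
    """
    if len(argv) != 3:
        raise SystemExit("usage: python indexer.py <dump.xml> <index_dir> <stats_file>")
    dump_path, index_dir, stats_file = argv
    return Path(dump_path), Path(index_dir), Path(stats_file)


# Example call with made-up names standing in for $1, $2 and $3.
dump_path, index_dir, stats_file = parse_args(["dump.xml", "index_out", "stats.txt"])
print(dump_path, index_dir, stats_file)
```

In a real script the list passed to `parse_args` would be `sys.argv[1:]`.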

The code will print 2 lines to the terminal:

  1. The number of files in the dump
  2. The total time taken to create the index
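The two terminal lines could be produced along the following lines. This is only a sketch: it assumes that "files in the dump" means `<page>` elements in the XML, and the function name `count_pages` and the tiny in-memory sample are invented for the example. The real indexer.py may count and time things differently.

```python
import io
import time
import xml.etree.ElementTree as ET


def count_pages(source):
    """Stream through the dump and count <page> elements without loading it whole."""
    pages = 0
    for _event, elem in ET.iterparse(source, events=("end",)):
        # Dump tags may carry an XML namespace prefix such as "{...}page".
        if elem.tag.rsplit("}", 1)[-1] == "page":
            pages += 1
            elem.clear()  # release the content of the page just counted
    return pages


# Tiny in-memory stand-in for a dump file (assumption, for illustration only).
sample = io.BytesIO(
    b"<mediawiki><page><title>A</title></page><page><title>B</title></page></mediawiki>"
)

start = time.perf_counter()
n_pages = count_pages(sample)
elapsed = time.perf_counter() - start

print(n_pages)            # line 1: number of pages found
print(f"{elapsed:.2f}")   # line 2: elapsed time in seconds
```

Streaming with `iterparse` keeps memory use low, which matters because wiki dumps are usually far too large to parse in one go.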

In addition to this, the code will also write 2 files to the $2 location:

  1. tf.txt - This document contains the frequencies of words in the documents along with the of

Languages

Python 99.4%, Shell 0.6%