abi98213 / CS-250

CS-250

A search engine implementation that takes XML files as input.
Structure of a valid input XML file:

<collection>
  <page>
    <id>a positive integer</id>
    <title> title of the page </title>
    <text> contents of the page </text>
  </page>
  <page>....</page>
  <page>....</page>
  <page>....</page>
  .
  .
  . 
</collection>
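
As a rough illustration of the structure above, the following sketch reads such a collection with Python's standard `xml.etree.ElementTree` module. It is not taken from this repository: the `read_pages` helper and the `SAMPLE` document are hypothetical names and data introduced only for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document that follows the structure described above.
SAMPLE = """<collection>
  <page>
    <id>1</id>
    <title> First page </title>
    <text> hello world </text>
  </page>
  <page>
    <id>2</id>
    <title> Second page </title>
    <text> hello again </text>
  </page>
</collection>"""


def read_pages(xml_string):
    """Return a list of (page_id, title, text) tuples from a <collection> document."""
    root = ET.fromstring(xml_string)
    pages = []
    for page in root.findall("page"):
        # The id is expected to be a positive integer.
        page_id = int(page.findtext("id").strip())
        if page_id <= 0:
            raise ValueError(f"page id must be positive, got {page_id}")
        title = (page.findtext("title") or "").strip()
        text = (page.findtext("text") or "").strip()
        pages.append((page_id, title, text))
    return pages


pages = read_pages(SAMPLE)
```

For a file on disk, `ET.parse(path).getroot()` could replace `ET.fromstring`; very large collections might be better served by `ET.iterparse`.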
  1. xml_parser.py generates an inverted index for the given XML file.
    On the command prompt: python xml_parser.py stopwords.xml sample.xml sample-output.xml
    It writes the index to sample-output.xml, one entry per line, in the form
    word|pageID:occurrence1,occurrence2...;pageID:occurrence1,occurrence2...;...
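
The output format described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the code in xml_parser.py: it assumes each "occurrence" is a 0-based token position within the page text, that tokens are lowercase alphanumeric runs, and that the stopwords are already available as a plain set (the layout of stopwords.xml is not described here). The names `build_index` and `format_line` are hypothetical.

```python
import re
from collections import defaultdict


def build_index(pages, stopwords):
    """Build {word: {page_id: [positions]}} from (page_id, text) pairs.

    Assumption: an "occurrence" is the 0-based token position in the text.
    """
    index = defaultdict(lambda: defaultdict(list))
    for page_id, text in pages:
        tokens = re.findall(r"[a-z0-9]+", text.lower())
        for position, word in enumerate(tokens):
            if word not in stopwords:
                index[word][page_id].append(position)
    return index


def format_line(word, postings):
    """Render one entry as word|pageID:pos1,pos2;pageID:pos1,..."""
    parts = []
    for page_id in sorted(postings):
        positions = ",".join(str(p) for p in postings[page_id])
        parts.append(f"{page_id}:{positions}")
    return f"{word}|" + ";".join(parts)


# Hypothetical input data for the example.
pages = [(1, "the cat sat on the mat"), (2, "a cat and a dog")]
stopwords = {"the", "a", "on", "and"}

index = build_index(pages, stopwords)
lines = [format_line(word, index[word]) for word in sorted(index)]
print("\n".join(lines))
```

Whether the real output ends each line with a trailing `;` or sorts the words is not stated above, so the sketch omits the trailing separator and sorts only to keep the result deterministic.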
