wxggg / SimpleIndexer

A simple Python indexer for XML files

Introduction

This is a simple indexer for documents in XML format.

Structure

  • index
  • compress
    • inverted index
    • dictionary
  • search
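
The three stages listed above can be sketched roughly as follows. This is a minimal illustration and not the repository's actual code: every function name here (`extract_text`, `build_index`, `compress_postings`, `search`) is hypothetical, the standard-library `xml.etree.ElementTree` stands in for BeautifulSoup so the sketch has no external dependencies, and gap encoding with variable-byte codes is assumed as the inverted-index compression scheme. Dictionary compression is not covered.

```python
import re
import xml.etree.ElementTree as ET
from collections import defaultdict


def extract_text(xml_string):
    """Return all text content of an XML document as one string."""
    root = ET.fromstring(xml_string)
    return " ".join(root.itertext())


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Index step: map each term to a sorted list of doc IDs containing it."""
    postings = defaultdict(set)
    for doc_id, xml_string in docs.items():
        for term in tokenize(extract_text(xml_string)):
            postings[term].add(doc_id)
    return {term: sorted(ids) for term, ids in postings.items()}


def vb_encode(numbers):
    """Variable-byte encode non-negative ints; the high bit marks the last byte."""
    out = bytearray()
    for n in numbers:
        chunk = [n % 128]
        n //= 128
        while n > 0:
            chunk.insert(0, n % 128)
            n //= 128
        chunk[-1] += 128
        out.extend(chunk)
    return bytes(out)


def vb_decode(data):
    """Invert vb_encode."""
    numbers, n = [], 0
    for byte in data:
        if byte < 128:
            n = n * 128 + byte
        else:
            numbers.append(n * 128 + byte - 128)
            n = 0
    return numbers


def compress_postings(doc_ids):
    """Compress step: store gaps between sorted doc IDs, then byte-encode them."""
    gaps = [doc_ids[0]] + [b - a for a, b in zip(doc_ids, doc_ids[1:])]
    return vb_encode(gaps)


def decompress_postings(data):
    """Rebuild the doc ID list by summing the decoded gaps."""
    doc_ids, total = [], 0
    for gap in vb_decode(data):
        total += gap
        doc_ids.append(total)
    return doc_ids


def search(compressed_index, query):
    """Search step: return doc IDs containing every query term (boolean AND)."""
    terms = tokenize(query)
    if not terms:
        return []
    result = None
    for term in terms:
        data = compressed_index.get(term)
        ids = set(decompress_postings(data)) if data else set()
        result = ids if result is None else result & ids
    return sorted(result)


# Made-up sample documents, for illustration only.
docs = {
    1: "<doc><title>Heart disease</title><body>Chest pain and fatigue</body></doc>",
    2: "<doc><title>Chest injury</title><body>Pain after a fall</body></doc>",
}
index = build_index(docs)
compressed = {term: compress_postings(ids) for term, ids in index.items()}
print(search(compressed, "chest pain"))
```

Storing gaps instead of absolute doc IDs keeps most numbers small, which is what lets the variable-byte code use a single byte for most postings.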

Tools

  • BeautifulSoup

Run

bash run.sh

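For searching large amounts of data, you would need an open-source search engine such as Lucene. Included is an example I wrote earlier for the trec-cds2015 retrieval competition data; its results are not very good, but it is basically functionally complete.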

Languages

  • Roff 92.9%
  • Python 7.0%
  • Shell 0.1%