gabrielcc2 / IR14Assignment2

Desktop application that crawls the Web for code snippets, beginning with a set of seed URLs and complying mostly with the robots.txt standard. It uses Lucene to index the web pages and support queries over the crawled data. Different indexes can be created, stored, and loaded, allowing for specialized searches. This was developed as a mini-project for an Information Retrieval course, WiSe2014-2015@OvGU.
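
As a rough illustration of what robots.txt compliance involves for a crawler like this one, the sketch below parses the `Disallow` rules that apply to the wildcard user agent and checks a URL path against them. It is not taken from the repository: the class name `RobotsRulesSketch` and its methods are hypothetical, and the parsing is deliberately simplified (plain prefix matching only; no `Allow` lines, path wildcards, or `Crawl-delay`).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Hypothetical helper (not part of the repository): simplified robots.txt rules. */
class RobotsRulesSketch {
    // Path prefixes that the wildcard user agent ("*") must not fetch.
    private final List<String> disallowed = new ArrayList<>();

    /** Collects the Disallow prefixes from groups that include "User-agent: *". */
    static RobotsRulesSketch parse(String robotsTxt) {
        RobotsRulesSketch rules = new RobotsRulesSketch();
        boolean inWildcardGroup = false;
        boolean lastWasAgent = false;
        for (String rawLine : robotsTxt.split("\\r?\\n")) {
            // Drop trailing comments and surrounding whitespace.
            int hash = rawLine.indexOf('#');
            String line = (hash >= 0 ? rawLine.substring(0, hash) : rawLine).trim();
            int colon = line.indexOf(':');
            if (colon < 0) {
                continue; // blank or malformed line
            }
            String field = line.substring(0, colon).trim().toLowerCase(Locale.ROOT);
            String value = line.substring(colon + 1).trim();
            if (field.equals("user-agent")) {
                // Consecutive User-agent lines belong to the same group.
                boolean isWildcard = value.equals("*");
                inWildcardGroup = lastWasAgent ? (inWildcardGroup || isWildcard) : isWildcard;
                lastWasAgent = true;
            } else {
                lastWasAgent = false;
                if (field.equals("disallow") && inWildcardGroup && !value.isEmpty()) {
                    rules.disallowed.add(value);
                }
            }
        }
        return rules;
    }

    /** Returns false if the path starts with any disallowed prefix. */
    boolean isAllowed(String path) {
        for (String prefix : disallowed) {
            if (path.startsWith(prefix)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Sample robots.txt content, made up for this illustration.
        String sample = "User-agent: *\nDisallow: /private/\n";
        RobotsRulesSketch rules = parse(sample);
        System.out.println(rules.isAllowed("/private/page.html"));
        System.out.println(rules.isAllowed("/snippets/index.html"));
    }
}
```

A crawler would typically fetch `/robots.txt` once per host, cache the parsed rules, and call a check like `isAllowed` before adding each discovered URL to its queue.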

IR14Assignment2

Repository for the master's assignment in the IR course, WiSe2014-2015@OvGU.

The main class is CodeSearch, in the ir.control package.

The program itself is self-contained and should be easy to use.

Additional documentation regarding the source code, including class diagrams, can be found in the doc folder.

License: Apache License 2.0


Languages

Java 87.7%, CSS 7.5%, JavaScript 4.7%