mattfdev / WebCrawler

Focused Java-based web crawler that parses the web for a search term and records to a database

WebCrawler

Focused Java-based web crawler that parses the web for a search term and records the results to a database. It uses a Bing search of the user's input as the initial seed vector and builds a database of crawled terms outward from there. The crawler respects each domain's robots.txt and attempts to limit bandwidth costs to hosts by rate-limiting itself when required.
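The robots.txt and rate-limiting behaviour described above could look roughly like the sketch below. It is not taken from the WebCrawler source: the class name PolitenessPolicy, its methods, and the fixed per-host delay are all assumptions made for illustration. The parser only honours Disallow prefixes in the "User-agent: *" group and ignores Allow rules, wildcards, and Crawl-delay.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Hypothetical helper (not from the WebCrawler source): robots.txt check plus per-host delay. */
class PolitenessPolicy {
    private final List<String> disallowedPrefixes = new ArrayList<>();
    private final Map<String, Long> nextAllowedAt = new HashMap<>();
    private final long minDelayMillis;

    /** Parses the Disallow rules of the "User-agent: *" group from a robots.txt body. */
    PolitenessPolicy(String robotsTxt, long minDelayMillis) {
        this.minDelayMillis = minDelayMillis;
        boolean appliesToUs = false;
        boolean inAgentRun = false; // true while reading consecutive User-agent lines
        for (String raw : robotsTxt.split("\\r?\\n")) {
            int hash = raw.indexOf('#'); // drop trailing comments
            String line = (hash >= 0 ? raw.substring(0, hash) : raw).trim();
            int colon = line.indexOf(':');
            if (colon < 0) {
                continue; // blank or malformed line
            }
            String field = line.substring(0, colon).trim().toLowerCase(Locale.ROOT);
            String value = line.substring(colon + 1).trim();
            if (field.equals("user-agent")) {
                boolean wildcard = value.equals("*");
                appliesToUs = inAgentRun ? (appliesToUs || wildcard) : wildcard;
                inAgentRun = true;
            } else {
                inAgentRun = false;
                if (appliesToUs && field.equals("disallow") && !value.isEmpty()) {
                    disallowedPrefixes.add(value);
                }
            }
        }
    }

    /** Returns false if the URL path starts with any disallowed prefix. */
    boolean isAllowed(String path) {
        for (String prefix : disallowedPrefixes) {
            if (path.startsWith(prefix)) {
                return false;
            }
        }
        return true;
    }

    /**
     * Reserves the next fetch slot for a host and returns how many milliseconds
     * the caller should wait before fetching, given the current time.
     */
    synchronized long reserveDelay(String host, long nowMillis) {
        long earliest = nextAllowedAt.getOrDefault(host, nowMillis);
        long start = Math.max(earliest, nowMillis);
        nextAllowedAt.put(host, start + minDelayMillis);
        return start - nowMillis;
    }
}
```

A worker thread would call isAllowed before each fetch and sleep for the value returned by reserveDelay(host, System.currentTimeMillis()). Passing the clock in as a parameter keeps the delay logic easy to test.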

Implements rudimentary information retrieval (IR) techniques and basic threading.
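The README does not say which IR techniques or threading model the crawler uses. As one possible illustration only, the sketch below scores pages by the relative frequency of the search term and runs the scoring on a fixed thread pool. The class TermScorer and its methods are hypothetical and are not part of the project.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical example (not from the WebCrawler source): term-frequency scoring on a thread pool. */
class TermScorer {
    /** Fraction of tokens in the text equal to the term, case-insensitive; 0.0 for empty text. */
    static double termFrequency(String text, String term) {
        // Split on anything that is not a letter or digit.
        String[] tokens = text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{Nd}]+");
        String needle = term.toLowerCase(Locale.ROOT);
        int total = 0;
        int hits = 0;
        for (String token : tokens) {
            if (token.isEmpty()) {
                continue; // split can yield an empty leading token
            }
            total++;
            if (token.equals(needle)) {
                hits++;
            }
        }
        return total == 0 ? 0.0 : (double) hits / total;
    }

    /** Scores every page concurrently and returns the scores in input order. */
    static List<Double> scoreAll(List<String> pages, String term, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Double>> futures = new ArrayList<>();
            for (String page : pages) {
                Callable<Double> task = () -> termFrequency(page, term);
                futures.add(pool.submit(task));
            }
            List<Double> scores = new ArrayList<>();
            for (Future<Double> future : futures) {
                scores.add(future.get()); // blocks until that page is scored
            }
            return scores;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while scoring pages", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("Scoring task failed", e);
        } finally {
            pool.shutdown();
        }
    }
}
```

Collecting the futures in submission order keeps each score aligned with its page, regardless of which thread finishes first.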

Languages

Language:Java 100.0%