zhaimobile / collector-http

Norconex HTTP Collector is a flexible web crawler for collecting, parsing, and manipulating data from the Internet (or Intranet) to various data repositories such as search engines.

Home Page: http://www.norconex.com/collectors/collector-http/

Norconex HTTP Collector

Norconex HTTP Collector is a full-featured web crawler (or spider) that can manipulate collected data and store it in a repository of your choice (e.g., a search engine). It is flexible, powerful, easy to extend, and portable. It can be run from the command line with file-based configuration on any OS, or embedded into Java applications through well-documented APIs.

Visit the web site for binary downloads and documentation: http://www.norconex.com/collectors/collector-http/

Languages

Java 98.5%, HTML 1.4%, Shell 0.0%, Batchfile 0.0%