Mr-Dai / Crawly

An easy-to-use Java crawler framework



Crawly

Inspired by webmagic, Crawly is an open-source web crawler framework for Java that provides a fine-grained component structure. With it, you can easily set up a web crawler for your production server.

The project is still under rapid development and may change on a daily basis. The source code has also not been adequately tested, so you are likely to run into a large number of bugs when using it.

Use it with caution.

Contribute

The Future Milestones section lists the features that will be added in the future. If you have other amazing features in mind, please post them on the Issues page of this repository, and I'll reply ASAP.

Future Milestones

  • Restart worker threads of ConcurrentCrawler when an exception occurs.
  • Add test cases.
  • Add support for proxies (HTTP and SOCKS).
  • Add support for rate limiting.
  • Add support for SMTP, POP3 and IMAP.
  • Change the LICENSE.
  • Release 0.1.
  • Add more example crawlers and refactor the framework based on usage experience.
  • Release 0.2.
  • Distributed crawler.

About


License: GNU General Public License v3.0

Languages

Java: 100.0%