A Powerful Spider(Web Crawler) System in Python. [TRY IT NOW!][Demo]
- Write script in python with powerful API
- Python 2&3
- Powerful WebUI with script editor, task monitor, project manager and result viewer
- Javascript pages supported!
- MySQL, MongoDB, SQLite, PostgreSQL as database backend
- Task priority, retry, periodical, recrawl by age and more
- Distributed architecture
Documentation: http://docs.pyspider.org/
Tutorial: http://docs.pyspider.org/en/latest/tutorial/
In this fork, I integrated Solr into pyspider for indexing and searching crawl data.