Vikibeta / JandanPicture

A Scrapy Jandan.net Spider

Home Page: http://haipz.com

Jandan Picture

This is a Jandan Spider.

Stay Simple, Stay Naive.

Warning

For study purposes only. Please don't consume too much of Jandan's network traffic.

Features

  • Send each request with a User Agent selected at random from the User Agent List
  • Update HTTP proxy IPs using multiple processes and check the status of each IP automatically
  • Parse the original picture URL and download popular pictures into the ooxx directory
  • Save all items into data.dat
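
The first two features can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the class name RandomUserAgentMiddleware, the helpers check_proxy and filter_alive, the sample USER_AGENTS entries, and the test URL are all assumptions made for the example. It is written for Python 3 with the standard library only; on Python 2.7, urllib2 would take the place of urllib.request.

```python
import random
import urllib.request
from multiprocessing import Pool

# Hypothetical sample entries; the project's real User Agent List is not shown here.
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0 Safari/537.36",
    "Mozilla/5.0 (X11; Linux x86_64; rv:120.0) Gecko/20100101 Firefox/120.0",
]


class RandomUserAgentMiddleware(object):
    """Downloader middleware that sets a random User-Agent on each request."""

    def __init__(self, user_agents=USER_AGENTS):
        self.user_agents = list(user_agents)

    def process_request(self, request, spider):
        # Pick a different identity for every outgoing request.
        request.headers["User-Agent"] = random.choice(self.user_agents)
        return None  # None tells Scrapy to keep processing the request


def check_proxy(proxy, test_url="http://example.com", timeout=5):
    """Return (proxy, True) if a request through the proxy succeeds."""
    handler = urllib.request.ProxyHandler({"http": proxy})
    opener = urllib.request.build_opener(handler)
    try:
        opener.open(test_url, timeout=timeout)
        return proxy, True
    except Exception:
        # Any failure (timeout, refused connection, bad response) marks it dead.
        return proxy, False


def filter_alive(proxies, processes=4):
    """Check proxies in parallel worker processes and keep the working ones."""
    pool = Pool(processes)
    try:
        results = pool.map(check_proxy, proxies)
    finally:
        pool.close()
        pool.join()
    return [proxy for proxy, ok in results if ok]
```

In a Scrapy project, a middleware like this would be enabled through the DOWNLOADER_MIDDLEWARES setting. A call such as filter_alive(["http://127.0.0.1:8080"]) would return only the proxies that answered within the timeout.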

Requirements

  • Python 2.7
  • Scrapy
  • Multiprocessing
  • Proxy by mapleray

Run

  • Windows: Double-click run.bat
  • Linux or OS X: Run the command scrapy crawl JandanPicture

Author

Haipz @haipz.com

Languages

Python 99.7%, Batchfile 0.3%