ipsolar / proxy_ip_crawler

A simple crawler that crawls and checks proxy IPs.

proxy_ip_crawler

## Overview
A crawler for scraping proxy IPs.
Supported storage backends: MySQL, SQLite, JSON
Runs on _Python 2.7_
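
The SQLite and JSON backends can be illustrated with the standard library alone. This is a minimal sketch, not the project's storage code: the `proxy` table, its columns, and the `(ip, port, protocol)` record shape are assumptions for illustration, and the real MySQL schema lives in proxy_ip.sql. A composite primary key plus `INSERT OR IGNORE` keeps repeated crawls from storing the same proxy twice.

```python
import json
import sqlite3


# Hypothetical record shape: (ip, port, protocol). The project's real
# schema is defined in proxy_ip.sql and may differ.
def save_sqlite(conn, proxies):
    """Insert proxies into a 'proxy' table, silently skipping duplicates."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS proxy ("
        " ip TEXT NOT NULL,"
        " port INTEGER NOT NULL,"
        " protocol TEXT,"
        " PRIMARY KEY (ip, port))"
    )
    # The (ip, port) primary key makes INSERT OR IGNORE drop repeats.
    conn.executemany(
        "INSERT OR IGNORE INTO proxy (ip, port, protocol) VALUES (?, ?, ?)",
        proxies,
    )
    conn.commit()


def dump_json(proxies):
    """Serialize proxies as a JSON list of {ip, port, protocol} objects."""
    records = [
        {"ip": ip, "port": port, "protocol": proto}
        for ip, port, proto in proxies
    ]
    return json.dumps(records, indent=2, sort_keys=True)
```

For example, `save_sqlite(sqlite3.connect("proxies.db"), rows)` persists a crawl, while `dump_json(rows)` returns a string that can be written to a file.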

### A. Crawling with __scrapy__
Dependencies: scrapy, requests, lxml, pybloom. Optional: mysql
Configuration: setting.py
To store results in _MySQL_, first run the SQL file proxy_ip.sql, then set the connection parameters in MYSQL_CONNECT in setting.py.

  python launchScrapy.py
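
pybloom appears in both dependency lists; a Bloom filter is a common way for a crawler to skip items it has already seen, and that is presumably its role here (an assumption; the README does not say). The sketch below is a hand-rolled Bloom filter using `hashlib`, not pybloom's API, and the `BloomFilter` class and its parameters are hypothetical. It derives k bit positions per item by double hashing one SHA-256 digest, so membership tests may give false positives but never false negatives.

```python
import hashlib


class BloomFilter(object):
    """Minimal Bloom filter over an m-bit array with k hash positions per item."""

    def __init__(self, num_bits=8192, num_hashes=4):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item):
        # Double hashing: split one SHA-256 digest into h1 and h2, then
        # use h1 + i * h2 (mod m) for i in 0..k-1. h2 is forced odd.
        hexd = hashlib.sha256(item.encode("utf-8")).hexdigest()
        h1 = int(hexd[:16], 16)
        h2 = int(hexd[16:32], 16) | 1
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def __contains__(self, item):
        # True only if every position is set (possible false positive).
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(item))

    def add(self, item):
        """Add item; return True if it was (probably) not seen before."""
        is_new = item not in self
        for p in self._positions(item):
            self.bits[p // 8] |= 1 << (p % 8)
        return is_new
```

A crawl loop could then call `if seen.add("%s:%d" % (ip, port)):` and only check or store proxies for which `add` returns `True`, trading a small false-positive rate for constant memory.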

### B. Crawling with __requests__
Adds more sites to crawl and reduces the required dependencies.
Dependencies: requests, lxml, redis, pybloom. Optional: mysql
Configuration: simple_crawler_config.py

  python launchSimpleCrawler.py
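
The crawl-then-check flow can be sketched with the standard library. This is an illustration under stated assumptions, not the project's code: the project parses pages with lxml, whereas this sketch uses a regular expression so it has no dependencies, and it assumes free-proxy pages list proxies either as `ip:port` or as an IP and port in adjacent table cells. `extract_proxies` and `is_reachable` are hypothetical names. The check is only a TCP connect; a full check would also send an HTTP request through the proxy (for example with the `proxies=` argument of `requests.get`).

```python
import re
import socket

# IPv4 address, then ":" or an adjacent table cell, then a port.
# The page layouts this targets are an assumption.
PROXY_RE = re.compile(
    r"\b((?:\d{1,3}\.){3}\d{1,3})"
    r"(?:\s*:\s*|\s*</td>\s*<td[^>]*>\s*)"
    r"(\d{1,5})\b"
)


def extract_proxies(html):
    """Return de-duplicated (ip, port) pairs in page order."""
    found, seen = [], set()
    for ip, port in PROXY_RE.findall(html):
        port_num = int(port)
        octets_ok = all(0 <= int(o) <= 255 for o in ip.split("."))
        key = (ip, port_num)
        if octets_ok and 0 < port_num < 65536 and key not in seen:
            seen.add(key)
            found.append(key)
    return found


def is_reachable(ip, port, timeout=3.0):
    """Cheap liveness check: can a TCP connection to ip:port be opened?

    This does NOT prove the host works as an HTTP proxy.
    """
    try:
        sock = socket.create_connection((ip, port), timeout=timeout)
    except (socket.error, socket.timeout):
        return False
    sock.close()
    return True
```

Running the TCP check first filters out dead hosts quickly, so the slower HTTP-level check only runs on candidates that at least accept connections: `alive = [p for p in extract_proxies(page) if is_reachable(*p)]`.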
