runSpider.py stops making progress a while after it starts
gccdChen opened this issue · comments
chen commented
2017-02-14 11:29:18 [10], msg:sql helper execute command:CREATE TABLE IF NOT EXISTS free_ipproxy (id INT(8) NOT NULL AUTO_INCREMENT,ip CHAR(25) NOT NULL UNIQUE,port INT(4) NOT NULL,country TEXT DEFAULT NULL,anonymity INT(2) DEFAULT NULL,https CHAR(4) DEFAULT NULL ,speed FLOAT DEFAULT NULL,source CHAR(20) DEFAULT NULL,save_time TIMESTAMP NOT NULL,PRIMARY KEY(id)) ENGINE=InnoDB
2017-02-14 11:29:19 [10], msg:*********run spider waiting...
awolfly9 commented
Hi, first check which spiders runspider.py is set to run:
items = scrapydo.run_spider(XiCiDaiLiSpider)
items = scrapydo.run_spider(SixSixIpSpider)
items = scrapydo.run_spider(IpOneEightOneSpider)
items = scrapydo.run_spider(KuaiDaiLiSpider)
items = scrapydo.run_spider(GatherproxySpider)
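If those lines are present, check the log file log/proxy.log to see the output.
The output ends at *********run spider waiting... and appears to hang because the script is waiting for the next crawl round; it has called time.sleep().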
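To illustrate why the process looks idle, here is a minimal sketch of a crawl loop that runs every spider and then sleeps until the next round. It is not the actual code in runspider.py: the function name `run_forever`, its parameters, and the 300-second interval (based on the "every 5 minutes" mentioned later in this thread) are assumptions for illustration only.

```python
import time


def run_forever(spiders, run_spider, interval_seconds=300,
                sleep=time.sleep, max_rounds=None):
    """Run every spider, then sleep until the next round.

    Hypothetical helper: `run_spider` stands in for a call such as
    scrapydo.run_spider, and `max_rounds` exists only so the loop
    can be stopped in a test.
    """
    rounds = 0
    while max_rounds is None or rounds < max_rounds:
        for spider in spiders:
            run_spider(spider)
        rounds += 1
        print('*********run spider waiting...')
        # The process is idle here, which is why the log stops
        # moving until the interval has elapsed.
        sleep(interval_seconds)
    return rounds
```

With a loop shaped like this, the "waiting" message is the last line printed in each round, so a quiet log after it is expected rather than a sign of a hang.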
If you run into any problems, feel free to reply.
Best wishes
chen commented
Oh, so it updates every 5 minutes.
But the 5 sites yield very few IPs, only 155, and only 2 of them passed validation against douban.
chen commented
Thanks~
awolfly9 commented
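Only a few sites are crawled at the moment; more will be added later. The number of validated IPs will grow over time, because IPs that work are kept from round to round.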
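As a rough sketch of how a pool of working IPs can grow across rounds, the snippet below merges newly crawled proxies with the ones kept so far and retains only those that pass a check. The name `refresh_pool` and the `is_alive` callback are hypothetical; the project's real validation logic (for example, the douban check) is not shown in this thread and may differ.

```python
def refresh_pool(pool, crawled, is_alive):
    """Return the proxies that pass validation this round.

    pool:     set of 'ip:port' strings kept from earlier rounds
    crawled:  iterable of 'ip:port' strings found in this round
    is_alive: hypothetical callable that returns True if a proxy works
    """
    candidates = set(pool) | set(crawled)
    # Keep only the proxies that still pass the check.
    return {proxy for proxy in candidates if is_alive(proxy)}
```

Because earlier survivors are re-checked together with new candidates, the pool only ever contains proxies that passed the most recent round.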