kezhenxu94 / house-renting

Possibly the best practice of Scrapy 🕷 and renting a house 🏡

Runs for a long time but never scrapes any data

nmweizi opened this issue

  • Runs for a long time but never scrapes any data
  • scrapy crawl 58
2019-01-19 11:50:43 [scrapy.utils.log] INFO: Scrapy 1.5.1 started (bot: house_renting)

2019-01-19 11:50:43 [scrapy.utils.log] INFO: Versions: lxml 4.2.1.0, libxml2 2.9.5, cssselect 1.0.3, parsel 1.3.1, w3lib 1.19.0, Twisted 17.9.0, Python 3.6.8 (tags/v3.6.8:3c6b436a57, Dec 24 2018, 00:16:47) [MSC v.1916 64 bit (AMD64)], pyOpenSSL 17.5.0 (OpenSSL 1.1.0h  27 Mar 2018), cryptography 2.2.2, Platform Windows-10-10.0.17134-SP0
2019-01-19 11:50:43 [scrapy.crawler] INFO: Overridden settings: {'AUTOTHROTTLE_DEBUG': True, 'AUTOTHROTTLE_ENABLED': True, 'AUTOTHROTTLE_MAX_DELAY': 10, 'AUTOTHROTTLE_START_DELAY': 10, 'AUTOTHROTTLE_TARGET_CONCURRENCY': 2.0, 'BOT_NAME': 'house_renting', 'COMMANDS_MODULE': 'house_renting.commands', 'CONCURRENT_REQUESTS_PER_DOMAIN': 1, 'COOKIES_ENABLED': False, 'DOWNLOAD_DELAY': 10, 'DOWNLOAD_TIMEOUT': 30, 'LOG_LEVEL': 'INFO', 'NEWSPIDER_MODULE': 'house_renting.spiders', 'RETRY_TIMES': 3, 'SPIDER_MODULES': ['house_renting.spiders'], 'TELNETCONSOLE_ENABLED': False, 'USER_AGENT': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_4) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/11.1 Safari/605.1.15 '}
2019-01-19 11:50:44 [scrapy.middleware] INFO: Enabled extensions:
['scrapy.extensions.corestats.CoreStats',
 'scrapy.extensions.logstats.LogStats',
 'scrapy.extensions.throttle.AutoThrottle']
2019-01-19 11:50:44 [scrapy.middleware] INFO: Enabled downloader middlewares:
['house_renting.middlewares.HouseRentingAgentMiddleware',
 'house_renting.middlewares.HouseRentingProxyMiddleware',
 'scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',
 'house_renting.middlewares.HouseRentingRetryMiddleware',
 'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware',
 'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware',
 'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware',
 'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware',
 'scrapy.downloadermiddlewares.redirect.RedirectMiddleware',
 'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware',
 'scrapy.downloadermiddlewares.stats.DownloaderStats']
2019-01-19 11:50:44 [scrapy.middleware] INFO: Enabled spider middlewares:
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware',
 'scrapy.spidermiddlewares.offsite.OffsiteMiddleware',
 'scrapy.spidermiddlewares.referer.RefererMiddleware',
 'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware',
 'scrapy.spidermiddlewares.depth.DepthMiddleware']
2019-01-19 11:50:44 [scrapy.middleware] INFO: Enabled item pipelines:
['house_renting.pipelines.HouseRentingPipeline',
 'house_renting.pipelines.DuplicatesPipeline',
 'scrapy.pipelines.images.ImagesPipeline',
 'house_renting.pipelines.ESPipeline']
2019-01-19 11:50:44 [scrapy.core.engine] INFO: Spider opened
2019-01-19 11:50:44 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2019-01-19 11:50:45 [scrapy.extensions.throttle] INFO: slot: hu.58.com | conc: 1 | delay:10000 ms (+0) | latency:  337 ms | size: 40350 bytes
2019-01-19 11:50:57 [scrapy.extensions.throttle] INFO: slot: hu.58.com | conc: 1 | delay:10000 ms (+0) | latency:   19 ms | size:   258 bytes
2019-01-19 11:51:09 [scrapy.extensions.throttle] INFO: slot: hu.58.com | conc: 1 | delay:10000 ms (+0) | latency:   38 ms | size:   258 bytes
2019-01-19 11:51:23 [scrapy.extensions.throttle] INFO: slot: hu.58.com | conc: 1 | delay:10000 ms (+0) | latency:   31 ms | size:     0 bytes
2019-01-19 11:51:35 [scrapy.extensions.throttle] INFO: slot: hu.58.com | conc: 1 | delay:10000 ms (+0) | latency:   15 ms | size:   258 bytes
2019-01-19 11:51:44 [scrapy.extensions.logstats] INFO: Crawled 1 pages (at 1 pages/min), scraped 0 items (at 0 items/min)
2019-01-19 13:00:25 [scrapy.extensions.throttle] INFO: slot: hu.58.com | conc: 1 | delay:10000 ms (+0) | latency:   34 ms | size:     0 bytes
2019-01-19 13:00:36 [scrapy.extensions.throttle] INFO: slot: hu.58.com | conc: 1 | delay:10000 ms (+0) | latency:  114 ms | size: 14012 bytes
2019-01-19 13:00:44 [scrapy.extensions.logstats] INFO: Crawled 277 pages (at 4 pages/min), scraped 0 items (at 0 items/min)
2019-01-19 13:00:49 [scrapy.extensions.throttle] INFO: slot: hu.58.com | conc: 1 | delay:10000 ms (+0) | latency:   93 ms | size: 14976 bytes
2019-01-19 13:01:03 [scrapy.extensions.throttle] INFO: slot: hu.58.com | conc: 1 | delay:10000 ms (+0) | latency:   31 ms | size:     0 bytes

58 has probably upgraded its anti-scraping mechanism.
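One way to check this guess against the log above: after the first 40350-byte response, most AutoThrottle lines report a size of 258 or 0 bytes, which is far too small for a listing page. The sketch below counts such lines. The 1024-byte threshold and the function name `summarize_throttle_log` are assumptions made for illustration, and a tiny response only hints at a block or redirect page; it does not prove one.

```python
import re

# Matches Scrapy AutoThrottle debug lines such as:
# "slot: hu.58.com | conc: 1 | delay:10000 ms (+0) | latency:  337 ms | size: 40350 bytes"
THROTTLE_RE = re.compile(
    r"slot: (?P<slot>\S+) \| conc:\s*(?P<conc>\d+) \| "
    r"delay:\s*(?P<delay>\d+) ms \([+-]?\d+\) \| "
    r"latency:\s*(?P<latency>\d+) ms \| size:\s*(?P<size>\d+) bytes"
)

# Assumption: a response below this size is too small to be a real listing page.
SUSPICIOUS_SIZE = 1024


def summarize_throttle_log(lines, threshold=SUSPICIOUS_SIZE):
    """Return (throttle_lines, suspiciously_small_responses) for the given log lines."""
    total = 0
    small = 0
    for line in lines:
        match = THROTTLE_RE.search(line)
        if match is None:
            continue  # not an AutoThrottle line (e.g. logstats output)
        total += 1
        if int(match.group("size")) < threshold:
            small += 1
    return total, small


# A few lines copied from the log in this issue.
sample = [
    "2019-01-19 11:50:44 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)",
    "2019-01-19 11:50:45 [scrapy.extensions.throttle] INFO: slot: hu.58.com | conc: 1 | delay:10000 ms (+0) | latency:  337 ms | size: 40350 bytes",
    "2019-01-19 11:50:57 [scrapy.extensions.throttle] INFO: slot: hu.58.com | conc: 1 | delay:10000 ms (+0) | latency:   19 ms | size:   258 bytes",
    "2019-01-19 11:51:23 [scrapy.extensions.throttle] INFO: slot: hu.58.com | conc: 1 | delay:10000 ms (+0) | latency:   31 ms | size:     0 bytes",
]

total, small = summarize_throttle_log(sample)
print(f"{small} of {total} responses are under {SUSPICIOUS_SIZE} bytes")
```

If most responses fall under the threshold while the crawl count keeps rising and the item count stays at 0, as it does here, blocking by the site is a plausible explanation.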

It works once a proxy is configured.
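For anyone landing here with the same symptom, here is a minimal sketch of how a proxy can be attached to requests in Scrapy. It is not the project's own `HouseRentingProxyMiddleware`: the class name `RandomProxyMiddleware` and the `PROXIES` setting are hypothetical, and the proxy URLs are placeholders. It relies only on the fact that Scrapy routes a request through whatever URL is stored in `request.meta['proxy']`.

```python
import random


class RandomProxyMiddleware:
    """Hypothetical downloader middleware that assigns a random proxy per request."""

    def __init__(self, proxies):
        self.proxies = list(proxies)

    @classmethod
    def from_crawler(cls, crawler):
        # PROXIES is an assumed custom setting, e.g.
        # PROXIES = ["http://127.0.0.1:8888", "http://127.0.0.1:8889"]
        return cls(crawler.settings.getlist("PROXIES"))

    def process_request(self, request, spider):
        # Leave requests alone if no proxies are configured
        # or if a proxy was already chosen for this request.
        if self.proxies and "proxy" not in request.meta:
            request.meta["proxy"] = random.choice(self.proxies)
        return None  # let Scrapy continue processing the request
```

To try it, the class would need to be listed in `DOWNLOADER_MIDDLEWARES` in the project settings, alongside a `PROXIES` list of working proxy URLs. Check the project's own settings first, since it already ships a proxy middleware.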