scalingexcellence / scrapybook

Scrapy Book Code

Home Page: http://scrapybook.com/

problem in the example on page 46 (populating an item)

MasRa opened this issue

commented

Hi
Could you please help with this:
I followed the example on page 46 step by step, but I got the following output instead of what the book shows:

root@dev:~/book/MasoudProject/properties# scrapy crawl basic
2018-02-04 14:40:25 [scrapy] INFO: Scrapy 1.0.3 started (bot: properties)
2018-02-04 14:40:25 [scrapy] INFO: Optional features available: ssl, http11, boto
2018-02-04 14:40:25 [scrapy] INFO: Overridden settings: {'NEWSPIDER_MODULE': 'properties.spiders', 'SPIDER_MODULES': ['properties.spiders'], 'BOT_NAME': 'properties'}
2018-02-04 14:40:25 [scrapy] INFO: Enabled extensions: CloseSpider, TelnetConsole, LogStats, CoreStats, SpiderState
2018-02-04 14:40:25 [boto] DEBUG: Retrieving credentials from metadata server.
2018-02-04 14:40:25 [boto] ERROR: Caught exception reading instance data
Traceback (most recent call last):
File "/usr/local/lib/python2.7/dist-packages/boto/utils.py", line 210, in retry_url
r = opener.open(req, timeout=timeout)
File "/usr/lib/python2.7/urllib2.py", line 404, in open
response = self._open(req, data)
File "/usr/lib/python2.7/urllib2.py", line 422, in _open
'_open', req)
File "/usr/lib/python2.7/urllib2.py", line 382, in _call_chain
result = func(*args)
File "/usr/lib/python2.7/urllib2.py", line 1214, in http_open
return self.do_open(httplib.HTTPConnection, req)
File "/usr/lib/python2.7/urllib2.py", line 1184, in do_open
raise URLError(err)
URLError: <urlopen error [Errno 101] Network is unreachable>
2018-02-04 14:40:25 [boto] ERROR: Unable to read instance data, giving up
2018-02-04 14:40:25 [scrapy] INFO: Enabled downloader middlewares: HttpAuthMiddleware, DownloadTimeoutMiddleware, UserAgentMiddleware, RetryMiddleware, DefaultHeadersMiddleware, MetaRefreshMiddleware, HttpCompressionMiddleware, RedirectMiddleware, CookiesMiddleware, ChunkedTransferMiddleware, DownloaderStats
2018-02-04 14:40:25 [scrapy] INFO: Enabled spider middlewares: HttpErrorMiddleware, OffsiteMiddleware, RefererMiddleware, UrlLengthMiddleware, DepthMiddleware
2018-02-04 14:40:25 [scrapy] INFO: Enabled item pipelines:
2018-02-04 14:40:25 [scrapy] INFO: Spider opened
2018-02-04 14:40:25 [scrapy] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2018-02-04 14:40:25 [scrapy] DEBUG: Telnet console listening on 127.0.0.1:6023
2018-02-04 14:40:25 [scrapy] DEBUG: Crawled (200) <GET http://web:9312/properties/property_000000.html> (referer: None)
2018-02-04 14:40:25 [scrapy] ERROR: Spider error processing <GET http://web:9312/properties/property_000000.html> (referer: None)
Traceback (most recent call last):
File "/usr/local/lib/python2.7/dist-packages/twisted/internet/defer.py", line 588, in runCallbacks
current.result = callback(current.result, *args, **kw)
File "/root/book/MasoudProject/properties/properties/spiders/basic.py", line 38, in parse
item['address'] = response.xpath('//*[@itemtype="http://schema.org/Place"][1]/text()').extract()
File "/usr/local/lib/python2.7/dist-packages/scrapy/item.py", line 63, in setitem
(self.class.name, key))
KeyError: 'PropertiesItem does not support field: address'
2018-02-04 14:40:25 [scrapy] INFO: Closing spider (finished)
2018-02-04 14:40:25 [scrapy] INFO: Dumping Scrapy stats:
{'downloader/request_bytes': 232,
'downloader/request_count': 1,
'downloader/request_method_count/GET': 1,
'downloader/response_bytes': 792,
'downloader/response_count': 1,
'downloader/response_status_count/200': 1,
'finish_reason': 'finished',
'finish_time': datetime.datetime(2018, 2, 4, 14, 40, 25, 736406),
'log_count/DEBUG': 3,
'log_count/ERROR': 3,
'log_count/INFO': 7,
'response_received_count': 1,
'scheduler/dequeued': 1,
'scheduler/dequeued/memory': 1,
'scheduler/enqueued': 1,
'scheduler/enqueued/memory': 1,
'spider_exceptions/KeyError': 1,
'start_time': datetime.datetime(2018, 2, 4, 14, 40, 25, 241964)}
2018-02-04 14:40:25 [scrapy] INFO: Spider closed (finished)

Could you please guide me on how to fix it?
Thank you

So this was while playing with your own copy, which has a different settings.py than the one in the chapter. This was the boto problem with that version of Scrapy. Nothing important - just a warning, essentially. The rest of the crawl should be fine. One way to mitigate it is to add the following two lines to settings.py:

# Disable S3
AWS_ACCESS_KEY_ID = ""
AWS_SECRET_ACCESS_KEY = ""
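
The KeyError further down your log is a separate issue from the boto warning: Scrapy raises it when the spider assigns a field that the Item class does not declare, so your items.py needs an address field. A minimal sketch of what items.py should contain (the field names other than address are assumptions here; the chapter's PropertiesItem declares more fields than this):

# items.py - minimal sketch, not the chapter's full item definition
from scrapy.item import Item, Field

class PropertiesItem(Item):
    title = Field()
    price = Field()
    description = Field()
    address = Field()  # the field the KeyError says is missing
    image_urls = Field()

Once address is declared, the assignment on line 38 of basic.py should stop raising the KeyError.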
commented

Thank you so much.