Issue on chapter 3

OscarDgrouch opened this issue 7 years ago

This is related to chapter 3. For the address item, the book instructs me to use the XPath //*[@itemtype="http://schema.org/Place"][1]/text().
However, I'm getting this:
In [27]: response.xpath('//*[@itemtype="http://schema.org/Place"][1]/text()').extract()
Out[27]:
[u'\n ',
u'\n ',
u'\n ',
u'\n ',
u'\n ',
u'\n ',
u'\n ',
u'\n ',
u'\n ',
u'\n ',
u'\n ',
u'\n ',
u'\n ',
u'\n ',
u'\n ',
u'\n ',
u'\n ',
u'\n ',
u'\n ',
u'\n ',
u'\n ',
u'\n ',
u'\n ',
u'\n ',
u'\n ',
u'\n ',
u'\n ',
u'\n ',
u'\n ',
u'\n ']

When I run it without the /text() part, I get this:
[u'\n West Hampstead, London',
u'\n Angel, London',
u'\n Tower Bridge, London',
u'\n Canary Wharf, London',
u'\n Whitechapel, London',
u'\n Chelsea, London',
u'\n Hackney, London',
u'\n Stratford, London',
u'\n Canary Wharf, London',
u'\n Chiswick, London',
u'\n Highbury, London',
u'\n Notting Hill, London',
u'\n Brixton, London',
u'\n Greenwich, London',
u'\n Canary Wharf, London',
u'\n Battersea, London',
u'\n South Kensington, London',
u'\n Camden, London',
u'\n Wimbledon, London',
u'\n West Hampstead, London',
u'\n West Hampstead, London',
u'\n Elephant And Castle, London',
u'\n Angel, London',
u'\n Heathrow, London',
u'\n Bayswater, London',
u'\n Seven Sisters, London',
u'\n Angel, London',
u'\n Angel, London',
u'\n Battersea, London',
u'\n Bethnal Green, London']
I tried playing with it and came up with this:
In [32]: response.xpath('//*[@itemtype="http://schema.org/Place"][1]/span/text()').extract()
Out[32]:
[u'West Hampstead, London',
u'Angel, London',
u'Tower Bridge, London',
u'Canary Wharf, London',
u'Whitechapel, London',
u'Chelsea, London',
u'Hackney, London',
u'Stratford, London',
u'Canary Wharf, London',
u'Chiswick, London',
u'Highbury, London',
u'Notting Hill, London',
u'Brixton, London',
u'Greenwich, London',
u'Canary Wharf, London',
u'Battersea, London',
u'South Kensington, London',
u'Camden, London',
u'Wimbledon, London',
u'West Hampstead, London',
u'West Hampstead, London',
u'Elephant And Castle, London',
u'Angel, London',
u'Heathrow, London',
u'Bayswater, London',
u'Seven Sisters, London',
u'Angel, London',
u'Angel, London',
u'Battersea, London',
u'Bethnal Green, London']

My questions: which XPath expression is right? And why am I getting an array instead of single values?

Dimitrios Kouzis-Loukas · Answer 1 · Tue Jan 24 2017 08:37:55 GMT+0800 (China Standard Time)

Hello, I see what you mean. I can confirm that:

scrapy shell http://web:9312/properties/index_00000.html
>>> response.xpath('//*[@itemtype="http://schema.org/Place"][1]/text()').extract()
[u'\n  ', ... u'\n  ', u'\n  ']
>>> response.xpath('//*[@itemtype="http://schema.org/Place"][1]/span/text()').extract()
[u'West Hampstead, London', ... , u'Bethnal Green, London']

The only issue is that, in the context of Chapter 3, you want to be crawling individual property pages, e.g.:

scrapy shell http://web:9312/properties/property_000000.html
>>> response.xpath('//*[@itemtype="http://schema.org/Place"][1]/text()').extract()
[u'West Hampstead, London']

In Chapter 5 (page 99), you can find out how to crawl the index pages directly with relative XPaths (see also here).

P.S. Sorry for the typo: on that page they are called "Relevant XPath" (it should read "relative XPath").