scalingexcellence / scrapybook

Scrapy Book Code

Home Page:http://scrapybook.com/

The index in the xpath doesn't work

leon0707 opened this issue

On page 37:

scrapy shell https://www.gumtree.com/p/commercial-property-to-rent/south-kensington-to-let-serviced-office-space-in-sloane-avenue-sw3-south-kensington/1258815123

>>> response.xpath('//*[@itemprop="price"][1]').extract()
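[u'<meta itemprop="price" content="1190.00pm">', u'<meta itemprop="price" content="1750.00pw">', u'<meta itemprop="price" content="346.00pw">', u'<meta itemprop="price" content="50.00pw">', u'<meta itemprop="price" content="625.00pm">', u'<meta itemprop="price" content="250.00pm">', u'<meta itemprop="price" content="300.00pm">', u'<meta itemprop="price" content="400.00pm">', u'<meta itemprop="price" content="500.00pm">', u'<meta itemprop="price" content="190.00pm">', u'<meta itemprop="price" content="502.00pm">']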

The [1] in the XPath has no visible effect: the query returns every <meta ...> element. The first one is the price of the property; the rest are the prices of similar properties.

Copied from Chrome: /html/body/div[2]/div/div[3]/main/div[2]/header/span/meta[2]. If I try this XPath, it returns an empty list.

@lookfwd I appreciate the effort you put into this book.

I think the XPath used to find the price on a Gumtree page is incorrect.
The correct one should be response.xpath('(//*[@itemprop="price"])[1]').extract()

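//*[@itemprop="price"][1] applies the [1] separately under each parent: it returns every element whose itemprop is "price" and which is the first such element among its parent's children. Wrapping the expression in parentheses applies the index to the whole result set instead, so only the first match in the document is returned.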

Explanation: https://stackoverflow.com/questions/3674569/how-to-select-specified-node-within-xpath-node-sets-by-index-with-selenium
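
To make the difference concrete, here is a minimal sketch using lxml, the library Scrapy's selectors are built on. The markup is hypothetical, loosely modelled on the page above (one main price followed by prices of similar listings), and is not the real Gumtree HTML. The names PAGE, per_parent and whole_doc are only for illustration.

```python
from lxml import html

# Hypothetical markup, NOT the real Gumtree page: one main price,
# then a list of prices for similar listings.
PAGE = """
<html><body>
  <header><span><meta itemprop="price" content="1190.00pm"></span></header>
  <ul>
    <li>
      <meta itemprop="price" content="1750.00pw">
      <meta itemprop="price" content="346.00pw">
    </li>
    <li><meta itemprop="price" content="50.00pw"></li>
  </ul>
</body></html>
"""

tree = html.fromstring(PAGE)

# [1] is evaluated per parent: the first matching child of <span>,
# of the first <li>, and of the second <li> are all selected.
per_parent = tree.xpath('//*[@itemprop="price"][1]/@content')

# Parentheses build the full node set first, then [1] picks its first item.
whole_doc = tree.xpath('(//*[@itemprop="price"])[1]/@content')

print(per_parent)  # ['1190.00pm', '1750.00pw', '50.00pw']
print(whole_doc)   # ['1190.00pm']
```

In this sketch 346.00pw is dropped by the first query because it is the second match under its parent, while the parenthesised query keeps only the first match in the whole document. The same two expressions should behave the same way when passed to response.xpath() in the Scrapy shell.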

Thanks @leon0707. Both are correct. I will update them in the next version of the book. Thanks a million!