karussell / snacktory

Readability clone in Java

Not able to extract content

saketmalpure opened this issue

I am not able to extract content from some websites, such as quora.com, and possibly others.
The server returns 403 for the HEAD request made at this line in the HtmlFetcher class.

Quora returns 403 for a HEAD request. If you call fetchAndExtract with resolve set to false, it will work. I can open a pull request that adds an option to fall back on a GET request when the HEAD request fails.
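
For illustration, here is a minimal sketch of the HEAD-then-GET fallback idea, written against plain java.net.HttpURLConnection. It is not snacktory's code: the class HeadFallback, the StatusProbe interface, and both method names are hypothetical, and treating any status of 400 or above as "HEAD failed" is an assumption.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.URI;

public class HeadFallback {

    /** Hypothetical hook so the HTTP call can be swapped out; not part of snacktory. */
    public interface StatusProbe {
        int status(String url, String method);
    }

    /** Sends one request with the given method and returns the HTTP status code. */
    public static int httpStatus(String url, String method) {
        try {
            HttpURLConnection conn =
                    (HttpURLConnection) URI.create(url).toURL().openConnection();
            try {
                conn.setRequestMethod(method);
                conn.setInstanceFollowRedirects(false);
                conn.setConnectTimeout(5000);
                conn.setReadTimeout(5000);
                return conn.getResponseCode();
            } finally {
                conn.disconnect();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /**
     * Tries HEAD first. If the server answers with an error status
     * (assumed here to mean >= 400), retries once with GET.
     */
    public static int statusWithFallback(String url, StatusProbe probe) {
        int status = probe.status(url, "HEAD");
        if (status >= 400) {
            status = probe.status(url, "GET");
        }
        return status;
    }
}
```

A caller could pass the real probe as HeadFallback.statusWithFallback(someUrl, HeadFallback::httpStatus). Keeping the probe as a parameter means the fallback decision can be exercised without touching the network.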