IonicaBizau / scrape-it

🔮 A Node.js scraper for humans.

Home Page: http://ionicabizau.net/blog/30-how-to-write-a-web-scraper-in-node-js

HTTP errors are not treated as errors

raysarebest opened this issue · comments

If a URL gives an error response, such as a 404 or a 502, the Promise returned by the scrapeIt function does not reject; instead it resolves, calls its .then chain, and passes a basically empty object as the data parameter. For example, this code prints "success" even though the URL 404s:

const scrapeIt = require("scrape-it");

scrapeIt("http://google.com/404.html", {}).then(({data, response}) => {
    console.log("success");
}).catch(() => {
    console.log("error");
});

When the URL returns an HTTP status code outside the 200 range, I feel the promise should automatically reject, so that client .catch handlers run instead of the .then chain.
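
Until that changes, a small wrapper can enforce this behaviour manually. This is only a sketch: it assumes the resolved response object exposes the HTTP status as statusCode (or status on axios-based versions), and the scrapeItStrict name is just for illustration.

const scrapeIt = require("scrape-it");

// Illustrative wrapper (not part of scrape-it): rejects on non-2xx responses.
// Assumes the resolved `response` exposes `statusCode` (or `status` with axios).
function scrapeItStrict(url, opts) {
    return scrapeIt(url, opts).then(({data, response}) => {
        const status = response.statusCode || response.status;
        if (status < 200 || status >= 300) {
            throw new Error("Request failed with status code " + status);
        }
        return {data, response};
    });
}

scrapeItStrict("http://google.com/404.html", {}).then(() => {
    console.log("success");
}).catch(() => {
    console.log("error"); // now reached for 404, 502, etc.
});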

Yes, this approach has downsides. I remember I chose it for simplicity, and I can see how it can break things. However, I guess people should be able to scrape error pages too (maybe they really want to do that).

We can add an option to enable the behaviour you expect. 🚀
Contributions are welcome!
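
Something like the sketch below, where both the rejectOnHttpError name and where it is passed are hypothetical and only illustrate the idea:

const scrapeIt = require("scrape-it");

// Hypothetical opt-in flag (not an existing scrape-it option): when set,
// non-2xx responses would reject the promise instead of resolving.
scrapeIt({ url: "http://google.com/404.html", rejectOnHttpError: true }, {}).then(() => {
    console.log("success");
}).catch(() => {
    console.log("error");
});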

Is anybody else working on this? I would like to try my hand at this issue.

@cukejianya Doesn't seem like anyone is, so go for it!

In 6.x.x, HTTP errors will eventually throw, as long as axios does that.
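
For reference, once the request layer is axios and HTTP errors propagate, the original snippet should land in .catch without any wrapper (a sketch assuming axios's default status validation):

const scrapeIt = require("scrape-it");

// With an axios-backed release that lets HTTP errors throw,
// a 404 rejects the promise, so the .catch branch runs.
scrapeIt("http://google.com/404.html", {}).then(() => {
    console.log("success");
}).catch(() => {
    console.log("error");
});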