IonicaBizau / scrape-it

🔮 A Node.js scraper for humans.

Home Page: http://ionicabizau.net/blog/30-how-to-write-a-web-scraper-in-node-js

Headers of type HTTP/2

Curbe8 opened this issue

Hi!

I am scraping airbnb.com pages, which are served over HTTP/2. Every attempt returns Airbnb's access denied page, no matter where I try to scrape. I have been looking at the headers in the page's requests, and they use a particular header, ':authority', which I believe is why my request from Node.js is denied. This header was introduced with HTTP/2.

The error code is: ERR_INVALID_HTTP_TOKEN

Is there a way for your plugin to accept this type of header, or is there a plugin I could change in my project that would add this feature?

Currently I am using express.js, which, as you can see on its main page, does not support HTTP/2.

I am not an expert on the subject and I apologize for any mistakes I may have made in reporting this issue.

If you need any more information, let me know. I will be waiting for your answers.

Regards!

The default scrape-it request library is quite minimal. If you want to scrape pages with other encodings or protocols, you can use another library (e.g. axios) and pass the HTML to the scrapeHTML method.
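
As a rough illustration of that suggestion, here is a minimal sketch that fetches the HTML with Node's built-in `http2` module instead of axios, since `http2` speaks HTTP/2 natively and fills in the `:authority` pseudo-header from the origin. The helper name `fetchHtmlOverHttp2` is hypothetical and not part of scrape-it. The commented usage assumes that `scrapeHTML` accepts an HTML string in your installed version; if it does not, loading the string with cheerio first should work. Switching protocols alone may not be enough to get past a site's access checks.

```javascript
const http2 = require('node:http2');

/**
 * Hypothetical helper (not part of scrape-it): fetch `path` from `origin`
 * over HTTP/2 and resolve with the status code and the body as a string.
 */
function fetchHtmlOverHttp2(origin, path = '/', extraHeaders = {}) {
  return new Promise((resolve, reject) => {
    const session = http2.connect(origin);
    session.on('error', reject);

    // Pseudo-headers start with ':' and are only valid in HTTP/2.
    // Node derives :authority and :scheme from the origin unless overridden.
    const req = session.request({
      ':method': 'GET',
      ':path': path,
      ...extraHeaders,
    });

    let status;
    let body = '';
    req.setEncoding('utf8');
    req.on('response', (headers) => { status = headers[':status']; });
    req.on('data', (chunk) => { body += chunk; });
    req.on('end', () => {
      session.close();
      resolve({ status, body });
    });
    req.on('error', (err) => {
      session.destroy();
      reject(err);
    });
    req.end();
  });
}

// Usage sketch (assumes scrape-it is installed; the URL and selector are placeholders):
// const scrapeIt = require('scrape-it');
// fetchHtmlOverHttp2('https://example.com', '/').then(({ body }) => {
//   const data = scrapeIt.scrapeHTML(body, { title: 'h1' });
//   console.log(data);
// });
```

Keeping the fetch separate from the parsing means the transport can be swapped (HTTP/2, a different encoding, a proxy) without touching the scrape-it selectors.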