IonicaBizau / scrape-it

🔮 A Node.js scraper for humans.

Home Page: http://ionicabizau.net/blog/30-how-to-write-a-web-scraper-in-node-js

Alternative selectors for one element

jpujol880807 opened this issue:

Hi, congratulations on the great work on this project. I have a use case that is a little difficult to handle with this package.
Suppose we are scraping a website that runs A/B tests, so its HTML sometimes changes slightly.
I would like to define alternative selectors for a single key, something like this:

    ...
    content: {
        selectors: [
            {
                selector: ".article-content",
                how: "html"
            },
            {
                selector: "#article-content",
                attr: "data-content"
            }
        ]
    }
    ...

Sometimes the page I scrape presents the content inside the element with class article-content, and other times in the data-content attribute of the element with id article-content. I would like to declare both selectors and have them evaluated in order, so that if the first selector fails, the second is tried, and so on. Is there a clean way to implement multiple selectors for a single item? If not, I think this could be a nice feature for the project.
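Until something like this exists in the library, the ordered fallback can be approximated in user code. The following is only a sketch: `scrapeFirstMatch` and `contentAlternatives` are hypothetical names, not part of scrape-it. The scraping function is passed in as an argument so that `scrapeIt.scrapeHTML` can be supplied, and the sketch assumes that a selector matching nothing yields a falsy value (empty string, null or undefined).

```javascript
// Hypothetical helper, not part of scrape-it: try each alternative in order
// and return the first non-empty value.
// `scrapeHTML` is expected to behave like scrapeIt.scrapeHTML($, opts).
function scrapeFirstMatch(scrapeHTML, $, alternatives) {
    for (const alternative of alternatives) {
        // Scrape a single temporary key using this alternative's options.
        const { value } = scrapeHTML($, { value: alternative });
        // Assumption: an unmatched selector yields "", null or undefined.
        if (value) {
            return value;
        }
    }
    return null; // none of the alternatives matched
}

// The two alternatives from the example above, in priority order.
const contentAlternatives = [
    { selector: ".article-content", how: "html" },
    { selector: "#article-content", attr: "data-content" }
];

// Possible usage, assuming scrape-it and cheerio are installed:
//   const scrapeIt = require("scrape-it");
//   const cheerio = require("cheerio");
//   const $ = cheerio.load(html);
//   const content = scrapeFirstMatch(scrapeIt.scrapeHTML, $, contentAlternatives);
```

Each alternative costs one extra scrapeHTML call on the already loaded document, so the overhead stays small for a handful of selectors.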

This could be interesting to implement. However, to keep the code simple, I would suggest calling scrapeHTML on the response HTML twice: the first call detects which version of the page was loaded, and the second does the final scraping.
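A rough sketch of that two-call approach, under a few assumptions: `scrapeArticle`, `LAYOUT_OPTIONS` and the layout names are made up for illustration, the presence of `.article-content` is taken as the marker of the first page version, and an unmatched selector is assumed to yield a falsy value. The scraping function is passed in so that `scrapeIt.scrapeHTML` can be supplied.

```javascript
// Hypothetical layout table: one scrape-it options object per page version.
const LAYOUT_OPTIONS = {
    classBased: { content: { selector: ".article-content", how: "html" } },
    attrBased: { content: { selector: "#article-content", attr: "data-content" } }
};

// `scrapeHTML` is expected to behave like scrapeIt.scrapeHTML($, opts).
function scrapeArticle(scrapeHTML, $) {
    // First call: probe for a marker of the class-based version.
    const probe = scrapeHTML($, {
        marker: { selector: ".article-content", how: "html" }
    });
    // Assumption: an unmatched selector yields "", null or undefined.
    const layout = probe.marker ? "classBased" : "attrBased";

    // Second call: final scrape with the options for the detected version.
    return { layout, data: scrapeHTML($, LAYOUT_OPTIONS[layout]) };
}

// Possible usage, assuming scrape-it and cheerio are installed:
//   const scrapeIt = require("scrape-it");
//   const cheerio = require("cheerio");
//   const result = scrapeArticle(scrapeIt.scrapeHTML, cheerio.load(html));
```

Keeping one options object per version means each layout can grow its own set of keys without touching the detection step.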