IonicaBizau / scrape-it

🔮 A Node.js scraper for humans.

Home Page: http://ionicabizau.net/blog/30-how-to-write-a-web-scraper-in-node-js

Parsing tables and xpath

senzacionale opened this issue

This is a very good library, but I have one question.

How can I read "Cert" and "Primary Subject" from the table?

HTML code:

<table class="table table-fixed table-header-right text-medium" xpath="1">
   <tbody>
      <tr>
         <th class="no-border">Cert</th>
         <td class="no-border">AE07591</td>
      </tr>
      <tr>
         <th>Item</th>
         <td>Mini-Helmet</td>
      </tr>
      <tr>
         <th>Primary Subject</th>
         <td>
            JOHNNY UNITAS<br>
         </td>
      </tr>
      <tr>
         <th>Result</th>
         <td>Authentic</td>
      </tr>
   </tbody>
</table>

I tried it like this:

scrapeIt({
        url: 'https://www.cart.com/cart/ae07591'
        , headers: {'User-agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10.9; rv:32.0) Gecko/20100101 Firefox/32.0'}
    }, {
        title: "#mainContent > div > h1"
        , desc: "#mainContent > div > p"
        , cert: "table tbody:nth-child(1) tr:nth-child(1) > td:nth-child(2)"
        , avatar: {
            selector: "body.allowhover:nth-child(2) div.container-fluid.padding-all.margin-bottom div.row:nth-child(3) div.col-thin-12.col-xs-8.col-xs-offset-2.col-sm-6.col-sm-offset-3.col-md-5.col-md-offset-0.padding-all.text-center div.carousel-paddles.margin-left.margin-right.slick-initialized.slick-slider div.slick-list.draggable div.slick-track div.slick-slide.slick-current.slick-active a:nth-child(1) > img.img-responsive"
            , attr: "src"
        }
    }).then(({data, response}) => {
        console.log(`Status Code: ${response.statusCode}`)
        console.log(`Response: ${response.}`)
        console.log(data)
    })

But the result is always empty. When I test the selectors with ChroPath, they work fine. Am I not allowed to use XPath? Is there another way to do this?

Thank you
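
One possible approach, offered as a sketch rather than a confirmed fix: scrape-it is built on cheerio, which matches CSS selectors and does not evaluate XPath, so the lookup has to be written as CSS. Instead of counting rows with nth-child, the cell can be located from its header text using the jQuery-style :contains() pseudo-class that cheerio supports. The names cellAfter, fields and scrapeCert below are hypothetical helpers made up for illustration and are not part of scrape-it. Note also that scrape-it only sees the HTML the server returns; classes such as slick-initialized are added by JavaScript in the browser, so a selector copied from the live DOM may not match anything.

```javascript
// Sketch only: cellAfter, fields and scrapeCert are hypothetical names,
// not part of the scrape-it API.

// Build a CSS selector for the <td> that directly follows the <th>
// whose text contains `label`. `:contains()` is a jQuery-style
// extension understood by cheerio, which scrape-it uses internally.
function cellAfter(label) {
  return `table th:contains("${label}") + td`;
}

// scrape-it options: one field per table row of interest.
const fields = {
  cert: cellAfter("Cert"),
  primarySubject: cellAfter("Primary Subject")
};

// Fetch the page and extract the fields. scrape-it is required lazily,
// so this file still loads where the package is not installed.
async function scrapeCert(url) {
  const scrapeIt = require("scrape-it");
  const { data } = await scrapeIt(url, fields);
  return data;
}
```

Calling scrapeCert('https://www.cart.com/cart/ae07591').then(console.log) would be expected to print an object with the cert and primarySubject cell texts, assuming the table is present in the HTML the server returns and is not injected later by client-side scripts.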