IonicaBizau / scrape-it

🔮 A Node.js scraper for humans.

Home Page: http://ionicabizau.net/blog/30-how-to-write-a-web-scraper-in-node-js

Scrape a table with image links.

vampywiz17 opened this issue

Hello there,

I would like to scrape this:

	<table cellspacing=1 cellpadding=0 class='stbl' width='100%'>
		<tr><th colspan=3>Somogy megye</th></tr>
		<tr>
			<td class='row1' width=31><img src='/images/warningb/ts1.gif' border=0></td>
			<td class='row1' width=27><img src='/images/warningb/w1.gif' border=0></td>
			<td class='row1' width=170>Zivatar</td>
		</tr>
	</table>
	<div class='kt-friss'><div>Kiadva: 2021-05-25 15:06 (13:06 UTC)</div><div>[wbhx]</div></div>

I use these options:

{
    "data": {
        "listItem": "table tr td"
    }
}

and get this:

{"data":[{"___raw":""},{"___raw":""},"Zivatar"]}

My two questions are:

  1. Is it possible to include the img source link in the output? (I need to use it.)

  2. The table is generated and sometimes contains more than one row (tr). What happens in that case? My goal is for each row to produce a separate object. Is that possible? The final goal is output similar to this:

{
    "Zivatar": [
        {
            "link1": "/images/warningb/ts1.gif"
        },
        {
            "link2": "/images/warningb/w1.gif"
        }
    ],
    "Felhőszakadás": [
        {
            "link1": "/images/warningb/ts1.gif"
        },
        {
            "link2": "/images/warningb/w1.gif"
        }
    ]
}

I'm a real beginner on this topic, so I welcome any help! :)

@vampywiz17
Passerby here. Can you get the img tags using convert?

convert: (x) => {
  // Log whatever value the converter receives, then return it unchanged
  console.log(x)
  return x
}

Also, have you tried using how?

how?: string | ((element: CheerioSelector) => any);
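
For the two questions above, here is a rough sketch of one possible approach. It has not been run against the live page. It assumes scrape-it's documented `listItem`, `data`, `selector` and `attr` options, with `attr: "src"` reading the image source instead of the (empty) element text. The field names `rows`, `name`, `link1` and `link2`, and the helper `groupByName`, are made up for illustration. Selecting `tr` instead of `td` as the list item should give one object per table row, so extra rows become extra objects. The keyed output shape is then built in a separate step after scraping.

```javascript
// Sketch of scrape-it options: one list item per table row.
// "rows", "name", "link1" and "link2" are illustrative field names.
const options = {
  rows: {
    listItem: "table.stbl tr",
    data: {
      // Text of the third cell, e.g. "Zivatar"
      name: "td:nth-child(3)",
      // `attr` takes the value from an attribute instead of the text
      link1: { selector: "td:nth-child(1) img", attr: "src" },
      link2: { selector: "td:nth-child(2) img", attr: "src" }
    }
  }
};

// Hypothetical helper: reshape the scraped rows into an object keyed by name.
function groupByName(rows) {
  const result = {};
  for (const { name, link1, link2 } of rows) {
    // Skip rows without a name, such as the <th> header row
    if (!name) continue;
    result[name] = [{ link1 }, { link2 }];
  }
  return result;
}

// Sample rows in the shape the options above are assumed to produce
// (hand-written for illustration, not real scraper output).
const sampleRows = [
  { name: "", link1: undefined, link2: undefined },
  { name: "Zivatar", link1: "/images/warningb/ts1.gif", link2: "/images/warningb/w1.gif" },
  { name: "Felhőszakadás", link1: "/images/warningb/ts1.gif", link2: "/images/warningb/w1.gif" }
];

console.log(JSON.stringify(groupByName(sampleRows), null, 2));
```

The `options` object contains no functions, so it can also be written as plain JSON like the options in the original post. Note that keying by name means two rows with the same name would overwrite each other; if that can happen, keeping the plain array of row objects is the safer shape.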