jonhoo / fantoccini

A high-level API for programmatically interacting with web pages through WebDriver.

Scraping?

xbc5 opened this issue

commented

Is this a good fit for scraping?

I am looking to scrape some fully rendered web pages, so JS support is needed. I have struggled to find something stable and well supported, and I thought that remote control of a popular web browser would be the best bet.

I need text content (headers, titles, paragraphs) and links to images. I would possibly also like to preserve anchors in a structured way (e.g. so I can render footnotes).

Is this possible with this lib in a generic way (i.e. without foreknowledge of the page structure)?

Thanks.

commented

Sorry. RTM. FS.