Scraping?
xbc5 opened this issue
Is this a good fit for scraping?
I am looking to scrape some fully rendered web pages, so JS support is needed. I have struggled to find something stable and well supported, and I think remote control of a popular web browser is the best bet.
I need text content (headings, titles, paragraphs) and links to images, and possibly anchors preserved in a structured way (e.g. so I can render footnotes).
Is this possible with this lib in a generic way, i.e. without prior knowledge of the page structure?
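To make "generic" concrete, here is a rough sketch of the kind of extraction I mean. It assumes the fully rendered HTML has already been pulled out of the browser as a string (how to do that with this lib is exactly what I'm asking), and it uses only Python's standard `html.parser`. The names `GenericExtractor` and `extract` are made up for illustration.

```python
from html.parser import HTMLParser


class GenericExtractor(HTMLParser):
    """Collects text blocks, image URLs, and anchors by tag name only,
    with no page-specific selectors."""

    TEXT_TAGS = {"title", "h1", "h2", "h3", "h4", "h5", "h6", "p"}

    def __init__(self):
        super().__init__()
        self.blocks = []     # (tag, text) in document order
        self.images = []     # src attribute of each <img>
        self.anchors = []    # (href, link text) for each <a href=...>
        self._block = None   # [tag, text parts] of the open text element
        self._anchor = None  # [href, text parts] of the open <a>

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag in self.TEXT_TAGS:
            # Simplification: an unclosed earlier block is discarded here.
            self._block = [tag, []]
        elif tag == "img" and attrs.get("src"):
            self.images.append(attrs["src"])
        elif tag == "a" and attrs.get("href"):
            self._anchor = [attrs["href"], []]

    def handle_data(self, data):
        # Text inside a link that sits inside a paragraph belongs to both.
        if self._block is not None:
            self._block[1].append(data)
        if self._anchor is not None:
            self._anchor[1].append(data)

    def handle_endtag(self, tag):
        if self._block is not None and tag == self._block[0]:
            text = " ".join("".join(self._block[1]).split())
            if text:
                self.blocks.append((tag, text))
            self._block = None
        elif tag == "a" and self._anchor is not None:
            text = " ".join("".join(self._anchor[1]).split())
            self.anchors.append((self._anchor[0], text))
            self._anchor = None


def extract(rendered_html):
    """Return text blocks, image links, and anchors from rendered HTML."""
    parser = GenericExtractor()
    parser.feed(rendered_html)
    parser.close()
    return {
        "blocks": parser.blocks,
        "images": parser.images,
        "anchors": parser.anchors,
    }
```

Keeping anchors as `(href, text)` pairs is what would let footnote links such as `#fn1` be matched to their targets later; relative image URLs would still need resolving against the page URL.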
Thanks.
Sorry. RTM. FS.