amerkurev / scrapper

Web scraper with a simple REST API living in Docker and using a Headless browser and Readability.js for parsing.

Home Page: https://scrapper.dev

Dynamic DOM Support?

dr0id123 opened this issue

I've noticed this application does not output the dynamic DOM, so it is not compatible with sites that render their content with JavaScript. For example:

https://angular.io/about?group=Angular

Raw HTML -> the word "puppies" cannot be found.
Fully generated DOM (the DOM built in the browser, e.g., as shown in Chrome DevTools) -> you can search for and find the word "puppies".

Headless browsers should be able to output the dynamic DOM as HTML (e.g., Selenium does this).

Am I missing something? It should be possible, given that a real browser is being used.

Never mind, got this sorted: I needed to use the correct wait function so that all the JavaScript has time to run.
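
For anyone who lands here with the same problem, here is a rough sketch of how the request could be built. It is not confirmed by the maintainer: the `/api/article` endpoint, the `wait-until` and `sleep` query parameters, and the `http://localhost:3000` base address are all assumptions that should be checked against the project README. `build_scrapper_url` is a hypothetical helper written only for this example.

```python
from urllib.parse import urlencode


def build_scrapper_url(page_url, base="http://localhost:3000",
                       wait_until="networkidle", sleep_ms=0):
    """Build a request URL that asks the scraper to wait for the page's scripts.

    Assumptions (not confirmed in this thread): the endpoint path, the
    "wait-until" and "sleep" parameter names, and the default base address.
    """
    params = {"url": page_url, "wait-until": wait_until}
    if sleep_ms > 0:
        # Optional extra pause in milliseconds after the wait condition is met.
        params["sleep"] = sleep_ms
    # urlencode percent-escapes the target URL so its own "?" and "=" survive.
    return f"{base}/api/article?{urlencode(params)}"


if __name__ == "__main__":
    print(build_scrapper_url("https://angular.io/about?group=Angular", sleep_ms=1000))
```

The idea is to wait for network activity to settle rather than returning as soon as the initial HTML has loaded, which is what leaves JavaScript-rendered text such as "puppies" out of the output. If the page keeps polling in the background, a fixed pause may work better than waiting for the network to go idle.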