sparklemotion / mechanize

Mechanize is a ruby library that makes automated web interaction easy.

Home Page: https://www.rubydoc.info/gems/mechanize/

Scraping ajax enabled webpages

SorataAragaki opened this issue · comments

When I scrape pages that rely heavily on AJAX and JavaScript using Mechanize, some information is lost compared with the original pages. Mechanize doesn't have a JS implementation, but the watir-webdriver gem is really, really slow. Are there any good solutions?

commented

You could try PhantomJS with something like selenium-webdriver; at least this would give you a headless option. The alternative is to figure out what the underlying JS request is and convert it into something Mechanize can use.

commented

It's not too difficult to figure out what the XMLHttpRequests are. If you use a proxy server like Charles, you can inspect all the calls the page makes and then usually mimic them with Mechanize.

This doesn't, however, give you the excellent (easy-to-read) output that Mechanize produces, and you can't interact with the resultant DOM. I'd love to see the DSL of Mechanize, with the output it produces, built on top of something like PhantomJS so you could execute JS, but I suspect that this would be a huge (and unlikely) change.
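
For anyone wanting to try the proxy-and-mimic approach, here is a rough sketch of what replaying a captured XMLHttpRequest through Mechanize might look like. It assumes the endpoint returns JSON and that the site expects the usual `X-Requested-With` header; both are assumptions, so check them against what the proxy actually shows. The helper name `fetch_xhr_json` and the `example.com` URLs are made up for illustration. The only Mechanize call relied on is `Mechanize#get(uri, parameters, referer, headers)`.

```ruby
require 'json'

# Headers that browsers typically attach to XMLHttpRequest calls. Which ones a
# given site actually checks is an assumption; copy them from the proxy capture.
XHR_HEADERS = {
  'X-Requested-With' => 'XMLHttpRequest',
  'Accept'           => 'application/json, text/javascript, */*; q=0.01'
}.freeze

# Hypothetical helper: replays an XHR-style GET through a Mechanize-like agent
# and parses the JSON body. `agent` is anything that responds to
# get(uri, parameters, referer, headers), which is the signature of Mechanize#get.
def fetch_xhr_json(agent, url, params = {}, referer = nil)
  response = agent.get(url, params, referer, XHR_HEADERS)
  JSON.parse(response.body)
end

# Usage (needs the mechanize gem and network access; URLs are placeholders):
#
#   require 'mechanize'
#   agent = Mechanize.new
#   page  = agent.get('https://example.com/listing')
#   data  = fetch_xhr_json(agent, 'https://example.com/api/items',
#                          { 'page' => 1 }, page)
```

Loading the real page first lets the agent pick up any cookies the endpoint depends on, and passing that page as the referer makes the replayed call look more like the browser's. The result is a plain Ruby hash or array rather than a `Mechanize::Page`, which is the trade-off described above.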