matthewmueller / x-ray

The next web scraper. See through the <html> noise.

Paginate and limit based on number of pages

Globerada opened this issue

Hi.
I have not found this information in the docs.

How can I paginate based on the number of pages the site actually has?
Below is the example I am using. Instead of setting a high limit so that I crawl all the pages, how can I set a valid limit based on the real number of pages?

x('http://www.example.com/products', 'div.products_details_container', data)
.paginate('.pagination a:last-child@href')
.limit(999)
.write('results.json');

The current approach is not to use a limit at all, but to write your pagination selector so that it stops matching once you run out of pages, which yours probably should.
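
To illustrate the idea of letting the selector end the crawl, here is a minimal sketch of the stopping rule. It is not x-ray's actual implementation: the in-memory `pages` map and the `followNext` helper are hypothetical stand-ins for real pages and for whatever the pagination selector returns on each one.

```javascript
// Hypothetical in-memory "site": each page lists its items and the href its
// "next" link would yield (null on the last page, where no such link exists).
const pages = {
  '/products?page=1': { items: ['a', 'b'], next: '/products?page=2' },
  '/products?page=2': { items: ['c', 'd'], next: '/products?page=3' },
  '/products?page=3': { items: ['e'], next: null },
};

// Follow the "next" link until it is missing. No page limit is needed,
// because the loop ends as soon as the selector has nothing to return.
function followNext(site, startUrl) {
  const results = [];
  const seen = new Set(); // guards against a link that points back to a visited page
  let url = startUrl;
  while (url && site[url] && !seen.has(url)) {
    seen.add(url);
    results.push(...site[url].items);
    url = site[url].next;
  }
  return results;
}

console.log(followNext(pages, '/products?page=1'));
```

With x-ray, the same idea would look something like `.paginate('.pagination a.next@href')`, assuming the site marks its next-page link with a `next` class that is absent on the final page. A selector such as `.pagination a:last-child@href` may keep matching on the last page (for example, if the last link there is the current page number), in which case the crawl would not stop on its own.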