spatie / crawler

An easy-to-use, powerful crawler implemented in PHP. Can execute JavaScript.

Home Page: https://freek.dev/308-building-a-crawler-in-php

Links with query parameters (GET parameters) are only crawled when present in the URL passed to startCrawling()

Defcon0 opened this issue

Hello,

I have a website with pagination: some pages show a list where you can click 1, 2, ... to get the next results. A GET parameter page=x is used for this.

Given the following situation:

  1. /mypage -> contains a link to /mypage2
  2. /mypage2 -> contains the paginated list with the items 1-3
  3. /mypage2?page=2 -> contains the paginated list with the items 4-6

If I pass /mypage2 to the crawler, it finds and crawls the pagination links as well. If I pass /mypage, it finds /mypage2 but not /mypage2?page=2.

Am I doing something wrong or is it intentional?

Thanks in advance!

Bye

OK, I found the "issue". I had the same problem as in #236: the timeout of 10s was a little short for my local dev setup ;-)
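For anyone hitting the same thing, here is a minimal sketch of raising the timeout. It assumes a crawler version whose `Crawler::create()` accepts an array of Guzzle request options (check the README of the version you have installed). `MyCrawlObserver` is a hypothetical placeholder for your own observer class, and `http://localhost/mypage` stands in for the local dev site.

```php
<?php

require __DIR__ . '/vendor/autoload.php';

use GuzzleHttp\RequestOptions;
use Spatie\Crawler\Crawler;

// Assumption: Crawler::create() hands this array to the underlying Guzzle client.
// COOKIES and ALLOW_REDIRECTS are included on the assumption that they mirror the
// library's defaults, in case a custom array replaces the defaults instead of merging.
Crawler::create([
    RequestOptions::COOKIES => true,
    RequestOptions::CONNECT_TIMEOUT => 30, // seconds; raised from the 10s that was too short here
    RequestOptions::TIMEOUT => 30,         // seconds; raised from the 10s that was too short here
    RequestOptions::ALLOW_REDIRECTS => false,
])
    ->setCrawlObserver(new MyCrawlObserver()) // hypothetical: your own CrawlObserver subclass
    ->startCrawling('http://localhost/mypage'); // placeholder URL for the dev site
```

The values are illustrative. Pick a timeout that comfortably covers the slowest page on your setup, since a request that times out contributes no links to the crawl.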