spatie / crawler

An easy to use, powerful crawler implemented in PHP. Can execute Javascript.

Home Page: https://freek.dev/308-building-a-crawler-in-php

Custom/extendable `CrawlUrl`

rudiedirkx opened this issue

I want to keep detailed track of crawled pages: number of references, response content type, HTTP status code, etc. I can keep those in my own list of crawled URL objects, but that's A LOT of redundancy. Even in the current queue, every URL is stored 3 times: as the array key, in `CrawlUrl->url`, and in `CrawlUrl->id`. I don't want to add even more copies, but I do want to add a few stats per URL. With a custom/extendable `CrawlUrl` I could add those efficiently.

I haven't actually tried to keep track of references yet. Is that possible? I want to know how many pages link to /contact.html, or /help/bla.html, or /files/bestpdf.pdf etc.
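
To make the request concrete, here is a minimal sketch of the per-URL bookkeeping I have in mind. It is plain PHP (7.4+) and does not use the crawler's API at all: `UrlStats`, `UrlStatsRegistry`, and their methods are hypothetical names made up for illustration. It assumes something calls `addReference()` once for every link found on a page, including links to URLs that are already queued; whether the crawler offers a hook for that is exactly the open question.

```php
<?php

declare(strict_types=1);

// Hypothetical per-URL record; not part of spatie/crawler.
final class UrlStats
{
    public int $references = 0;          // number of links pointing at this URL
    public ?int $statusCode = null;      // HTTP status code, once crawled
    public ?string $contentType = null;  // response content type, once crawled
}

// Hypothetical registry keyed by URL string, so each URL is stored only once.
final class UrlStatsRegistry
{
    /** @var array<string, UrlStats> */
    private array $stats = [];

    // Return the record for a URL, creating it on first access.
    public function get(string $url): UrlStats
    {
        return $this->stats[$url] ??= new UrlStats();
    }

    // Count one more link pointing at $targetUrl.
    public function addReference(string $targetUrl): void
    {
        $this->get($targetUrl)->references++;
    }

    // Store the response details for a crawled URL.
    public function recordResponse(string $url, int $statusCode, ?string $contentType): void
    {
        $entry = $this->get($url);
        $entry->statusCode = $statusCode;
        $entry->contentType = $contentType;
    }
}

// Usage: two pages link to /contact.html, which is then crawled.
$registry = new UrlStatsRegistry();
$registry->addReference('/contact.html');
$registry->addReference('/contact.html');
$registry->addReference('/files/bestpdf.pdf');
$registry->recordResponse('/contact.html', 200, 'text/html');
```

If `CrawlUrl` were extendable, the three properties of `UrlStats` could live on a subclass instead, and the separate registry (one more copy of every URL) would not be needed.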


Extendable? Extendible? Extensible? You know.