Custom/extendable `CrawlUrl`

Question

rudiedirkx opened this issue 2 years ago

I want to keep detailed track of crawled pages: number of references, response content type, HTTP status code, etc. I can keep those in my own list of crawled URL objects, but that's A LOT of redundancy. Even in the current queue every URL is saved 3 times: as the array key, in `CrawlUrl->url`, and in `CrawlUrl->id`. I don't want to add even more copies, but I do want to add a few stats per URL. With a custom/extendable `CrawlUrl` I could add those efficiently.

I haven't actually tried to keep track of references yet. Is that possible? I want to know how many pages link to /contact.html, or /help/bla.html, or /files/bestpdf.pdf etc.
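
To make the request concrete, here is a rough sketch of the kind of per-URL record meant above. It is standalone PHP (7.4+) and does not touch the crawler at all: `UrlStats`, `UrlStatsRegistry`, and their methods are hypothetical names made up for illustration, and how they would be wired into the crawler is left open. Ideally these fields would live on an extended `CrawlUrl` instead of in a second list.

```php
<?php
// Hypothetical sketch: none of these classes exist in the crawler library.

final class UrlStats
{
    public int $references = 0;         // number of links found pointing at this URL
    public ?int $statusCode = null;     // set once the URL has been fetched
    public ?string $contentType = null; // set once the URL has been fetched
}

final class UrlStatsRegistry
{
    /** @var array<string, UrlStats> keyed by URL, one record per URL */
    private array $stats = [];

    // Return the record for a URL, creating it on first use.
    public function get(string $url): UrlStats
    {
        return $this->stats[$url] ??= new UrlStats();
    }

    // Call once for every link discovered on any page.
    public function addReference(string $url): void
    {
        $this->get($url)->references++;
    }

    // Call once for every completed response.
    public function addResponse(string $url, int $statusCode, ?string $contentType): void
    {
        $record = $this->get($url);
        $record->statusCode = $statusCode;
        $record->contentType = $contentType;
    }
}

$registry = new UrlStatsRegistry();
$registry->addReference('/contact.html');
$registry->addReference('/contact.html');
$registry->addReference('/files/bestpdf.pdf');
$registry->addResponse('/contact.html', 200, 'text/html');

echo $registry->get('/contact.html')->references, "\n"; // 2
```

This still stores each URL one more time as an array key, which is exactly the redundancy an extendable `CrawlUrl` would avoid.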

Extendable? Extendible? Extensible? You know.