Custom/extendable `CrawlUrl`
rudiedirkx opened this issue
I want to keep detailed track of crawled pages: number of references, response content-type & HTTP code, etc. I can keep those in my own list of crawled URL objects, but that's A LOT of redundancy. Even in the current queue every URL is saved 3 times: array key, `CrawlUrl->url`, `CrawlUrl->id`. I don't want to add even more, but I do want to add a few stats per URL. With a custom/extendable `CrawlUrl` I could add those efficiently.
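Roughly what I have in mind, as an untested sketch: `StatsCrawlUrl` and its properties are names I'm making up here, and it assumes the crawler could be told to instantiate a `CrawlUrl` subclass instead of the base class.

```php
use Spatie\Crawler\CrawlUrl;

// Hypothetical subclass carrying the extra per-URL stats I'd like to keep.
class StatsCrawlUrl extends CrawlUrl
{
    /** @var int Number of pages found linking to this URL. */
    public $referenceCount = 0;

    /** @var string|null Content-Type of the response, once crawled. */
    public $contentType;

    /** @var int|null HTTP status code of the response, once crawled. */
    public $statusCode;
}
```

That way every stat lives on the same object the queue already stores, instead of in a second, parallel list.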
I haven't actually tried to keep track of references yet. Is that possible? I want to know how many pages link to `/contact.html`, or `/help/bla.html`, or `/files/bestpdf.pdf`, etc.
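If there's no built-in way, the simplest fallback I can think of is a plain tally keyed by path, fed from wherever each discovered link passes through. This is a sketch with made-up names (`ReferenceTally`, `addReference`), not a real crawler hook; it only works if something calls it for every link found, duplicates included.

```php
class ReferenceTally
{
    /** @var array<string, int> Map of URL path => number of pages linking to it. */
    private $counts = [];

    /** Record one link to $url, as found on some crawled page. */
    public function addReference(string $url): void
    {
        $path = parse_url($url, PHP_URL_PATH) ?: $url;
        $this->counts[$path] = ($this->counts[$path] ?? 0) + 1;
    }

    /** How many pages have linked to the given path so far. */
    public function referencesTo(string $path): int
    {
        return $this->counts[$path] ?? 0;
    }
}

// e.g. $tally->addReference('https://example.com/contact.html');
//      $tally->referencesTo('/contact.html'); // => 1
```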
Extendable? Extendible? Extensible? You know.