Feature request: Duplicate page detector
mlibre opened this issue
Do you want to request a feature or report a bug?
Feature.
Adding duplicate link/page detection could be a good idea.
Google detects duplicate pages even when they do not declare a canonical URL (`<link rel="canonical">`).
You are already crawling each page, so you could hash its content and store the hash in an array.
Before adding a link to sitemap.xml, check that its hash is not already in the array.
The crawler should not add duplicate pages.
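A minimal sketch of the hash-and-check idea, assuming the crawler hands over each page's URL and body as a string. `DuplicatePageFilter` and `add_if_unique` are hypothetical names, not part of any existing crawler API. It uses a set instead of the array suggested above, which behaves the same but avoids a linear scan per lookup, and it collapses whitespace before hashing (also an assumption, so formatting-only differences do not count as distinct pages).

```python
import hashlib


class DuplicatePageFilter:
    """Hypothetical helper: remembers a hash of every page body already added."""

    def __init__(self) -> None:
        # A set gives O(1) membership checks; an array would need a linear scan.
        self._seen_hashes: set[str] = set()
        self.sitemap_urls: list[str] = []

    @staticmethod
    def _content_hash(body: str) -> str:
        # Assumption: collapse runs of whitespace so that pages differing only
        # in formatting produce the same hash.
        normalized = " ".join(body.split())
        return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

    def add_if_unique(self, url: str, body: str) -> bool:
        """Add url to the sitemap list unless a page with the same content was already added."""
        digest = self._content_hash(body)
        if digest in self._seen_hashes:
            return False  # duplicate content: skip this URL
        self._seen_hashes.add(digest)
        self.sitemap_urls.append(url)
        return True


# Usage: the second URL serves the same content, so it is not added.
f = DuplicatePageFilter()
f.add_if_unique("https://example.com/a", "<html><body>Hello</body></html>")
f.add_if_unique("https://example.com/a?ref=nav", "<html><body>Hello</body></html>")
print(f.sitemap_urls)  # ['https://example.com/a']
```

Note that an exact hash only catches pages whose normalized content is identical; pages that differ by a timestamp or a rotating banner would still be treated as distinct.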