lgraubner / sitemap-generator

Easily create XML sitemaps for your website.

Feature request: duplicate page detector

mlibre opened this issue

Do you want to request a feature or report a bug?
Feature.
Adding duplicate link/page detection could be a good idea.
Google detects duplicate pages even when a page has no canonical reference.
You are already crawling each page, so you could hash the content of each page and store the hashes in an array.
Before adding a link to sitemap.xml, first check that the same hash is not already in the array.

The crawler should not add duplicate pages.
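
A rough sketch of the idea, assuming the crawler can hand over each page's body as a string. The names `createDuplicateFilter` and `isNewPage` are made up for illustration and are not part of sitemap-generator's API, and a `Set` is used in place of the array for faster lookups:

```javascript
const crypto = require('crypto');

// Hypothetical helper, not part of sitemap-generator.
// Returns a function that reports whether a page body has been seen before.
function createDuplicateFilter() {
  const seenHashes = new Set();

  return function isNewPage(body) {
    // Hash the page content so only a short digest is stored per page.
    const hash = crypto.createHash('sha256').update(body).digest('hex');
    if (seenHashes.has(hash)) {
      return false; // same content was already added under another URL
    }
    seenHashes.add(hash);
    return true;
  };
}

// Example data (made up): two URLs serve identical content.
const isNewPage = createDuplicateFilter();
const pages = [
  { url: 'https://example.com/', body: '<html>home</html>' },
  { url: 'https://example.com/index.html', body: '<html>home</html>' },
  { url: 'https://example.com/about', body: '<html>about</html>' },
];

// Only the first URL for each distinct body ends up in the sitemap list.
const sitemapUrls = pages
  .filter((page) => isNewPage(page.body))
  .map((page) => page.url);

console.log(sitemapUrls);
```

Note that an exact hash only catches pages whose content is byte-for-byte identical. Pages that differ by a timestamp or a session token would still count as different, so some normalization of the body before hashing might be needed.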