lgraubner / sitemap-generator

Easily create XML sitemaps for your website.

Bad links to email@domain.com cause duplicate

Do you want to request a feature or report a bug?

Bug

What is the current behavior?

When indexing a website that contains a link to an email address on the same domain, the crawler treats the link as though it's a new page. For example, indexing google.com where the following HTML appears:

<a href="contact@google.com">link</a>.

The site will then index pages at:

This is a valid URL, but it should be discounted as a duplicate.

If the current behavior is a bug, please provide the steps to reproduce.

As above

What is the expected behavior?

The section of the URL prior to the @ symbol should be discounted.
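A minimal sketch of that expected behavior, in Python for illustration only (the project itself is a Node.js package, and this is not the change made in the pull request). It assumes the duplicates come from crawled URLs that carry a user-info part before the `@`, such as the hypothetical `http://contact@google.com/`. The helper names `strip_userinfo` and `dedupe` are made up for this example.

```python
from urllib.parse import urlsplit, urlunsplit


def strip_userinfo(url: str) -> str:
    """Return the URL with any "user[:password]@" prefix removed from the authority."""
    parts = urlsplit(url)
    # Everything after the last "@" in the authority is host[:port];
    # if there is no "@", rpartition returns the whole netloc unchanged.
    host_port = parts.netloc.rpartition("@")[2]
    return urlunsplit(parts._replace(netloc=host_port))


def dedupe(urls):
    """Keep the first occurrence of each URL once user-info is discounted."""
    seen = set()
    unique = []
    for url in urls:
        key = strip_userinfo(url)
        if key not in seen:
            seen.add(key)
            unique.append(key)
    return unique


if __name__ == "__main__":
    # Hypothetical crawl queue: the second entry only differs by its user-info part.
    queue = ["http://google.com/", "http://contact@google.com/"]
    print(dedupe(queue))
```

With this approach the part before the `@` never reaches the duplicate check, so both entries in the sample queue collapse into one.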

Pull request created here:
#58

Merged and released.