lgraubner / sitemap-generator

Easily create XML sitemaps for your website.

Appending extra characters to url.

RayBB opened this issue

Do you want to request a feature or report a bug?
Bug

What is the current behavior?
When I run the program (currently using the CLI with verbose output), it ends up trying to fetch a bunch of links that have extra characters appended to the end, characters that aren't part of any link in the source code. Here are some examples:

[ ERR ] https://www.eckerd.edu/my/%29;l.clearRect%280,0,k.width,k.height  (404)
[ ERR ] https://www.eckerd.edu/my/wp-content/uploads/sites/62/2016/05/gradient-ocean2-1.png%29;  (404)
[ ERR ] https://www.eckerd.edu/wp-content/uploads/2016/01/beakers-header.jpg%29;  (404)
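Decoding the percent-escapes in these paths hints at where the extra characters come from. The snippet below is only a small sketch that decodes the suffixes copied from the log above; it does not reproduce the crawler's own code. The variable names are made up for illustration.

```javascript
// Path suffixes copied from the error log above.
const suffixes = [
  '%29;l.clearRect%280,0,k.width,k.height',
  'gradient-ocean2-1.png%29;',
  'beakers-header.jpg%29;',
];

// decodeURIComponent turns %28 and %29 back into "(" and ")".
const decoded = suffixes.map((s) => decodeURIComponent(s));

for (const d of decoded) {
  console.log(d);
}
```

The decoded suffixes end in `);` or contain a call like `l.clearRect(0,0,k.width,k.height`, which looks like fragments of inline JavaScript or CSS rather than anything from a real link.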

If the current behavior is a bug, please provide the steps to reproduce.
If you run npx sitemap-generator-cli -v https://eckerd.edu, you'll start seeing results like this within the first couple dozen URLs.
My guess is that it's happening when the buffer is converted to a string, but I haven't been able to test or verify that.

What is the expected behavior?
I'd expect it to not be trying to get hundreds of urls that aren't anywhere in the source code.

Obviously the crawler has problems parsing the inline JavaScript and styles. I will have a look at it.

I found the problem. The crawler didn't apply the correct discoverResource function and therefore fell back to the (crappy) default one. I fixed it, and it should be solved in v8.0.1 (the CLI is updated accordingly).
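For anyone curious what a stricter link-discovery step might look like, here is a minimal sketch. It is not the code from the actual fix: the function name discoverAnchorLinks and the sample page are hypothetical, and regex-based HTML handling is a simplification that a real crawler would probably replace with a proper HTML parser. The idea is to skip inline script and style blocks and collect only href values from anchor tags.

```javascript
// Hypothetical helper (not the library's API): collect href values from <a> tags only.
function discoverAnchorLinks(buffer) {
  const html = buffer.toString('utf8');

  // Drop inline <script> and <style> blocks so their contents are never scanned.
  const withoutInline = html.replace(
    /<(script|style)\b[^>]*>[\s\S]*?<\/\1\s*>/gi,
    ''
  );

  // Match href="..." or href='...' inside <a ...> tags.
  const anchorHref = /<a\b[^>]*?\bhref\s*=\s*(["'])(.*?)\1/gi;

  const links = [];
  let match;
  while ((match = anchorHref.exec(withoutInline)) !== null) {
    links.push(match[2]);
  }
  return links;
}

// Made-up sample page with inline CSS and JavaScript plus two real links.
const page = Buffer.from(
  '<style>.hero{background:url(/wp-content/uploads/header.jpg);}</style>' +
  '<script>l.clearRect(0,0,k.width,k.height);</script>' +
  '<a href="/my/">My Eckerd</a> <a href=\'/about\'>About</a>'
);

console.log(discoverAnchorLinks(page));
```

With this sample input, only /my/ and /about are returned; the url(...) in the style block and the clearRect call in the script block are ignored.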