pinoceniccola / what-hn-says-webext

Web Extension: Easily find Hacker News discussions about the page you're currently browsing.

Thread not found if URL contains UTM-related query parameters

memoryonrepeat opened this issue

Hello,

I've noticed that the extension sometimes can't find the thread if the link contains UTM-related query parameters. These are usually added when you click links in newsletters, and they can be safely removed to improve search quality. What do you think?

Example: https://www.deprocrastination.co/blog/how-to-be-productive-without-forcing-yourself?utm_source=hackernewsletter&utm_medium=email&utm_term=fav

Works after removing all the params: https://www.deprocrastination.co/blog/how-to-be-productive-without-forcing-yourself

If you're busy I can also help with a PR. Thanks for making this extension btw :)

I'm aware. It may be worth creating a small function that takes care of URL canonicalization, including:

  • Removing common query parameters (utm_*, *clid)
  • Normalizing URLs ending with index.php, index.htm?l
  • Other special cases like #5 (www/non-www)
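
For illustration, the three points above could be sketched roughly like this. It is a minimal sketch, not the extension's actual code: the function name `canonicalizeUrl` is hypothetical, and it assumes that stripping `www.` and a trailing index file is acceptable for every site, which may not always hold.

```javascript
// Hypothetical helper (not taken from the extension's source).
// Relies only on the standard WHATWG URL API available in browsers and Node.
function canonicalizeUrl(rawUrl) {
  const url = new URL(rawUrl);

  // 1. Remove common tracking parameters: utm_* and anything ending in
  //    "clid" (gclid, fbclid, ...). Keys are copied first so that deleting
  //    does not interfere with the iteration.
  const keys = [...new Set(url.searchParams.keys())];
  for (const key of keys) {
    if (/^utm_/i.test(key) || /clid$/i.test(key)) {
      url.searchParams.delete(key);
    }
  }

  // 2. Strip a trailing index.php, index.htm or index.html.
  url.pathname = url.pathname.replace(/\/index\.(php|html?)$/i, "/");

  // 3. Treat www and non-www hosts as the same site (assumption: both
  //    variants serve the same content).
  url.hostname = url.hostname.replace(/^www\./i, "");

  return url.toString();
}

console.log(
  canonicalizeUrl(
    "https://www.deprocrastination.co/blog/how-to-be-productive-without-forcing-yourself?utm_source=hackernewsletter&utm_medium=email&utm_term=fav"
  )
);
```

Parameters that match neither pattern are left untouched, so URLs that depend on a query string (for example `?id=3`) should still resolve to the right page.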

Oh yes, contributions are very welcome! Thanks.

Sure, I've created the MR. Could you please take a look at #7?