niksite / url-normalize

URL normalization for Python

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

Twitter hashtag search breaks on normalization

samuelclay opened this issue · comments

Here's a sample Twitter search with a hashtag: https://twitter.com/search?q=%23cncmachining&src=typed_query

When I run it through url_normalization, the encoded hash character (%23) is decoded into a hash (#), but it should stay encoded, because when I visit the normalized url, it 404s.

>>> from url_normalize import url_normalize
>>> url_normalize("https://twitter.com/search?q=%23cncmachining&src=typed_query")
'https://twitter.com/search?q=#cncmachining&src=typed_query'

When you visit them in the browser: