postrank-labs / postrank-uri

URI normalization, c14n, escaping, and extraction

clean url with nested scheme results in invalid url

Jetinho opened this issue

Hello,
I noticed an error in the clean method for an input URL whose path embeds a second URL with its own scheme, such as the following:

INPUT
https://thumbor.forbes.com/thumbor/600x315/smart/https://specials-images.forbesimg.com/dam/imageserve/1038149341/960x0.jpg?fit=scale

In that case the embedded https:// gets transformed into https:/, and the cleaned URL is no longer valid.

OUTPUT
https://thumbor.forbes.com/thumbor/600x315/smart/https:/specials-images.forbesimg.com/dam/imageserve/1038149341/960x0.jpg?fit=scale

url = 'https://thumbor.forbes.com/thumbor/600x315/smart/https://specials-images.forbesimg.com/dam/imageserve/1038149341/960x0.jpg?fit=scale'

PostRank::URI.clean(url)
#=> "https://thumbor.forbes.com/thumbor/600x315/smart/https:/specials-images.forbesimg.com/dam/imageserve/1038149341/960x0.jpg?fit=scale"

I am going to add a workaround to my project for this case, since a proper fix would require digging deeper into the method implementation, but I thought it would be worth reporting here :).

Actually, the problem comes from normalize, at postrank-uri.rb:155:

u.path = u.path.squeeze('/')
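
The effect of that squeeze call can be reproduced with plain Ruby strings, without loading the gem. The sketch below applies String#squeeze to the path from the report, then shows one possible alternative that only collapses slash runs not directly preceded by a colon. The helper name squeeze_path_keeping_schemes is hypothetical and not part of postrank-uri. It is an assumption about what a fix could look like, not the library's behavior.

```ruby
# Path component of the input URL from the report (query string omitted).
path = '/thumbor/600x315/smart/https://specials-images.forbesimg.com' \
       '/dam/imageserve/1038149341/960x0.jpg'

# String#squeeze('/') collapses every run of slashes into one,
# including the "//" that belongs to the embedded "https://".
squeezed = path.squeeze('/')
puts squeezed

# Hypothetical helper (not part of postrank-uri): collapse runs of two or
# more slashes, except when the run directly follows a colon, so that an
# embedded "scheme://" is left alone.
def squeeze_path_keeping_schemes(path)
  path.gsub(%r{(?<!:)//+}, '/')
end

puts squeeze_path_keeping_schemes(path)        # embedded "https://" is kept
puts squeeze_path_keeping_schemes('/a//b///c') # ordinary duplicates still collapse
```

One trade-off of this approach: any path segment ending in a colon followed by a double slash is left unnormalized, even when it is not a real nested URL, so it may be too permissive for strict canonicalization.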

Heh, that's a fun edge case. What's your workaround? If you're up for contributing a PR, I'd be happy to review and help land it :)