Incorrect parsing for URLs with parentheses in them
alexaandru opened this issue
Given a file that includes:
// https://en.wikipedia.org/wiki/ANSI_escape_code#SGR_(Select_Graphic_Rendition)_parameters
I would expect https://en.wikipedia.org/wiki/ANSI_escape_code#SGR_(Select_Graphic_Rendition)_parameters
to be parsed as a URL. However, when I inspect the parse tree, I see:
(comment ; [28, 0] - [28, 91]
  (source ; [28, 0] - [28, 91]
    (uri))) ; [28, 3] - [28, 79]
The URL is recognized only up to, but not including, the closing parenthesis: https://en.wikipedia.org/wiki/ANSI_escape_code#SGR_(Select_Graphic_Rendition
Hi, this is mostly because ) is treated as a stop character, to handle cases like (URL)words... or (URL). words...
But I'll see if the rules can be relaxed.
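One way the rule could be relaxed is to treat ) as a stop character only when it has no matching ( earlier in the same URL. The sketch below illustrates that idea in Python. It is not the grammar's actual rule: `find_urls`, the `https?://` scheme pattern, and the trailing-punctuation set are all assumptions made for illustration.

```python
import re

# Assumption: only http(s) URLs are of interest in this sketch.
_SCHEME = re.compile(r"https?://")
# Assumption: punctuation that usually belongs to the sentence, not the URL.
_TRAILING_PUNCTUATION = ".,;:!?'"


def find_urls(text: str) -> list[str]:
    """Return URLs in text, keeping ')' only when it closes a '(' inside the URL."""
    urls = []
    pos = 0
    while (match := _SCHEME.search(text, pos)) is not None:
        depth = 0  # number of '(' opened inside the URL and not yet closed
        end = match.end()
        while end < len(text):
            ch = text[end]
            if ch.isspace() or ch in '<>"':
                break
            if ch == "(":
                depth += 1
            elif ch == ")":
                if depth == 0:
                    # Unmatched ')': it belongs to the surrounding text.
                    break
                depth -= 1
            end += 1
        # Drop sentence punctuation that was swept up at the end.
        urls.append(text[match.start():end].rstrip(_TRAILING_PUNCTUATION))
        pos = end
    return urls
```

With this approach, (URL)words... and (URL). words... still stop before the closing parenthesis, while a URL that opens its own parenthesis, like the Wikipedia link above, is kept whole.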
As per the URL RFC spec:
Thus, only alphanumerics, the special characters "$-_.+!*'(),", and reserved characters used for their reserved purposes may be used unencoded within a URL.
By that rule, the URL above is perfectly legal. Thank you for looking into this :-)
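As a quick check of that claim, here is a small Python sketch that tests a URL against the character classes named in the quoted sentence, which appears to come from RFC 1738, section 2.2. The helper name `disallowed_chars` is made up for illustration. Two assumptions are baked in: the reserved set is the one RFC 1738 lists (`;/?:@=&`), and `#` and `%` are set aside because they act as the fragment delimiter and the escape marker rather than as ordinary characters.

```python
import string

# Character classes from the quoted sentence.
SPECIAL = "$-_.+!*'(),"
# Assumption: the reserved characters as listed in RFC 1738.
RESERVED = ";/?:@=&"
ALLOWED = set(string.ascii_letters + string.digits + SPECIAL + RESERVED)


def disallowed_chars(url: str) -> set[str]:
    """Return the characters in url that fall outside the quoted rule."""
    # Assumption: '#' (fragment delimiter) and '%' (escape marker) are skipped.
    return {ch for ch in url if ch not in ALLOWED and ch not in "#%"}


url = "https://en.wikipedia.org/wiki/ANSI_escape_code#SGR_(Select_Graphic_Rendition)_parameters"
print(disallowed_chars(url))  # an empty set means nothing needs encoding
```

Both ( and ) are in the special set, so under this reading neither needs to be percent-encoded.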