stsewd / tree-sitter-comment

Tree-sitter grammar for comment tags like TODO, FIXME(user).

Home Page: https://stsewd.dev/tree-sitter-comment/

Incorrect parsing for URLs with parentheses in them

alexaandru opened this issue

Given a file that includes:

// https://en.wikipedia.org/wiki/ANSI_escape_code#SGR_(Select_Graphic_Rendition)_parameters

I would expect https://en.wikipedia.org/wiki/ANSI_escape_code#SGR_(Select_Graphic_Rendition)_parameters to be parsed as a URL. However, when I inspect the parse tree, it shows:

(comment ; [28, 0] - [28, 91]
  (source ; [28, 0] - [28, 91]
    (uri))) ; [28, 3] - [28, 79]

It treats only https://en.wikipedia.org/wiki/ANSI_escape_code#SGR_(Select_Graphic_Rendition as the URL, stopping just before the closing parenthesis.
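The column ranges in the tree can be checked by slicing the comment line. A quick Python sketch, assuming the comment is the only text on that line (the tree shows it starting at column 0). Tree-sitter columns are byte offsets, which coincide with character indices here because the line is plain ASCII:

```python
# The comment line from the report; the tree says it spans columns 0..91.
line = "// https://en.wikipedia.org/wiki/ANSI_escape_code#SGR_(Select_Graphic_Rendition)_parameters"
assert len(line) == 91

# The (uri) node spans columns 3..79 (end column is exclusive).
uri = line[3:79]
left_out = line[79:]

print(uri)       # text that tree-sitter labelled as (uri)
print(left_out)  # text after the uri node, starting at the ")"
```

The slice ends right after "Rendition", and everything from the ")" onward falls outside the (uri) node.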

Hi, this is mostly because ")" is treated as a stop character, so that text like "(URL) words..." or "(URL). words..." does not pull the closing parenthesis into the URL. But I'll see if the rules can be relaxed.
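One way such a rule could be relaxed, loosely modelled on GitHub Flavored Markdown's extended-autolink heuristic rather than taken from this grammar, is to keep a trailing ")" only when it closes an earlier "(" inside the URL. A minimal Python sketch; the helper name trim_url_end and the TRAILING_PUNCT set are hypothetical, and it assumes the candidate has already been scanned up to the next whitespace, starting after any opening "(" in the prose:

```python
# Hypothetical helper, not part of tree-sitter-comment.
# Characters treated as sentence punctuation when they end a candidate URL.
TRAILING_PUNCT = ".,:;!?"


def trim_url_end(candidate: str) -> str:
    """Drop trailing characters that most likely belong to the prose."""
    end = len(candidate)
    while end > 0:
        last = candidate[end - 1]
        if last in TRAILING_PUNCT:
            # "(URL). words": the final "." ends the sentence, not the URL.
            end -= 1
        elif last == ")":
            body = candidate[:end]
            # Keep ")" only if it is balanced by an earlier "(" in the URL.
            if body.count(")") > body.count("("):
                end -= 1
            else:
                break
        else:
            break
    return candidate[:end]


print(trim_url_end("https://en.wikipedia.org/wiki/ANSI_escape_code#SGR_(Select_Graphic_Rendition)_parameters"))
print(trim_url_end("https://example.com)."))
```

With this rule the URL from the report is kept whole, while "(https://example.com)." still loses its unmatched ")" and the final ".". The trade-off is that a URL which genuinely ends in an unbalanced ")" would be cut short.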

As per the URL RFC (RFC 1738):

Thus, only alphanumerics, the special characters "$-_.+!*'(),", and reserved characters used for their reserved purposes may be used unencoded within a URL.

That URL is perfectly legal. Thank you for looking into this :-)