tree-sitter / tree-sitter-julia

Julia grammar for Tree-sitter

`string_literal`s should include opening and closing quotes as child nodes

dhanak opened this issue · comments

Initial problem: I don't want TAB to mess with the insides of my strings, especially my multiline docstrings. I don't want it to remove tabs/spaces from or insert them into the start of lines (or anywhere, for that matter) inside docstrings. I do, however, want to indent a line that has a string on it, assuming the line starts with the opening quote (e.g., passed as an argument to a function).

"""
Don't indent this line.
    Don't unindent this line either.
"""
print("foo",
      "bar", # indent this line, when I press TAB
      "baz\nbazinga$(x)") # also indent this line

Solution attempt: Add an indentation rule that keeps the indentation as is when the parent is a string_literal. In Emacs terms, add this to treesit-simple-indent-rules:

((parent-is "string_literal") no-indent 0)

Complication: string_literals can be leaf nodes, when they are just plain strings, but can also be branch nodes, when they contain interpolation or escape sequences. I argue that they should always be branch nodes, and always include the opening and closing quotes as leaf nodes, as a minimum. Without them, all of a string's non-special contents effectively appear as whitespace to tree-sitter clients.

And this is a problem, because the tree-sitter indentation implementation in Emacs, treesit--indent-1 (which I assume is conceptually correct), first identifies the nearest leaf node around or after the beginning of the line. Then it finds the matching indentation rule and indents based on that. That is, it essentially ignores all whitespace, which is good. But here it also ignores the contents of the string as well as the opening quote, which is bad.

With the current node tree, there is no way to tell whether the start of the line is inside the string (that includes escape sequences or interpolations), or at the opening quote. At least, I couldn't figure out how that can be done.
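The ambiguity can be illustrated with a toy model. This is a minimal sketch in Python, not Emacs's actual treesit--indent-1: the `Node` class, `first_leaf_at_or_after`, and `string_node` are hypothetical helpers, the interpolation is treated as a single leaf for simplicity, and the lookup is a simplified reading of "nearest leaf around or after the beginning of the line". It assumes a string_literal whose only children are escape sequences and interpolations.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass(eq=False)
class Node:
    """Toy syntax node: a type name, a half-open range, and children."""
    type: str
    start: int
    end: int
    children: List["Node"] = field(default_factory=list)
    parent: Optional["Node"] = field(default=None, repr=False)

    def __post_init__(self) -> None:
        for child in self.children:
            child.parent = self


def first_leaf_at_or_after(node: Node, pos: int) -> Optional[Node]:
    """Return the first leaf (in document order) that ends after pos."""
    if node.end <= pos:
        return None
    if not node.children:
        return node
    for child in node.children:
        leaf = first_leaf_at_or_after(child, pos)
        if leaf is not None:
            return leaf
    return None


def string_node(src: str, specials: List[Tuple[str, str]]) -> Node:
    """Build a string_literal whose only children are its special parts."""
    kids = []
    for kind, text in specials:
        start = src.index(text)
        kids.append(Node(kind, start, start + len(text)))
    return Node("string_literal", 0, len(src), kids)


# Case A: the line starts at the opening quote (position 0).
src_a = '"baz\\nbazinga$(x)"'
tree_a = string_node(src_a, [("escape_sequence", "\\n"),
                             ("string_interpolation", "$(x)")])
leaf_a = first_leaf_at_or_after(tree_a, 0)

# Case B: the line starts inside a multi-line string.
src_b = '"first line\nsecond $(x) line"'
tree_b = string_node(src_b, [("string_interpolation", "$(x)")])
leaf_b = first_leaf_at_or_after(tree_b, src_b.index("second"))

# Both lookups skip the plain text and land on a child of string_literal.
print(leaf_a.type, leaf_a.parent.type)
print(leaf_b.type, leaf_b.parent.type)
```

In this model a rule that only tests for a string_literal parent matches in both cases, so it cannot tell the line that starts with the opening quote from the line that starts inside the string.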

When it's just a plain string, without child nodes, then it works as expected, because the string_literal is the leaf node, and its start coincides with the beginning of the line, so the rule doesn't apply.

I'm using Emacs v29.1, and a Julia treesit grammar from Dec 18, 2023 (I don't know how its exact version can be determined).

I think I understand the issue, but it's not super-clear why having the quotes as child nodes solves it.

Anyways, we already have tokens for string start/end quotes, they're just not visible. Fixing this might just require making them visible and updating the tests.

When the opening quote is a child node, I can add a rule that matches it and indents it, and add other rules that keep the indentation of any other child node of a string_literal node. When the quote is not a child node, I simply cannot match the beginning of the string.
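A rough sketch of how that rule ordering could work, again as a toy Python model rather than Emacs code. The node types `string_start` and `string_end` are hypothetical names for the quote tokens once they are visible (the grammar's real names may differ), and `Node`, `first_leaf_at_or_after`, `with_quotes`, and `decide` are illustrative helpers.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass(eq=False)
class Node:
    """Toy syntax node: a type name, a half-open range, and children."""
    type: str
    start: int
    end: int
    children: List["Node"] = field(default_factory=list)
    parent: Optional["Node"] = field(default=None, repr=False)

    def __post_init__(self) -> None:
        for child in self.children:
            child.parent = self


def first_leaf_at_or_after(node: Node, pos: int) -> Optional[Node]:
    """Return the first leaf (in document order) that ends after pos."""
    if node.end <= pos:
        return None
    if not node.children:
        return node
    for child in node.children:
        leaf = first_leaf_at_or_after(child, pos)
        if leaf is not None:
            return leaf
    return None


def with_quotes(src: str, specials: List[Tuple[str, str]]) -> Node:
    """string_literal that always has (hypothetical) quote leaf children."""
    kids = [Node("string_start", 0, 1)]
    for kind, text in specials:
        start = src.index(text)
        kids.append(Node(kind, start, start + len(text)))
    kids.append(Node("string_end", len(src) - 1, len(src)))
    return Node("string_literal", 0, len(src), kids)


def decide(leaf: Node) -> str:
    """Ordered rules, first match wins."""
    if leaf.type == "string_start":
        return "indent"  # the line begins with the opening quote
    if leaf.parent is not None and leaf.parent.type == "string_literal":
        return "keep"    # the line begins somewhere inside the string
    return "indent"


src_a = '"baz\\nbazinga$(x)"'            # line starts at the opening quote
tree_a = with_quotes(src_a, [("escape_sequence", "\\n"),
                             ("string_interpolation", "$(x)")])
src_b = '"first line\nsecond $(x) line"'  # second line starts inside the string
tree_b = with_quotes(src_b, [("string_interpolation", "$(x)")])
src_c = '"first line\nplain text"'        # second line has no special parts
tree_c = with_quotes(src_c, [])

result_a = decide(first_leaf_at_or_after(tree_a, 0))
result_b = decide(first_leaf_at_or_after(tree_b, src_b.index("second")))
result_c = decide(first_leaf_at_or_after(tree_c, src_c.index("plain")))
print(result_a, result_b, result_c)
```

With the quotes present as leaves, the lookup at the opening quote returns the quote itself, so the first rule can single it out. Lines that begin inside the string fall through to the second rule, including a plain-text line where the nearest following leaf is the closing quote.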