tree-sitter / tree-sitter

An incremental parsing system for programming tools

Home Page: https://tree-sitter.github.io

Parsing string literals in predicates isn't accounting for `\s`

DavisVaughan opened this issue · comments

Problem

Consider this example from the docs, which finds doc comments in C using a match query:

((comment)+ @comment.documentation
  (#match? @comment.documentation "^///\s+.*"))

This actually doesn't work!

It looks like \s isn't handled as an allowed special case in ts_query__parse_string_literal(). Instead, the backslash gets swallowed and we end up with just an s to try and match against.

Here's an example of this in the playground where we should be getting a match, but aren't

[Screen recording: Screen.Recording.2024-04-19.at.1.49.36.PM.mov]

Here's some weird stuff that does match, because it's looking for a literal s

[Screenshot: 2024-04-19 at 1:54:47 PM]

Looks like it is going through the default case here

tree-sitter/lib/src/query.c

Lines 2023 to 2038 in 4cd23ff

switch (stream->next) {
  case 'n':
    array_push(&self->string_buffer, '\n');
    break;
  case 'r':
    array_push(&self->string_buffer, '\r');
    break;
  case 't':
    array_push(&self->string_buffer, '\t');
    break;
  case '0':
    array_push(&self->string_buffer, '\0');
    break;
  default:
    array_extend(&self->string_buffer, stream->next_size, stream->input);
    break;

Steps to reproduce

Expected behavior

Tree-sitter version (tree-sitter --version)

0.21.0

Operating system/version

macOS 13.6.5

Apologies if I'm mistaken, but it seems to work if you escape the backslash before the s in your query.

Playground:

[image]

Query via CLI:

[image]

Ah I think you're right! Swapping to \\s does fix the playground too.

So I think that may just be a small typo on the docs page where it needs to be \\s rather than \s, and then it makes sense to me that \n, \t, \r, and \0 are the only other ones handled specially.

Yes, because we support escape sequences in those strings, you need to escape the backslash if you want to write a literal backslash (which you want in this case, so that the regex engine sees \s). Want to open a PR for the docs fix?
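To make the escaping rule above concrete, here is a minimal standalone sketch in C that mimics the switch statement quoted earlier. It is not tree-sitter's code: `unescape_query_string` is a hypothetical helper written for illustration, it works on plain bytes rather than tree-sitter's stream type, and how it treats a trailing backslash is an assumption.

```c
#include <stddef.h>
#include <string.h>

// Hypothetical helper, not part of tree-sitter. It decodes backslash escapes
// in the same spirit as the quoted switch: \n, \r, \t and \0 are translated,
// and for any other escaped character the backslash is dropped and the
// character itself is kept. Returns the number of bytes written (excluding
// the terminating NUL), or (size_t)-1 if `out` is too small.
static size_t unescape_query_string(const char *in, char *out, size_t out_cap) {
  if (out_cap == 0) return (size_t)-1;
  size_t n = 0;
  for (size_t i = 0; in[i] != '\0'; i++) {
    char c = in[i];
    // Assumption: a backslash at the very end of the input is kept literally.
    if (c == '\\' && in[i + 1] != '\0') {
      i++;
      switch (in[i]) {
        case 'n': c = '\n'; break;
        case 'r': c = '\r'; break;
        case 't': c = '\t'; break;
        case '0': c = '\0'; break;
        default:  c = in[i]; break;  // e.g. \s becomes s, \\ becomes a single backslash
      }
    }
    if (n + 1 >= out_cap) return (size_t)-1;  // leave room for the NUL
    out[n++] = c;
  }
  out[n] = '\0';
  return n;
}
```

Under these assumptions, feeding the helper the query text `^///\s+.*` yields `^///s+.*`, so the regex engine looks for a literal s, while `^///\\s+.*` yields `^///\s+.*`, which is the pattern the regex engine needs to see.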

Sure, I'll do one Monday!