tree-sitter / tree-sitter

An incremental parsing system for programming tools

Home Page: https://tree-sitter.github.io

Parsing-After-Editing Test Case is incorrect (and passing)

rooney opened this issue

Problem

The test test_parsing_after_editing_tree_that_depends_on_column_values is incorrect.

It starts with parsing the following source code:

a = b
c = do d
       e + f
       g
h + i

and asserting that the parse tree should be:

"(block ",
"(binary_expression (identifier) (identifier)) ",
"(binary_expression (identifier) (do_expression (block (identifier) (binary_expression (identifier) (identifier)) (identifier)))) ",
"(binary_expression (identifier) (identifier)))",

Then it calls perform_edit on the source code, which becomes:

a = b
c1234 = do d
       e + f
       g
h + i

(so far so good)

The problem is, it then asserts that the parse tree of the edited source code should become:

 "(block ",
 "(binary_expression (identifier) (identifier)) ",
 "(binary_expression (identifier) (do_expression (block (identifier)))) ",
 "(binary_expression (identifier) (identifier)) ",
 "(identifier) ",
 "(binary_expression (identifier) (identifier)))",

This does not seem correct: perform_edit only renames the identifier c to c1234, so it should not change the tree structure at all.

Yet the test passes, which suggests there is a bug in the tree-editing implementation.

Steps to reproduce

Expected behavior

test_parsing_after_editing_tree_that_depends_on_column_values should not pass.

Or, if it does pass, the edited parse tree should be equal to the original.

Tree-sitter version (tree-sitter --version)

tree-sitter 0.22.2 (b7fcf98)

Operating system/version

macOS 13.6.1

No, the test is correct as is.

That test simulates a Haskell-like language in which indentation blocks are based on the column where the block begins (in this case, the word do). The exact grammar is in the test_grammars folder. Its indentation logic looks like this:

if (valid_symbols[INDENT]) {
  while (iswspace(lexer->lookahead)) {
    lexer->advance(lexer, false);
  }
  uint32_t column = lexer->get_column(lexer);
  if (column > self->indents[self->indent_count - 1]) {
    self->indents[self->indent_count++] = column - 2;
    lexer->result_symbol = INDENT;
    return true;
  } else {
    return false;
  }
}
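To look at this branch in isolation, here is a minimal standalone sketch of the same comparison, detached from the tree-sitter lexer. The IndentStack type, the try_indent name, the MAX_INDENTS capacity, and the bounds check are assumptions made for illustration; only the comparison against the top of the stack and the column - 2 push come from the scanner above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical capacity; the real scanner's limit is not shown in this thread. */
#define MAX_INDENTS 16

/* Hypothetical stand-in for the scanner state: a stack of indent columns. */
typedef struct {
    uint32_t indents[MAX_INDENTS];
    uint32_t indent_count; /* assumed to be at least 1 (a base entry) */
} IndentStack;

/*
 * Mirrors the INDENT branch: if `column` is greater than the column on top
 * of the stack, push `column - 2` and report an indent; otherwise report none.
 * The capacity check is an addition for this sketch only.
 */
static bool try_indent(IndentStack *s, uint32_t column) {
    if (s->indent_count < MAX_INDENTS &&
        column > s->indents[s->indent_count - 1]) {
        s->indents[s->indent_count++] = column - 2;
        return true;
    }
    return false;
}
```

Starting from a stack that holds a single 0, try_indent(&s, 7) succeeds and pushes 5. A later call with a column of 5 or less fails and leaves the stack unchanged.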

Prior to the edit, the do is at column 4. So the following two lines, which begin at column 6, are considered indented (relative to the first line).

After the edit, do is at column 8, so the following two lines are no longer indented relative to it. This changes the entire syntactic structure of the block. That is the whole point of this test: the grammar depends on the column positions of tokens.
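The column shift can be checked with a small sketch. The helper name column_of is hypothetical, and columns are counted from zero, matching the numbers quoted in this explanation.

```c
#include <string.h>

/*
 * Zero-based column of the first occurrence of `token` in `line`,
 * or -1 if the token does not appear. Hypothetical helper for illustration.
 */
static int column_of(const char *line, const char *token) {
    const char *hit = strstr(line, token);
    return hit ? (int)(hit - line) : -1;
}
```

With this helper, column_of("c = do d", "do") is 4 and column_of("c1234 = do d", "do") is 8. The four inserted characters push do to the right, while the lines below it keep their original columns.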