tree-sitter / tree-sitter

An incremental parsing system for programming tools

Home Page: https://tree-sitter.github.io

`TreeCursor::goto_first_child_for_byte` cannot find nodes after ERROR nodes

VonTum opened this issue

Problem

I encountered this problem while working on the front end of my own language compiler (https://github.com/pc2/tree-sitter-sus). I'm using the Rust bindings.

It appears that TreeCursor::goto_first_child_for_byte cannot find nodes that come after ERROR nodes.

Steps to reproduce

Grab a copy of a grammar that looks at least like this:

rules: {
        source_file: $ => repeat(field('item', $.module)),
        
        module: $ => ....
        
        single_line_comment : $ => /\/\/[^\n]*/,
        multi_line_comment : $ => /\/\*[^\*]*\*+([^\/\*][^\*]*\*+)*\//,
  ....
},
extras: $ => [
        /\s+/,
        $.single_line_comment,
        $.multi_line_comment
]

Some code that illustrates how I'm attempting to use the function:

// Enter the byte range of a node coming after an ERROR node
let desired_span : Range<usize> = ...;
println!("DESIRED: {:?}", desired_span);
let mut cursor = tree.walk();
// First display the nodes and all their byte ranges
assert!(cursor.goto_first_child());
loop {
    let node = cursor.node();
    println!("{}: {:?}", node.kind(), node.byte_range());
    if !cursor.goto_next_sibling() {break;}
}
assert!(cursor.goto_parent());
// This unwrap fails, even though it should find the node we input at desired_span
let _ = cursor.goto_first_child_for_byte(desired_span.start).unwrap();

If I run this code I get the output below. (The outer code loops through all modules, so the third module is the first one to fail. Adding or removing errors in the program file changes which module fails; it is always the one right after the ERROR node.)

DESIRED: 192..652
module: 1..56
module: 58..166
ERROR: 168..186
module: 192..652
module: 655..895
multi_line_comment: 898..936
ERROR: 938..975
module: 976..1071
multi_line_comment: 
....

In fact, if I replace cursor.goto_first_child_for_byte with this:

let desired_span : Range<usize> = ...;
let mut cursor = tree.walk();
assert!(cursor.goto_first_child());
loop {
    let node = cursor.node();
    if node.byte_range() == desired_span {break}
    assert!(cursor.goto_next_sibling());
}
//cursor.goto_parent();
//let _ = cursor.goto_first_child_for_byte(span.into_range().start).unwrap();

it works as I would expect cursor.goto_first_child_for_byte to work.

PS: It appears that extras can break it in the same way. I tried adding a comment before an earlier module, and the lookup then fails for the module right after the comment:

DESIRED: 66..174
module: 1..56
single_line_comment: 58..65
module: 66..174
ERROR: 176..194
module: 200..660
module: 663..903
multi_line_comment: 906..944
ERROR: 946..983
module: 984..1079

Expected behavior

I would expect TreeCursor::goto_first_child_for_byte to find the first child for that byte, regardless of any ERROR or extra nodes encountered along the way.
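To make the expectation concrete, here is a minimal sketch of the lookup over plain byte ranges. It does not use tree-sitter. The helper first_child_for_byte is hypothetical, and treating "the first child for that byte" as "the first child whose range ends after the byte" is an assumption about the intended semantics, not a statement of how the library implements it.

```rust
use std::ops::Range;

/// Hypothetical helper (not part of tree-sitter): returns the index of the
/// first child whose byte range ends after `byte`, scanning the children in
/// document order. ERROR and extra nodes are treated like any other child,
/// so they cannot hide the children that follow them.
/// Assumption: "extends beyond the byte" is modelled as `child.end > byte`.
fn first_child_for_byte(children: &[Range<usize>], byte: usize) -> Option<usize> {
    children.iter().position(|child| child.end > byte)
}
```

With the child ranges from the first listing (1..56, 58..166, 168..186, 192..652, 655..895), a lookup for byte 192 under this model selects index 3, the module at 192..652, even though an ERROR node precedes it.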

Tree-sitter version (tree-sitter --version)

tree-sitter 0.22.2

Cargo.toml:

[dependencies]
tree-sitter = "~0.22.2"

Operating system/version

TUXEDO OS 2 x86_64