tree-sitter / tree-sitter

An incremental parsing system for programming tools

Home Page:https://tree-sitter.github.io

Cannot use Unicode in queries on Windows with the Rust bindings in tree-sitter 0.22.2

ahelwer opened this issue

Problem

On tree-sitter versions 0.21.0 and higher, compiling a query that contains a Unicode character fails on Windows when using the Rust bindings. The same code succeeds on Linux and macOS, and it succeeds on all platforms with 0.20.x versions:

use tree_sitter::{Parser, Query, QueryCursor};

fn main() {
    let mut parser = Parser::new();
    parser.set_language(&tree_sitter_test::language()).expect("Error loading grammar");
    let source_code = "op == expr op ≜ expr";
    let tree = parser.parse(source_code, None).unwrap();
    println!("{}", tree.root_node().to_sexp());

    // The query contains the non-ASCII character "≜"; this unwrap is what fails on Windows.
    let query = Query::new(&tree_sitter_test::language(), "(def_eq \"≜\" @def_eq)").unwrap();
    let mut cursor = QueryCursor::new();
    for capture in cursor.matches(&query, tree.root_node(), "".as_bytes()) {
        println!("{:?}", capture);
    }
}

Steps to reproduce

  1. Clone this branch, which contains a minimal tree-sitter grammar and a Rust program: https://github.com/ahelwer/tree-sitter-test/tree/windows-unicode
  2. cd into the rust directory and run cargo run

On Windows, this will produce the following error value:

called `Result::unwrap()` on an `Err` value: QueryError { row: 0, column: 9, offset: 9, message: "", kind: NodeType }

On Linux and macOS, it will succeed.

Alternatively, you can see a less minimal cross-platform reproduction of it in this CI run: https://github.com/tlaplus-community/tlauc/actions/runs/8440383121
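
To relate the offset in the error above to the query text, here is a small standalone check. It is only a sketch: it assumes the query string is (def_eq "≜" @def_eq), with the character taken from the sample source text, it calls no tree-sitter APIs, and first_non_ascii is a hypothetical helper introduced for illustration.

```rust
/// Hypothetical helper (not part of tree-sitter): returns the byte offset,
/// the character, and its UTF-8 length for the first non-ASCII character.
fn first_non_ascii(text: &str) -> Option<(usize, char, usize)> {
    text.char_indices()
        .find(|(_, c)| !c.is_ascii())
        .map(|(offset, c)| (offset, c, c.len_utf8()))
}

/// Assumed query text; the "≜" is taken from the sample source in this report.
const QUERY: &str = "(def_eq \"≜\" @def_eq)";
```

Under that assumption, first_non_ascii(QUERY) reports byte offset 9, which lines up with the offset: 9 in the QueryError and is consistent with the failure starting at the first non-ASCII byte of the query.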

Expected behavior

I expected to be able to keep using Unicode characters in queries on all supported platforms. This worked on all platforms when generating and consuming grammars with tree-sitter 0.20.x.

Tree-sitter version (tree-sitter --version)

0.22.2 (reproduces on 0.21.0 or higher)

Operating system/version

N/A