Cannot use Unicode in queries on Windows in Rust in tree-sitter 0.22.2
ahelwer opened this issue
Problem
On tree-sitter versions 0.21.0 and higher, compiling a query that contains a Unicode character fails on Windows when using the Rust bindings. The same code succeeds on Linux and macOS, and it succeeds on all platforms with 0.20.x versions:
use tree_sitter::{Parser, Query, QueryCursor};

fn main() {
    let mut parser = Parser::new();
    parser.set_language(&tree_sitter_test::language()).expect("Error loading grammar");
    let source_code = "op == expr op ≜ expr";
    let tree = parser.parse(source_code, None).unwrap();
    println!("{}", tree.root_node().to_sexp());
    let query = Query::new(&tree_sitter_test::language(), "(def_eq \"≜\" @def_eq)").unwrap();
    let mut cursor = QueryCursor::new();
    for capture in cursor.matches(&query, tree.root_node(), "".as_bytes()) {
        println!("{:?}", capture);
    }
}
Steps to reproduce
- Clone this branch containing a minimal tree-sitter grammar and Rust program: https://github.com/ahelwer/tree-sitter-test/tree/windows-unicode
- `cd` into the `rust` directory and run `cargo run`
On Windows, this will produce the following error value:
called `Result::unwrap()` on an `Err` value: QueryError { row: 0, column: 9, offset: 9, message: "", kind: NodeType }
On Linux and macOS, it will succeed.
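As a small diagnostic aid, the sketch below uses only the standard library to check which character sits at the byte offset reported in the error (`offset: 9`). The helper name `char_at_byte_offset` is hypothetical and not part of tree-sitter. For the query string in this report, byte 9 is the start of `≜` (U+225C, three bytes in UTF-8), which is consistent with the failure being triggered at the Unicode character, though it does not by itself identify the cause.

```rust
/// Hypothetical helper (not a tree-sitter API): returns the character that
/// starts at `offset` (a byte index) in `query`, or `None` if `offset` is
/// out of range or falls in the middle of a multi-byte character.
fn char_at_byte_offset(query: &str, offset: usize) -> Option<char> {
    query
        .char_indices()
        .find(|(index, _)| *index == offset)
        .map(|(_, c)| c)
}

/// Formats a short description of the character at `offset`, using the
/// offset taken from a `QueryError` such as the one shown above.
fn describe_offset(query: &str, offset: usize) -> String {
    match char_at_byte_offset(query, offset) {
        Some(c) => format!(
            "byte {} starts {:?} (U+{:04X}, {} UTF-8 bytes)",
            offset,
            c,
            c as u32,
            c.len_utf8()
        ),
        None => format!("byte {} is not the start of a character", offset),
    }
}
```

Calling `describe_offset("(def_eq \"≜\" @def_eq)", 9)` reports the character found at the offset from the error value, which can help when comparing output across platforms.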
Alternatively, you can see a less minimal cross-platform reproduction of it in this CI run: https://github.com/tlaplus-community/tlauc/actions/runs/8440383121
Expected behavior
I expected to be able to continue using Unicode characters in queries on all supported platforms. This worked on all platforms when generating and consuming grammars with tree-sitter 0.20.x.
Tree-sitter version (tree-sitter --version)
0.22.2 (reproduces on 0.21.0 and higher)
Operating system/version
N/A