tree-sitter / tree-sitter-c-sharp

C# Grammar for tree-sitter

Parse error on classes with primary constructors (C# 12, .NET 8 LTS)

neuromagus opened this issue

https://learn.microsoft.com/en-us/dotnet/csharp/whats-new/tutorials/primary-constructors

Tree-sitter does not understand a record class declared with primary constructor syntax.
Perhaps this behavior extends to structs as well.

[Screenshot: 20240409_01h13m21s_grim]

P.S. Please don't ask me about the theme, I'm still trying to configure nvim ;} How do I get out?

I am aware of this note:

C# 12.0 (under development)

but every plugin that uses tree-sitter (for C#, I mean) ends up with UB. For example, this one, et cetera... (The Emacs universe is another question.)
I read the code a little, and... guys! C# is more complicated than Java because of its syntactic sugar, but parser.c is 42 MB versus 2.5 MB? So C# syntax is about 16 times heavier?

I understand: it's a DSL, and .NET is updated every year. If you have any instructions for learning this DSL and/or properly implemented examples, I'm in. I have time to help. IMHO, maybe it's worth creating a coherent structure and describing it in the documentation? The language is in good shape and developing well, a transition to FP is underway, and new syntax and constructs will be added constantly.

P.S. Sorry for my English.

#273 (comment) references a commit that increased the parser size significantly.

Well, I asked an AI (this is a completely new subject to me), and this is its response to your post:


The conflict arises because the parser generator is unable to determine whether to interpret the sequence '*' _lvalue_expression '=' as a pointer indirection expression followed by an assignment, or as the start of an assignment expression with a dereferenced lvalue.

Here's the breakdown of the conflict:

  1. The sequence starts with '*', which could be the start of a pointer indirection expression (_pointer_indirection_expression).
  2. After '*', there is an _lvalue_expression, which is a valid continuation of both a pointer indirection expression and an assignment expression.
  3. The next token is '=', which creates the ambiguity:
    • It could be interpreted as the start of an assignment operator within an assignment expression (assignment_expression).
    • Or, it could be treated as a separate assignment operator following a complete pointer indirection expression.

The parser generator is not able to automatically resolve this ambiguity based on the precedence rules alone. It needs additional information to decide how to interpret the sequence.

Regarding your question about why '*' _lvalue_expression isn't reduced to _pointer_indirection_expression due to its higher precedence, it's because the parser generator doesn't have enough lookahead information at that point to make the decision. It needs to consider the token following the _lvalue_expression to determine whether it should reduce to _pointer_indirection_expression or continue parsing an assignment expression.

Blah-blah...
Blah-blah...

In summary, the conflict arises due to the ambiguity in interpreting the sequence '*' _lvalue_expression '=', and the parser generator needs additional information or explicit conflict resolution to determine the correct parse.
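
To make the two readings concrete, here is a minimal sketch that builds both candidate trees for `*p = v` by hand. It is my own illustration of the explanation above, not code from grammar.js: the helper functions and node shapes are hypothetical, and only the two rule names are borrowed from the quoted text.

```javascript
// Hypothetical node shapes for illustration only; they are not the node
// objects that tree-sitter-c-sharp actually produces.
function deref(operand) {
  return { type: "pointer_indirection_expression", operand };
}

function assign(left, right) {
  return { type: "assignment_expression", left, right };
}

// Render a tree as a fully parenthesized string so the grouping is visible.
function show(node) {
  if (typeof node === "string") return node;
  if (node.type === "pointer_indirection_expression") {
    return `(*${show(node.operand)})`;
  }
  return `(${show(node.left)} = ${show(node.right)})`;
}

// Reading 1: reduce `* p` first, then use the result as the assignment target.
const reduceFirst = assign(deref("p"), "v");

// Reading 2: shift `=` first, so the whole assignment becomes the operand of `*`.
const shiftFirst = deref(assign("p", "v"));

console.log(show(reduceFirst)); // ((*p) = v)
console.log(show(shiftFirst));  // (*(p = v))
```

Both trees cover the same tokens, which is why the generator asks for an explicit hint about which one to prefer.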


Well, I wonder how many other places like this there are in grammar.js? :(