LanguageDev / Yoakke

A collection of libraries for implementing compilers in .NET.

Make the parser be able to yield a parse-tree

LPeter1997 opened this issue

Is your feature request related to a problem? Please describe.
In some cases it would be nice to retrieve the raw parse-tree, for example when the goal is formatting the input, or when preserving every detail is crucial. This was the case in my previous compiler.

Describe the solution you'd like
The parser should have an extra feature that tells it not to call the transformation methods, but to use their parameters to construct the parse-tree instead. This would even allow generating the type/methods for translating the parse-tree, as the parse-tree -> AST conversion would simply feed the members of the parse-tree into those same transformation functions.
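To make the idea concrete, here is a minimal sketch of what such a generated parse-tree -> AST conversion could look like. Everything in it is hypothetical and only for illustration: the `Expr`/`Num`/`Add` AST types, the `ParseTreeNode` record, the rule names `"num"` and `"add"`, and the `MakeNum`/`MakeAdd` transformation methods are assumptions, not part of the library.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical AST types, for illustration only.
public abstract record Expr;
public sealed record Num(int Value) : Expr;
public sealed record Add(Expr Left, Expr Right) : Expr;

// Hypothetical parse-tree node: the rule name plus the raw arguments
// that would otherwise have been passed to the transformation method.
public sealed record ParseTreeNode(string Name, IReadOnlyList<object?> Children);

public static class Transformations
{
    // Stand-ins for the user-written transformation methods.
    public static Expr MakeNum(string digits) => new Num(int.Parse(digits));
    public static Expr MakeAdd(Expr left, string plus, Expr right) => new Add(left, right);
}

public static class ParseTreeToAst
{
    // Feeds the stored children back into the same transformation methods,
    // converting nested nodes recursively first.
    public static Expr ToAst(ParseTreeNode node) => node.Name switch
    {
        "num" => Transformations.MakeNum((string)node.Children[0]!),
        "add" => Transformations.MakeAdd(
            ToAst((ParseTreeNode)node.Children[0]!),
            (string)node.Children[1]!,
            ToAst((ParseTreeNode)node.Children[2]!)),
        _ => throw new ArgumentException($"Unknown rule: {node.Name}"),
    };
}
```

Note that the `+` token is kept in the parse-tree node even though `MakeAdd` discards it, which is exactly the detail a formatter would need.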

Describe alternatives you've considered
We could use the signature of the transformation function to deduce how the user wishes to store elements (important when having things like Punctuated).

We could have some very simple, generic type for the parse-tree:

using System.Collections.Generic;

public interface IParseTreeNode
{
    // The name of the rule that produced this node
    string Name { get; }

    // The elements that build up this node. Can contain other `IParseTreeNode`s.
    IReadOnlyList<object?> Children { get; }
}
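As a rough sketch of how such a generic node could be used, the following walks a tree and concatenates its leaves back into source text, which is the kind of thing a formatter would start from. The concrete `ParseTreeNode` class and the `ParseTreePrinter` helper are hypothetical names introduced here, and the sketch assumes leaves (tokens) print their text via `ToString()`.

```csharp
using System.Collections.Generic;
using System.Text;

public interface IParseTreeNode
{
    string Name { get; }
    IReadOnlyList<object?> Children { get; }
}

// Hypothetical concrete node type, for illustration only.
public sealed class ParseTreeNode : IParseTreeNode
{
    public ParseTreeNode(string name, params object?[] children)
    {
        Name = name;
        Children = children;
    }

    public string Name { get; }
    public IReadOnlyList<object?> Children { get; }
}

public static class ParseTreePrinter
{
    // Concatenates every leaf in source order; nested nodes are visited recursively.
    public static string ToSource(IParseTreeNode node)
    {
        var sb = new StringBuilder();
        Append(node, sb);
        return sb.ToString();
    }

    private static void Append(object? element, StringBuilder sb)
    {
        if (element is IParseTreeNode node)
        {
            foreach (var child in node.Children) Append(child, sb);
        }
        else if (element is not null)
        {
            // Assumption: a leaf renders its original text through ToString().
            sb.Append(element);
        }
    }
}
```

Because `Children` is typed as `object?`, every consumer has to type-test the elements like this, which is the price of keeping the node type this simple.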

Generating the parser methods that return the parse-tree is neither expensive nor hard, so I don't think this needs to be optional. An open question is how to tell the parser that we want the parse-tree. Something like ParseXXX_ParseTree feels kind of sketchy to me.

The transformer type generation should be explicit, as that is an extra type that the user will get to use. Annotating the parser with something like [ParseTreeTransformer(ClassName = "ParseTreeToAst")] should be fine; I would not like to put something like this into the parser itself.
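A minimal sketch of how that opt-in annotation might be declared and applied is shown below. The attribute does not exist in the library; its shape, the default value of `ClassName`, and the `ExpressionParser` class are assumptions made only to show the intended usage.

```csharp
using System;

// Hypothetical attribute: asks the generator to emit a separate
// parse-tree -> AST transformer type with the given name.
[AttributeUsage(AttributeTargets.Class)]
public sealed class ParseTreeTransformerAttribute : Attribute
{
    // Name of the transformer class to generate (the default is an assumption).
    public string ClassName { get; set; } = "ParseTreeTransformer";
}

// Hypothetical parser class opting in to transformer generation.
[ParseTreeTransformer(ClassName = "ParseTreeToAst")]
public partial class ExpressionParser
{
}
```

Keeping the name on the attribute means the generated transformer stays a separate, user-visible type rather than extra members on the parser.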

Even if the user doesn't need the AST, the method signatures would still be useful (to know how to store the elements), so we could accept partial method signatures without a body. In that case, the parser would only provide the parse-tree alternatives.

Additional context
Some tokens that are usually ignored when constructing the AST become essential in the parse-tree. One example is the comment token: it should be part of the parse-tree, but we really don't want to deal with it in the AST. We would need some way to annotate tokens that are only considered when parsing into a parse-tree. I believe custom matchers wouldn't be too bad of an idea here.
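One way to picture the distinction is as two views over the same token stream: one that drops comments for AST construction and one that keeps everything for the parse-tree. This is only a sketch under assumed names (`TokenKind`, `Token`, `TokenViews`); it does not reflect how the library's lexer or matchers actually work.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical token model, for illustration only.
public enum TokenKind { Identifier, Punctuation, Comment }

public sealed record Token(TokenKind Kind, string Text);

public static class TokenViews
{
    // View used when building the AST: comments are skipped entirely.
    public static IEnumerable<Token> ForAst(IEnumerable<Token> tokens) =>
        tokens.Where(t => t.Kind != TokenKind.Comment);

    // View used when building the parse-tree: every token is preserved.
    public static IEnumerable<Token> ForParseTree(IEnumerable<Token> tokens) =>
        tokens;
}
```

A custom matcher, as suggested above, could play the role of the `Where` predicate and decide per token kind whether it is visible in a given parsing mode.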