jsinger67 / parol

LL(k) and LALR(1) parser generator for Rust

Home Page: https://jsinger67.github.io/

Smarter type generation in auto generation - Part 2: Omit struct generation for a production type if its right-hand side is a single item.

jsinger67 opened this issue · comments

I've noticed that if this is resolved, we can remove redundant field accesses like the one in the following example.

Given grammar:

Expr: Item | ...;
Item: /.+/;

parol will generate something like:

enum Expr {
    Item(ExprItem),
    ...
}
struct ExprItem {
    item: Item,
}
struct Item {
    item: Token,
}

Then, a user needs the following code to access the token.

let expr_item: ExprItem = ...;
// .item.item is redundant.
println!("{}", expr_item.item.item.text());
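
To make the difference concrete, here is a minimal, self-contained sketch that contrasts the nested shape above with one possible shape after the change. The `Token` struct is a stand-in for the type from parol_runtime, and the `proposed` module is only my assumption of what omitting the `ExprItem` wrapper could look like. It is not output produced by parol.

```rust
// Stand-in for the token type from parol_runtime (hypothetical, for illustration only).
#[derive(Debug, Clone)]
pub struct Token {
    pub text: String,
}

impl Token {
    pub fn text(&self) -> &str {
        &self.text
    }
}

// Shape of the types as described above: one wrapper struct per level.
pub mod current {
    use super::Token;

    pub struct Item {
        pub item: Token,
    }

    pub struct ExprItem {
        pub item: Item,
    }
}

// Assumed shape if the struct for the single-item production `Expr: Item` were omitted.
pub mod proposed {
    use super::Token;

    pub struct Item {
        pub item: Token,
    }

    pub enum Expr {
        Item(Item),
    }
}

// Two field accesses are needed to reach the token.
pub fn current_text(e: &current::ExprItem) -> &str {
    e.item.item.text()
}

// Only one field access remains once the wrapper struct is gone.
pub fn proposed_text(e: &proposed::Expr) -> &str {
    match e {
        proposed::Expr::Item(item) => item.item.text(),
    }
}

fn main() {
    let tok = Token { text: "abc".to_string() };
    let cur = current::ExprItem {
        item: current::Item { item: tok.clone() },
    };
    let new = proposed::Expr::Item(proposed::Item { item: tok });
    println!("{} {}", current_text(&cur), proposed_text(&new));
}
```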

Also, this is just a side note, but I prefer token over item as the field name, like:

struct Item {
    token: Token
}

The following tells me twice, "Token is an [Ii]tem".

struct Item {
    item: Token
}

The token one also tells me twice, "The field is a [tT]oken", but it seems acceptable and consistent.

We could also consider this, but I don't think it is a real option.

struct Item(Token);

I'm currently working on this, but it turns out that it is not easily done.
With the current structure it tends to blow up the code for trait generation in an ugly way.
At this point I would resort to a deeper analysis of the problem to find a better and more maintainable solution.

The naming issue in your side note may be self-made.
The name of the terminal is taken from the non-terminal if the terminal has a separate production.

Item: /.+/;

This results in the name Item. Token is the type from parol_runtime.

But correct me if I'm wrong.

> I'm currently working on this, but it turns out that it is not easily done.
> With the current structure it tends to blow up the code for trait generation in an ugly way.
> At this point I would resort to a deeper analysis of the problem to find a better and more maintainable solution.

I also think we should wait for a maintainable solution to come out.

> The naming issue in your side note may be self-made.
> The name of the terminal is taken from the non-terminal if the terminal has a separate production.

It feels inconsistent with Rust-style APIs when a field name describes the containing struct rather than the field itself.

I think this is fine.

///
/// Type derived for production 155
///
/// SimpleExprOptGroup: "\+";
///
#[allow(dead_code)]
#[derive(Builder, Debug, Clone)]
#[builder(crate = "parol_runtime::derive_builder")]
pub struct SimpleExprOptGroupPlus<'t> {
    pub plus: Token<'t>, /* \+ */
}

And this is still fine:

///
/// Type derived for production 216
///
/// Number: Integer;
///
#[allow(dead_code)]
#[derive(Builder, Debug, Clone)]
#[builder(crate = "parol_runtime::derive_builder")]
pub struct NumberInteger<'t> {
    pub integer: Box<Integer<'t>>,
}

But this seems inconsistent:

///
/// Type derived for non-terminal Integer
///
#[allow(dead_code)]
#[derive(Builder, Debug, Clone)]
#[builder(crate = "parol_runtime::derive_builder")]
pub struct Integer<'t> {
    pub integer: Token<'t>, /* [0-9][0-9]*|[0-9][0-9A-F]*H */
}

To avoid this behavior, I would have to stop using the so-called "primary non-terminal for a terminal", but then I would also lose control of the field names that refer to the terminal.

Won't fix currently

I tried hard to find a satisfying solution for this problem, but the structure of the generated
grammar type system, and the way parol implements the internal adapter that maps a typed user grammar
around the untyped grammar symbols, only work when certain constraints are always adhered to.

When you suppress the concrete type of a production (i.e. its "output type" that is pushed on the parse
stack), you have to change the handling of this production in all places where its result is
popped from the stack later on.
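
A rough sketch of this coupling, under my own simplifying assumptions: the `Adapter`, the `AstType` enum, and the method names below are hypothetical stand-ins and not parol's generated code. Each production action pushes its output type, and every consumer pops exactly that variant. If the `Item` struct were suppressed and a bare `Token` pushed instead, the match in `expr_item` (and in every other consumer) would have to change as well.

```rust
// Hypothetical stand-ins; the adapter that parol generates is more involved.
#[derive(Debug, Clone, PartialEq)]
pub struct Token(pub String);

#[derive(Debug, Clone, PartialEq)]
pub struct Item {
    pub item: Token,
}

#[derive(Debug, Clone, PartialEq)]
pub struct ExprItem {
    pub item: Item,
}

// Every production output type is wrapped in one enum so it can live on the stack.
#[derive(Debug, Clone, PartialEq)]
pub enum AstType {
    Item(Item),
    ExprItem(ExprItem),
}

#[derive(Default)]
pub struct Adapter {
    stack: Vec<AstType>,
}

impl Adapter {
    // Action for `Item: /.+/;` pushes an `Item`.
    pub fn item(&mut self, tok: Token) {
        self.stack.push(AstType::Item(Item { item: tok }));
    }

    // Action for `Expr: Item;` pops exactly the variant that `item` pushed.
    pub fn expr_item(&mut self) -> Result<(), String> {
        match self.stack.pop() {
            Some(AstType::Item(item)) => {
                self.stack.push(AstType::ExprItem(ExprItem { item }));
                Ok(())
            }
            other => Err(format!("expected Item on the stack, found {:?}", other)),
        }
    }

    // Removes and returns the topmost value, if any.
    pub fn pop(&mut self) -> Option<AstType> {
        self.stack.pop()
    }
}

fn main() {
    let mut adapter = Adapter::default();
    adapter.item(Token("abc".to_string()));
    adapter.expr_item().expect("Item should be on top of the stack");
    println!("{:?}", adapter.pop());
}
```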

Also, some of the handling of how recursive constructs are implicitly converted into sequences relies firmly
on a production type having a certain structure.
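
As an illustration only, the following sketch shows how a right-recursive pair of productions could be folded into a `Vec`. The grammar in the comments, the `ItemList` struct, and both helper functions are assumptions made for this example. The point is that the fold expects every element to have the same fixed struct shape.

```rust
// Hypothetical stand-in for the token type.
#[derive(Debug, Clone, PartialEq)]
pub struct Token(pub String);

// Element type for a hypothetical repetition. The fold below relies on this fixed shape.
#[derive(Debug, Clone, PartialEq)]
pub struct ItemList {
    pub item: Token,
}

// Action for `ItemList: Item ItemList;`. The tail has already been reduced,
// so the current element is appended and the order is fixed up by the caller.
pub fn item_list_rec(item: Token, mut tail: Vec<ItemList>) -> Vec<ItemList> {
    tail.push(ItemList { item });
    tail
}

// Action for `ItemList: ;` starts an empty sequence.
pub fn item_list_empty() -> Vec<ItemList> {
    Vec::new()
}

// Simulates the reductions for the given input and returns the final sequence.
pub fn collect(input: &[&str]) -> Vec<ItemList> {
    let mut list = item_list_empty();
    // The innermost (last) element is reduced first.
    for text in input.iter().rev() {
        list = item_list_rec(Token(text.to_string()), list);
    }
    // Restore source order once the whole sequence is complete.
    list.reverse();
    list
}

fn main() {
    let list = collect(&["a", "b", "c"]);
    println!("{:?}", list);
}
```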

In a similar situation are those parts of parol that deduce that pairs of productions belong to the
two special forms of a single non-terminal - the None and the Some part.
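
A small sketch of such a pair, assuming a hypothetical optional non-terminal `SignOpt` with one non-empty and one empty production. The type and function names are made up for illustration. The Some form wraps the production struct, so removing that struct would also change what the None and Some forms have to agree on.

```rust
// Hypothetical stand-in for the token type.
#[derive(Debug, Clone, PartialEq)]
pub struct Token(pub String);

// Type for the non-empty production of the hypothetical `SignOpt: "-";`.
#[derive(Debug, Clone, PartialEq)]
pub struct SignOpt {
    pub minus: Token,
}

// Parent type that holds the optional part.
#[derive(Debug, Clone, PartialEq)]
pub struct Number {
    pub sign_opt: Option<Box<SignOpt>>,
    pub integer: Token,
}

// The "Some" form: action for `SignOpt: "-";`.
pub fn sign_opt_some(minus: Token) -> Option<Box<SignOpt>> {
    Some(Box::new(SignOpt { minus }))
}

// The "None" form: action for the empty production `SignOpt: ;`.
pub fn sign_opt_none() -> Option<Box<SignOpt>> {
    None
}

fn main() {
    let negative = Number {
        sign_opt: sign_opt_some(Token("-".to_string())),
        integer: Token("42".to_string()),
    };
    let positive = Number {
        sign_opt: sign_opt_none(),
        integer: Token("42".to_string()),
    };
    println!("{:?}\n{:?}", negative, positive);
}
```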

More on this can be found here and here.

If you don't need a correctly functioning parser and are only interested in the generated
AST types, this might be a feasible task, because in that situation you don't have to
take care of the parser working correctly.