seanyoung / lrpeg

Left Recursive PEG for rust

Idea: Support `node_tag` and `branch_tag`

oovm opened this issue · comments

Add two fields to identify nodes

#[derive(Clone, Debug)]
pub struct Node<'i> {
    pub rule: Rule,
    pub start: usize,
    pub end: usize,
    pub node_tag: Option<&'i str>,   // The lifetime is the same as the input text
    pub branch_tag: Option<&'i str>, // The lifetime is the same as the input text
    pub children: Vec<Node<'i>>,
    pub alternative: Option<u16>,
}

Basically like this:

expr <-
    "(" expr ")"            #Priority
  / lhs=expr "*" rhs=expr   #Mul
  / lhs=expr "/" rhs=expr   #Div
  / lhs=expr "+" rhs=expr   #Add
  / lhs=expr "-" rhs=expr   #Sub
  / num                     #Atom
  ;
num <- re#[0-9]+#;

PEG.js has this feature

This is a great idea, thank you!

So I get that you wish to label a node. What is the branch_tag for?

I think the lifetime can be 'static, because the tags can simply be static string constants (&'static str).

I have an actual usage example here

Consider the grammar

expr <-
     "(" expr ")"     #Priority
    / expr "<-" expr  #Mark
    // ...others

I marked the branch_tag here

https://github.com/ygg-lang/yggdrasil-rs/blob/82cfeb8db1c96d42d4d006e7d19ca010f77942c8/projects/ygg-bootstrap/src/cst/parse.rs#L187-L198

A macro is used here, and it looks like this after expansion:

#[inline]
pub fn expr(s: RuleState) -> RuleResult {
    let s = match s.rule(Rule::BRANCH, self::__aux_expr_priority) {
        Ok(o) => return o.tag_branch("Priority"),
        Err(e) => e,
    };
    let s = match s.rule(Rule::BRANCH, self::__aux_expr_mark) {
        Ok(o) => return o.tag_branch("Mark"),
        Err(e) => e,
    };
    // ...others
    return Err(s);
}

Finally deal with branch_tag here

https://github.com/ygg-lang/yggdrasil-rs/blob/82cfeb8db1c96d42d4d006e7d19ca010f77942c8/projects/ygg-bootstrap/src/ast/parse/mod.rs#L62-L91

You are right, it should be Option<&'static str>
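To make the 'static variant concrete, here is a minimal sketch of the proposed node with both tags as Option<&'static str>, plus a function that dispatches on branch_tag for the expr grammar from the first comment. The `Rule` variants and the `eval` and `child` helpers are hypothetical names made up for illustration; they are not part of lrpeg or yggdrasil.

```rust
/// Hypothetical rule set; a real `Rule` enum would be generated from the grammar.
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum Rule {
    Expr,
    Num,
}

/// The node from the proposal, with both tags as `&'static str`,
/// so the struct no longer needs a lifetime parameter.
#[derive(Clone, Debug)]
pub struct Node {
    pub rule: Rule,
    pub start: usize,
    pub end: usize,
    pub node_tag: Option<&'static str>,
    pub branch_tag: Option<&'static str>,
    pub children: Vec<Node>,
    pub alternative: Option<u16>,
}

/// Find the first child carrying the given node tag (e.g. "lhs" or "rhs").
fn child<'n>(node: &'n Node, tag: &str) -> Option<&'n Node> {
    node.children.iter().find(|c| c.node_tag == Some(tag))
}

/// Evaluate an `expr` node by matching on its branch tag.
/// Returns `None` for unknown tags, missing children, overflow, or division by zero.
pub fn eval(node: &Node, input: &str) -> Option<i64> {
    // All binary branches share one shape: an `lhs` child and an `rhs` child.
    let binary = |op: fn(i64, i64) -> Option<i64>| -> Option<i64> {
        let lhs = eval(child(node, "lhs")?, input)?;
        let rhs = eval(child(node, "rhs")?, input)?;
        op(lhs, rhs)
    };
    match node.branch_tag {
        Some("Priority") => eval(node.children.first()?, input),
        Some("Mul") => binary(i64::checked_mul),
        Some("Div") => binary(i64::checked_div),
        Some("Add") => binary(i64::checked_add),
        Some("Sub") => binary(i64::checked_sub),
        // An atom is just the digits covered by the node's span.
        Some("Atom") => input.get(node.start..node.end)?.parse::<i64>().ok(),
        _ => None,
    }
}
```

Matching on string tags keeps the consumer readable without needing the alternative index, which is the main appeal of the proposal.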

👍 Right, that is pretty nice.

I was thinking the start/end in the node could be replaced with a `&str` (with the same lifetime as the input string). I suspect this uses the same amount of memory, but with a much better developer experience.
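A quick sketch to back up the memory claim: on current rustc a `&str` is a pointer plus a length, so it takes the same space as two `usize` fields, and the byte offset can still be recovered from the slice if needed. The function names `same_size` and `offset_in` are made up for this illustration.

```rust
/// True when a `&str` occupies the same space as a `start`/`end` pair of `usize`.
pub fn same_size() -> bool {
    std::mem::size_of::<&str>() == 2 * std::mem::size_of::<usize>()
}

/// Byte offset of `slice` inside `input`, assuming `slice` was borrowed from `input`.
/// Returns `None` when the slice does not lie within the input's memory.
pub fn offset_in(input: &str, slice: &str) -> Option<usize> {
    let base = input.as_ptr() as usize;
    let ptr = slice.as_ptr() as usize;
    let offset = ptr.checked_sub(base)?;
    if offset + slice.len() <= input.len() {
        Some(offset)
    } else {
        None
    }
}
```

So a node holding `&'i str` could still report `start` as `offset_in(input, text)` and `end` as `start + text.len()`, while giving callers the matched text directly.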

I've given this some thought. I think using special comments is not a great way of doing this. Also, we want to be able to mark expressions as "create nodes for this". With that in mind I've taken some inspiration from lalrpop and I've come up with this syntax:

start <- (<foo> / bar) EOI;

foo <-
     add:/ <left:foo> "+" <right:num>
     sub:/ <left:foo> "-" <right:num>
     num;

bar <- "NO";

num <- re#\d+#;

So the idea is:

  • Only create nodes for expressions surrounded by < and >
  • Expressions written as < label : expression > produce nodes that carry that label
  • Sequence alternatives can have labels, written as label :/
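To picture what these rules might produce, here is a rough sketch of a labelled tree for the `foo` rule and a helper that prints it as an S-expression. The `Labelled` struct and `render` function are hypothetical and only illustrate the idea; they are not lrpeg's actual node type or API.

```rust
/// Hypothetical node shape for the labelled syntax; not lrpeg's actual type.
#[derive(Debug)]
pub struct Labelled {
    /// Label taken from `<label:expr>`, if any.
    pub label: Option<&'static str>,
    /// Label of the matched alternative, taken from `label:/`, if any.
    pub alternative: Option<&'static str>,
    pub start: usize,
    pub end: usize,
    pub children: Vec<Labelled>,
}

/// Render a node as an S-expression so the tree shape is easy to read.
/// Nodes without an alternative label are shown as the input text they cover.
pub fn render(node: &Labelled, input: &str) -> String {
    let body = match node.alternative {
        Some(alt) => {
            let parts: Vec<String> = node
                .children
                .iter()
                .map(|c| render(c, input))
                .collect();
            format!("({} {})", alt, parts.join(" "))
        }
        None => input.get(node.start..node.end).unwrap_or("").to_string(),
    };
    match node.label {
        Some(label) => format!("{}:{}", label, body),
        None => body,
    }
}
```

Under these assumptions, a tree for the input `1-2+3` would render as `(add left:(sub left:1 right:2) right:3)`, with the left recursion showing up as nesting on the `left` child.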

I've pushed the changes which add the labels for nodes < .. > and alternatives label :/.

At the moment, all nodes are still being created. Creating nodes only for the marked expressions is the next step.