goodmami / pe

Fastest general-purpose parsing library for Python with a familiar API

Left recursion

goodmami opened this issue

So far I have not seen a need for left recursion, particularly because pe emits values in a flat list. But some people seem to really care about it, so this issue is for tracking its implementation.

Using the memo to escape the left-recursive loop looks promising, and this method was chosen for Python's new pegen parser. Guido wrote about it here: https://medium.com/@gvanrossum_83706/left-recursive-peg-grammars-65dab3c580e1
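For illustration, here is a minimal sketch of that memo-based "seed growing" idea in plain Python. It is not pe or pegen code: the toy grammar `expr <- expr "-" num / num` and the `Parser`, `expr`, `expr_body`, `num`, and `evaluate` names are hypothetical. The memo entry for `expr` at a position is first seeded with a failure, so the left-recursive call fails instead of looping. The rule body is then re-run, each time reusing the previous result from the memo, until the match stops getting longer.

```python
import re
from typing import Dict, Optional, Tuple

# Hypothetical toy grammar (PEG-style notation, not pe's API):
#   expr <- expr "-" num / num
#   num  <- [0-9]+
NUM = re.compile(r"[0-9]+")

# A parse result is (tree, end_position), or None on failure.
Result = Optional[Tuple[object, int]]


class Parser:
    def __init__(self, text: str) -> None:
        self.text = text
        self.memo: Dict[int, Result] = {}  # memo for `expr`, keyed by start position

    def num(self, pos: int) -> Result:
        m = NUM.match(self.text, pos)
        return (int(m.group()), m.end()) if m else None

    def expr_body(self, pos: int) -> Result:
        # Alternative 1: expr "-" num (the left-recursive call reads the memo)
        left = self.expr(pos)
        if left is not None:
            tree, end = left
            if self.text.startswith("-", end):
                right = self.num(end + 1)
                if right is not None:
                    return ("-", tree, right[0]), right[1]
        # Alternative 2: num
        return self.num(pos)

    def expr(self, pos: int) -> Result:
        if pos in self.memo:
            return self.memo[pos]
        # Seed with failure so the left-recursive call fails instead of looping.
        self.memo[pos] = None
        best: Result = None
        while True:
            result = self.expr_body(pos)
            # Stop once the match no longer grows past the previous seed.
            if result is None or (best is not None and result[1] <= best[1]):
                break
            best = result
            self.memo[pos] = best  # the next iteration builds on this seed
        self.memo[pos] = best
        return best


def evaluate(tree: object) -> int:
    # Interpret the tree: ints are leaves, ("-", left, right) is subtraction.
    if isinstance(tree, int):
        return tree
    _, left, right = tree
    return evaluate(left) - evaluate(right)


tree, end = Parser("1-2-3").expr(0)
print(tree)            # ('-', ('-', 1, 2), 3)
print(evaluate(tree))  # -4
```

Because each iteration wraps the previous result, `1-2-3` comes out left-associative, which is exactly what the left-recursive rule is meant to express.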

Not sure how applicable this is to PEG, but ANTLR can deal with direct left recursion very gracefully. From its docs, I quote:

The most natural expression of some common language constructs is left recursive. For example C declarators and arithmetic expressions.

This is why it is a very handy feature for a parser generator to support at least direct left recursion.

See also the corresponding publication, from which I quote:

ANTLR 4 generates ALL(*) parsers and supports direct left-recursion through grammar rewriting

So maybe pe could use a similar automatic grammar-rewriting strategy as well?

Thanks for the links. I haven't seen that paper yet, and I'm curious how ALL(*) parsing works. I think some limited grammar rewriting is possible to eliminate trivial left-recursion cases, in the same way that pe's "common" optimizations already rewrite the grammar, e.g., transforming "a" "b" into "ab" or "a" / "b" into [ab].
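As a rough illustration of such a rewrite, the textbook transformation for direct left recursion turns `A <- A α / β` into `A <- β (α)*`. The sketch below does this on a hypothetical list-of-alternatives representation (not pe's internal grammar objects), and the function names are made up for the example. It only covers simple cases where the left-recursive alternatives have a non-empty tail and at least one base alternative exists; it is not a general semantics-preserving transformation for every PEG. Since the rewritten rule yields a flat sequence like `[t0, op1, t1, ...]`, which fits pe's flat value lists, a small left fold can rebuild the left-associative tree afterwards.

```python
from typing import List

# Hypothetical representation: a rule is a list of alternatives, and each
# alternative is a non-empty list of symbols written as PEG-style strings.
Alternative = List[str]


def rewrite_direct_left_recursion(name: str, alts: List[Alternative]) -> str:
    """Rewrite `A <- A a1 / A a2 / b1 / b2` into `A <- (b1 / b2) (a1 / a2)*`."""
    if any(not alt for alt in alts):
        raise ValueError("empty alternatives are not handled in this sketch")
    recursive = [alt[1:] for alt in alts if alt[0] == name]  # the alpha tails
    base = [alt for alt in alts if alt[0] != name]           # the beta parts
    if not recursive:
        # No direct left recursion: emit the rule unchanged.
        return f"{name} <- " + " / ".join(" ".join(alt) for alt in alts)
    if not base or any(not tail for tail in recursive):
        raise ValueError(f"{name} cannot be rewritten by this sketch")
    beta = " / ".join(" ".join(alt) for alt in base)
    alpha = " / ".join(" ".join(tail) for tail in recursive)
    return f"{name} <- ({beta}) ({alpha})*"


def fold_left(values: list) -> object:
    """Turn a flat list [t0, op1, t1, op2, t2, ...] into a left-nested tree."""
    result = values[0]
    for op, operand in zip(values[1::2], values[2::2]):
        result = (op, result, operand)
    return result


print(rewrite_direct_left_recursion(
    "Sum", [["Sum", '"+"', "Term"], ["Sum", '"-"', "Term"], ["Term"]]))
# Sum <- (Term) ("+" Term / "-" Term)*
print(fold_left([1, "-", 2, "-", 3]))
# ('-', ('-', 1, 2), 3)
```

The repetition matches the same input as the left-recursive rule in these simple cases, but the associativity now has to be restored by the fold (or an equivalent action), because the parser itself only produces a flat list.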

If you continue the section you quoted, they say this:

Direct left-recursion covers the most common cases, such as arithmetic expression productions, like E → E . id, and C declarators. We made an engineering decision not to support indirect or hidden left-recursion.

So they do not handle the more difficult left-recursion cases, which sounds like a practical decision (especially since they are apparently doing grammar analysis at parse time instead of compile time).
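To make the direct/indirect distinction concrete, here is a minimal sketch of a compile-time check over a hypothetical dict-of-alternatives grammar (not pe's grammar objects; all names are illustrative). It follows only the first symbol of each alternative, so, like the ANTLR decision quoted above, it does not attempt hidden left recursion behind a nullable prefix.

```python
from typing import Dict, List, Set

# Hypothetical grammar: rule name -> list of alternatives (lists of symbols).
# Symbols that are not rule names are treated as terminals.
Grammar = Dict[str, List[List[str]]]


def left_calls(grammar: Grammar, name: str) -> Set[str]:
    """Rules that `name` can invoke first, at the same input position.

    Only the first symbol of each alternative is considered, so hidden left
    recursion behind a nullable prefix is NOT detected by this sketch.
    """
    return {alt[0] for alt in grammar[name] if alt and alt[0] in grammar}


def classify_left_recursion(grammar: Grammar) -> Dict[str, str]:
    result = {}
    for name in grammar:
        direct = name in left_calls(grammar, name)
        # Depth-first search through other rules' leftmost calls for a path
        # that leads back to `name`.
        seen: Set[str] = set()
        stack = [r for r in left_calls(grammar, name) if r != name]
        indirect = False
        while stack:
            rule = stack.pop()
            if rule in seen:
                continue
            seen.add(rule)
            calls = left_calls(grammar, rule)
            if name in calls:
                indirect = True
                break
            stack.extend(calls)
        # A rule that is both directly and indirectly recursive is reported as "direct".
        result[name] = "direct" if direct else ("indirect" if indirect else "none")
    return result


g: Grammar = {
    "Expr": [["Expr", '"+"', "Term"], ["Term"]],
    "Term": [["Num"]],
    "Num": [["[0-9]+"]],
    "A": [["B", '"x"'], ['"y"']],
    "B": [["A", '"z"']],
}
print(classify_left_recursion(g))
# {'Expr': 'direct', 'Term': 'none', 'Num': 'none', 'A': 'indirect', 'B': 'indirect'}
```

Doing this analysis once when the grammar is compiled, rather than while parsing, would let a tool reject or rewrite only the cases it knows how to handle.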