erezsh / plyplus

a friendly yet powerful LR-parser written in Python

Can't use regular expressions

Robert42 opened this issue:

The following script runs without problems:

import plyplus

grammar = plyplus.Grammar("""
start: a another_a;

a: 'a';
another_a: 'a';
""")

print(grammar.parse('aa'))

After a slight modification (replacing another_a: 'a'; with another_a: '[a]';), the script can no longer parse the input:

import plyplus

grammar = plyplus.Grammar("""
start: a another_a;

a: 'a';
another_a: '[a]';
""")

print(grammar.parse('aa'))

Unless I have completely misunderstood the grammar notation, this modified version should also run.

Environment

  • Linux Mint 18
  • PlyPlus Version: 0.7.5
  • Installed via pip

Hi Robert,

Plyplus has trouble with this grammar because of a token collision. The two tokens 'a' and '[a]' match exactly the same input (namely, 'a'). As a result, the lexer recognizes only the token with the highest priority (in this case, '[a]').

The parse error occurs because Plyplus expected the token 'a' but received the token '[a]'.

If you change the second token to '[b]' in the grammar and parse the input 'ab', you will see that it works as intended.
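To make the collision concrete, here is a toy lexer in plain Python. It is a simplified model, not Plyplus's actual lexer: the token names A and ANOTHER_A, the tokenize and matches_start_rule helpers, the numeric priorities, and the rule "longest match wins, ties go to the higher priority" are all assumptions. The priorities are chosen so that '[a]' wins the tie, as described above.

```python
import re

def tokenize(text, token_defs):
    """Toy lexer (hypothetical, not Plyplus internals).

    token_defs is a list of (name, regex, priority). At each position the
    longest match wins; on equal length, the higher priority wins.
    """
    compiled = [(name, re.compile(p), prio) for name, p, prio in token_defs]
    tokens, pos = [], 0
    while pos < len(text):
        best = None  # (length, priority, name, value)
        for name, rx, prio in compiled:
            m = rx.match(text, pos)
            if m and m.end() > pos:
                cand = (m.end() - pos, prio, name, m.group())
                if best is None or cand[:2] > best[:2]:
                    best = cand
        if best is None:
            raise SyntaxError(f"no token matches at position {pos}")
        tokens.append((best[2], best[3]))
        pos += best[0]
    return tokens

def matches_start_rule(tokens):
    # Models "start: a another_a;": exactly one A followed by one ANOTHER_A.
    return [name for name, _ in tokens] == ["A", "ANOTHER_A"]

# Colliding token set: both patterns match the single character 'a'.
colliding = [("A", r"a", 1), ("ANOTHER_A", r"[a]", 2)]
# Non-colliding token set: '[b]' only matches 'b'.
distinct = [("A", r"a", 1), ("ANOTHER_A", r"[b]", 2)]

print(tokenize("aa", colliding))  # [('ANOTHER_A', 'a'), ('ANOTHER_A', 'a')]
print(matches_start_rule(tokenize("aa", colliding)))  # False
print(tokenize("ab", distinct))   # [('A', 'a'), ('ANOTHER_A', 'b')]
print(matches_start_rule(tokenize("ab", distinct)))   # True
```

In the colliding case every 'a' is lexed as ANOTHER_A, so a parser expecting A first fails before grammar rules even come into play; the problem lives in the lexer, not in the start rule.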

In general, it's recommended to avoid overlapping tokens, since they can produce confusing errors. But overlap is sometimes useful, for example when matching both '+' and '++'. In that case, the longest matching token always takes priority.
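The "longest token wins" rule (often called maximal munch) can be sketched with a small standalone lexer. This is an illustrative model under assumed token names (PLUSPLUS, PLUS, NAME), not Plyplus's real implementation; it only shows why '+' and '++' can coexist without a collision.

```python
import re

# Hypothetical token set: '+' is a prefix of '++', so the two overlap.
TOKENS = [("PLUS", r"\+"), ("PLUSPLUS", r"\+\+"), ("NAME", r"[a-z]+")]

def tokenize(text):
    """Maximal-munch lexer: at each position, emit the longest matching token.

    Definition order does not matter for matches of different lengths;
    for equal-length matches, the first listed token wins.
    """
    compiled = [(name, re.compile(p)) for name, p in TOKENS]
    out, pos = [], 0
    while pos < len(text):
        best_name, best_end = None, pos
        for name, rx in compiled:
            m = rx.match(text, pos)
            if m and m.end() > best_end:
                best_name, best_end = name, m.end()
        if best_name is None:
            raise SyntaxError(f"no token matches at position {pos}")
        out.append((best_name, text[pos:best_end]))
        pos = best_end
    return out

print(tokenize("a+b"))    # [('NAME', 'a'), ('PLUS', '+'), ('NAME', 'b')]
print(tokenize("a++b"))   # [('NAME', 'a'), ('PLUSPLUS', '++'), ('NAME', 'b')]
print(tokenize("a+++b"))  # [('NAME', 'a'), ('PLUSPLUS', '++'), ('PLUS', '+'), ('NAME', 'b')]
```

Even though PLUS is listed first, '++' is lexed as a single PLUSPLUS token because it is the longer match. Overlap like this is harmless; the trouble in the original grammar comes from two tokens that match inputs of identical length.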

If you have overlapping tokens in your grammar, and can't find an obvious workaround, let me know. I can suggest several methods to solve it.

Thank you for your detailed and helpful answer :)