Can't use regular expressions
Robert42 opened this issue · comments
The following script runs without problems:
import plyplus
grammar = plyplus.Grammar("""
start: a another_a;
a: 'a';
another_a: 'a';
""")
print(grammar.parse('aa'))
After a slight modification (replace another_a: 'a'; with another_a: '[a]';), the script can't parse the target anymore:
import plyplus
grammar = plyplus.Grammar("""
start: a another_a;
a: 'a';
another_a: '[a]';
""")
print(grammar.parse('aa'))
Unless I have completely misunderstood the grammar notation, the modified script should work as well.
Environment
- Linux Mint 18
- PlyPlus Version: 0.7.5
- Installed via pip
Hi Robert,
Plyplus has trouble with this grammar because of a token collision. The two tokens 'a' and '[a]' match exactly the same input (namely, 'a'). That means the lexer will only ever recognize the token with the highest priority (in this case, '[a]').
The error you get while parsing is because Plyplus expected the token 'a', but got the token '[a]'.
If you change it to '[b]' in the grammar, and input 'ab', you will see that it works as intended.
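To make the collision concrete, here is a minimal sketch of a toy lexer built on Python's re module. It is not Plyplus's actual lexer: the tokenize function, the token names, and the rule that the first listed token wins are assumptions chosen to mirror the behaviour described above.

```python
import re

def tokenize(text, token_specs):
    """Toy lexer (hypothetical): at each position, the first spec that matches wins."""
    tokens = []
    pos = 0
    while pos < len(text):
        for name, pattern in token_specs:
            match = re.compile(pattern).match(text, pos)
            if match and match.end() > pos:
                tokens.append((name, match.group()))
                pos = match.end()
                break
        else:
            raise ValueError(f"no token matches at position {pos}")
    return tokens

# Assumption: the regex-style token is listed first, so it has the higher priority.
colliding = [("ANOTHER_A", r"[a]"), ("A", r"a")]
# Replacing '[a]' with '[b]' removes the overlap between the two tokens.
distinct = [("B", r"[b]"), ("A", r"a")]

colliding_tokens = tokenize("aa", colliding)
distinct_tokens = tokenize("ab", distinct)

print(colliding_tokens)  # both characters become ANOTHER_A; A is never produced
print(distinct_tokens)   # 'a' becomes A and 'b' becomes B
```

With the colliding specs, a parser that expects the sequence A ANOTHER_A never sees an A token, which is the kind of mismatch reported in the error. With the distinct specs, each character maps to its own token and the sequence can be matched.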
In general, it's recommended to avoid overlapping tokens, since they can produce confusing errors. Sometimes they are useful, though, such as when matching both '+' and '++'. As a rule, the longest token takes priority.
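The longest-token rule for '+' and '++' can be illustrated with another small sketch. Again, this is a toy lexer written with Python's re module under stated assumptions, not Plyplus's implementation; tokenize_longest, TOKEN_SPECS, and the token names are hypothetical.

```python
import re

# Hypothetical token set with two overlapping operators.
TOKEN_SPECS = [
    ("PLUS", r"\+"),
    ("INCREMENT", r"\+\+"),
    ("NAME", r"[a-z]+"),
]

def tokenize_longest(text):
    """Toy lexer: at each position, keep the spec whose match is longest."""
    tokens = []
    pos = 0
    while pos < len(text):
        best_name, best_match = None, None
        for name, pattern in TOKEN_SPECS:
            match = re.compile(pattern).match(text, pos)
            if match and (best_match is None or match.end() > best_match.end()):
                best_name, best_match = name, match
        if best_match is None or best_match.end() == pos:
            raise ValueError(f"no token matches at position {pos}")
        tokens.append((best_name, best_match.group()))
        pos = best_match.end()
    return tokens

result = tokenize_longest("a++b+c")
print(result)
```

Here '++' is emitted as a single INCREMENT token even though PLUS also matches at that position, while the lone '+' later in the input still becomes PLUS. This works because the two overlapping tokens differ in match length; tokens like 'a' and '[a]', which match the same text with the same length, cannot be separated this way.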
If you have overlapping tokens in your grammar, and can't find an obvious workaround, let me know. I can suggest several methods to solve it.
Thank you for your detailed and helpful answer :)