Can't use regular expressions

Question

Robert42 opened this issue 8 years ago · comments

Robert Hildebrandt commented 8 years ago

The following script runs without problems:

import plyplus

grammar = plyplus.Grammar("""
start: a another_a;

a: 'a';
another_a: 'a';
""")

print(grammar.parse('aa'))

After a slight modification (replacing another_a: 'a'; with another_a: '[a]';), the script can no longer parse the input:

import plyplus

grammar = plyplus.Grammar("""
start: a another_a;

a: 'a';
another_a: '[a]';
""")

print(grammar.parse('aa'))

Unless I have completely misunderstood the grammar notation, this slightly modified version should also run.

Environment

Linux Mint 18
PlyPlus Version: 0.7.5
Installed via pip

Erez Shinan · Answer 1 · Wed Mar 01 2017 04:58:28 GMT+0800 (China Standard Time)

Hi Robert,

Plyplus has trouble with this grammar because of a token collision. The two tokens 'a' and '[a]' match exactly the same input (namely, 'a'). That means only the token with the highest priority will be recognized by the lexer (in this case, '[a]').

The error you get while parsing occurs because Plyplus expected the token 'a' but got the token '[a]'.
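
To make the collision concrete, here is a minimal sketch of a priority-based lexer written with Python's standard re module. It is not Plyplus's actual lexer: the token names, the tokenize helper, and the "longer pattern source first" ordering are all assumptions chosen for illustration (the ordering mimics how PLY documents the sorting of string-defined tokens).

```python
import re

# Hypothetical token table; the names A and ANOTHER_A are illustrative only.
TOKENS = {"A": r"a", "ANOTHER_A": r"[a]"}


def tokenize(text, tokens):
    """Split text into (token_name, matched_text) pairs.

    Assumed priority rule: patterns with a longer source string are tried first.
    """
    ordered = sorted(tokens.items(), key=lambda kv: len(kv[1]), reverse=True)
    pos = 0
    result = []
    while pos < len(text):
        for name, pattern in ordered:
            m = re.compile(pattern).match(text, pos)
            if m and m.end() > pos:
                result.append((name, m.group()))
                pos = m.end()
                break
        else:
            raise ValueError(f"unexpected character {text[pos]!r} at {pos}")
    return result


# Both patterns match 'a', so the higher-priority one claims every 'a'.
print(tokenize("aa", TOKENS))

# With non-overlapping patterns, each character gets its own token type.
print(tokenize("ab", {"A": r"a", "B": r"[b]"}))
```

In the first call, no token named A is ever produced, so a parser rule that expects A followed by ANOTHER_A cannot succeed. In the second call the two patterns never compete, and the token stream matches what the grammar expects.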

If you change it to '[b]' in the grammar, and input 'ab', you will see that it works as intended.

It's generally recommended to avoid overlapping tokens, since they can produce confusing errors. Sometimes overlap is useful, though, such as when matching both '+' and '++'. The longest token always takes priority.
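
The '+' and '++' case can be sketched with Python's re module, whose alternation picks the first alternative that matches rather than the longest one. Listing the longer pattern first therefore emulates a "longest token wins" priority. The lex helper below is hypothetical and only illustrates the idea; it says nothing about how Plyplus orders its tokens internally.

```python
import re


def lex(text, patterns):
    """Return the matched substrings, trying patterns in the given order."""
    combined = re.compile("|".join(f"({p})" for p in patterns))
    return [m.group() for m in combined.finditer(text)]


# Longer pattern first: '++' is recognized as a single token.
print(lex("a++", [r"\+\+", r"\+", r"a"]))

# Shorter pattern first: '+' wins twice and '++' is never produced.
print(lex("a++", [r"\+", r"\+\+", r"a"]))
```

The overlap between '+' and '++' is harmless only because the priority rule resolves it the way the grammar author intends. When two patterns match exactly the same text, as 'a' and '[a]' do, no ordering can make both tokens reachable.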

If you have overlapping tokens in your grammar, and can't find an obvious workaround, let me know. I can suggest several methods to solve it.

Robert Hildebrandt · Answer 2 · Wed Mar 01 2017 07:07:45 GMT+0800 (China Standard Time)

Thank you for your detailed and helpful answer :)