drmfinlay / pyjsgf

JSpeech Grammar Format (JSGF) compiler, matcher and parser package for Python.

Strange precedence when parsing |

dlukes opened this issue

Consider this script:

import jsgf
grammar = jsgf.parser.parse_grammar_string("""
#JSGF V1.0 utf-8 en;
grammar main;
public <main> =
foo (bar|baz) qux | xxx;
""")
print(grammar.get_rule("main").compile())

I would expect the output to be public <main> = foo (bar|baz) qux|xxx;, or possibly something along the lines of public <main> = (foo (bar|baz) qux)|(xxx); with additional parentheses to disambiguate.

Instead, I get public <main> = foo (bar|baz) (qux|xxx);. Is this expected behavior? I'm far from an expert on JSGF, but going by the spec, | should have the lowest precedence of all, which doesn't seem compatible with this result.
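To make the expected grouping concrete, here is a minimal sketch of a recursive-descent parser in which | binds more loosely than a sequence. It is not pyjsgf's parser and it covers only words, parentheses and |; the function names (tokenize, parse_alternatives, parse_sequence, parse, render) are made up for this illustration.

```python
import re

def tokenize(text):
    # Words, parentheses and the alternation bar are the only tokens here.
    return re.findall(r"[A-Za-z0-9_]+|[()|]", text)

def parse_alternatives(tokens, pos):
    # alternatives := sequence ('|' sequence)*   (lowest precedence)
    node, pos = parse_sequence(tokens, pos)
    options = [node]
    while pos < len(tokens) and tokens[pos] == "|":
        node, pos = parse_sequence(tokens, pos + 1)
        options.append(node)
    return (options[0] if len(options) == 1 else ("alt", options)), pos

def parse_sequence(tokens, pos):
    # sequence := atom+   (binds tighter than '|')
    items = []
    while pos < len(tokens) and tokens[pos] not in ("|", ")"):
        if tokens[pos] == "(":
            # A parenthesised group restarts at the lowest precedence level.
            node, pos = parse_alternatives(tokens, pos + 1)
            if pos >= len(tokens) or tokens[pos] != ")":
                raise ValueError("missing closing parenthesis")
            pos += 1
        else:
            node, pos = tokens[pos], pos + 1
        items.append(node)
    if not items:
        raise ValueError("empty sequence")
    return (items[0] if len(items) == 1 else ("seq", items)), pos

def parse(text):
    tokens = tokenize(text)
    node, pos = parse_alternatives(tokens, 0)
    if pos != len(tokens):
        raise ValueError("unexpected token: " + tokens[pos])
    return node

def render(node):
    # Print every sequence and alternative set with explicit parentheses.
    if isinstance(node, str):
        return node
    kind, children = node
    joiner = "|" if kind == "alt" else " "
    return "(" + joiner.join(render(child) for child in children) + ")"

print(render(parse("foo (bar|baz) qux | xxx")))
# prints ((foo (bar|baz) qux)|xxx)
```

With this layering, the whole sequence foo (bar|baz) qux becomes the first alternative and xxx the second, which is the grouping the report expects.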

Hello @dlukes,

You are right, that is not expected behaviour. I'll try to fix this in the next version. I can't say when that will be out, however. There are currently many problems with this library. I'll see what I can do.

No worries and no rush, I just thought I'd let you know :)

I just spent a few hours trying to find the issue in my code before I realized it was caused by this same bug in the library. Here is another example and a workaround that might help track down the issue:

#JSGF V1.0 UTF-8 en;

grammar dialog;

<set_a> = (A1|A2);
<set_b> = (B1|B2);

<sentences> =
hello|
use <set_a> with <set_a>|
(start|stop) <set_a>|
test|
exit;

public <command> = <sentences> [please];

When parsing this string, the library incorrectly assigns 'test' and 'exit' to a sequence with (start|stop).
To work around this, you can replace (start|stop) with something like <controls> and add <controls> = (start|stop); above.
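Spelled out, the workaround grammar could look like the sketch below. The rule name <controls> comes from the suggestion above; the constant name WORKAROUND_GRAMMAR and the helper show_compiled are made up for this example. The parse step reuses only the calls from the original script (jsgf.parser.parse_grammar_string, get_rule, compile) and is skipped if pyjsgf is not installed. No output is shown, since it depends on the library version.

```python
# The example grammar with (start|stop) moved into its own rule,
# so that no parenthesised group appears inside the long alternative list.
WORKAROUND_GRAMMAR = """#JSGF V1.0 UTF-8 en;
grammar dialog;

<set_a> = (A1|A2);
<set_b> = (B1|B2);
<controls> = (start|stop);

<sentences> =
    hello|
    use <set_a> with <set_a>|
    <controls> <set_a>|
    test|
    exit;

public <command> = <sentences> [please];
"""

def show_compiled(grammar_text):
    # Hypothetical helper: parse the grammar and print the compiled rule.
    try:
        import jsgf
    except ImportError:
        print("pyjsgf is not installed; skipping the parse step")
        return
    grammar = jsgf.parser.parse_grammar_string(grammar_text)
    print(grammar.get_rule("sentences").compile())

if __name__ == "__main__":
    show_compiled(WORKAROUND_GRAMMAR)
```

If the workaround takes effect, the compiled <sentences> rule should keep test and exit as separate alternatives rather than folding them into the <controls> sequence.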

Thank you for the example and workaround, @fquirin. I will try to fix this problem in the next version.