drmfinlay / pyjsgf

JSpeech Grammar Format (JSGF) compiler, matcher and parser package for Python.

Matching process overhaul

drmfinlay opened this issue.

The rule expansion matching process could use an overhaul. Sometimes it fails to match when it should, and it still doesn't scale well. I'm going to try reverting to using regular expressions, or the regex package, internally. The new implementation needs to keep the Expansion.current_match values properly populated, because they are used by functionality such as the Rule.matched_tags property.
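For illustration, here is a minimal sketch of the regular-expression idea under stated assumptions: `Node`, `Word`, `Seq`, `Opt` and `match` are hypothetical names, not pyjsgf's API. Each node is compiled into a named group, so after a successful match every node's matched text can be read back into a `current_match` attribute, which is the property the comment above says must be preserved.

```python
import re


class Node:
    """Hypothetical expansion node (not pyjsgf's Expansion class)."""

    def __init__(self, *children):
        self.children = list(children)
        self.current_match = None

    def walk(self):
        # Pre-order traversal over this node and its descendants.
        yield self
        for child in self.children:
            yield from child.walk()

    def body(self, names):
        raise NotImplementedError

    def pattern(self, names):
        # Wrap this node's pattern in a named group so its match can be read back.
        return "(?P<%s>%s)" % (names[self], self.body(names))


class Word(Node):
    def __init__(self, text):
        super().__init__()
        self.text = text

    def body(self, names):
        # Every word consumes one trailing space; match() appends one to the input.
        return re.escape(self.text) + " "


class Seq(Node):
    def body(self, names):
        return "".join(c.pattern(names) for c in self.children)


class Opt(Node):
    def body(self, names):
        return "(?:%s)?" % self.children[0].pattern(names)


def match(root, speech):
    """Match speech against root and populate current_match on every node."""
    nodes = list(root.walk())
    names = {node: "g%d" % i for i, node in enumerate(nodes)}
    regex = re.compile(root.pattern(names))
    # Normalise whitespace and add the trailing space each Word expects.
    m = regex.fullmatch(" ".join(speech.split()) + " ")
    for node in nodes:
        group = m.group(names[node]) if m else None
        # Groups that did not participate are None; otherwise drop the trailing space.
        node.current_match = group.strip() if group is not None else None
    return m is not None


rule = Seq(Word("hello"), Opt(Word("big")), Word("world"))
print(match(rule, "hello world"))       # True
print(rule.children[2].current_match)   # world
```

An optional node that matched nothing gets an empty string, while its unmatched child gets None. A fuller version would also need alternatives, repeats and rule references.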

I'm planning to make these changes in v1.4.1.

The new matching implementation has been committed to the develop branch (commit 24ffaef). It will be released in v1.5.0 soon. I noted these changes in the commit message, but I'll list them here too:

  • Fixed a few bugs where certain rules wouldn't match (including #12).
  • The new implementation scales to large rules/grammars and has significantly improved performance.
  • Added Expansion.matcher_element property and Expansion.invalidate_matcher method.
  • The jsgf.ext.Dictation class now works correctly in most circumstances and also performs better. However, invalidate_matcher must be called manually if the rule expansion is changed after matching.
  • Matching speech strings against ambiguous rules is no longer supported, because it isn't worth the performance hit. For example, "test" will no longer match public <test> = [test] test;.
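The ambiguity in the last point comes from matching without backtracking. The sketch below assumes a matcher that commits to an optional word as soon as it matches and never revisits that choice (PEG-style parsers such as pyparsing behave this way by default); `greedy_match` is a hypothetical helper, not part of pyjsgf. Python's backtracking `re` engine, by contrast, accepts the same input.

```python
import re


def greedy_match(elements, words):
    """Match a flat sequence of ("req" | "opt", word) elements without backtracking.

    Hypothetical helper: an optional word is consumed whenever it matches,
    and that choice is never revisited if a later element then fails.
    """
    pos = 0
    for kind, word in elements:
        if pos < len(words) and words[pos] == word:
            pos += 1  # commit to consuming this word
        elif kind == "req":
            return False  # required word missing; no backtracking
    return pos == len(words)


# public <test> = [test] test;
rule = [("opt", "test"), ("req", "test")]

print(greedy_match(rule, ["test"]))          # False: [test] consumed the only word
print(greedy_match(rule, ["test", "test"]))  # True

# A backtracking regex engine retries with the optional word skipped.
print(re.fullmatch(r"(?:test )?test", "test") is not None)  # True
```

Backtracking lets the regex retry the alternative where [test] is skipped, but in general such retries can multiply the amount of work done, which is the kind of cost the trade-off above avoids.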

For those interested in how it works: each Expansion object now has a lazily initialised pyparsing.ParserElement object for matching the expansion and its children. These elements are initialised recursively when either Expansion.matches or Expansion.matcher_element is used. If a rule expansion's parent or ChildList is modified, each ancestor's element is automatically invalidated, as are the elements of any NamedRuleRefs or RuleRefs referencing the rule. As mentioned above, Dictation expansions are not yet invalidated automatically.
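A minimal sketch of the lazy initialisation and invalidation scheme described above. It assumes hypothetical `Expr`, `Word`, `Seq` and `Ref` classes and uses compiled `re` patterns as a stand-in for pyparsing.ParserElement objects. Only the names matcher_element, invalidate_matcher and matches are borrowed from pyjsgf; their bodies here are illustrative, not pyjsgf's code. It also assumes rule references are acyclic, and, unlike the description above, child patterns are inlined into the parent's regex rather than cached separately.

```python
import re


class Expr:
    """Hypothetical expansion node with a lazily built, cached matcher."""

    def __init__(self, *children):
        self.parent = None
        self.children = []
        self.referrers = []  # hypothetical: references to invalidate along with this node
        self._matcher = None
        self.builds = 0      # counts matcher builds, for demonstration only
        for child in children:
            self.add_child(child)

    def add_child(self, child):
        child.parent = self
        self.children.append(child)
        self.invalidate_matcher()  # structure changed: drop cached matchers

    def pattern(self):
        raise NotImplementedError

    @property
    def matcher_element(self):
        # Build on first use only; later calls reuse the cached object.
        if self._matcher is None:
            self._matcher = re.compile(self.pattern())
            self.builds += 1
        return self._matcher

    def invalidate_matcher(self):
        # Drop this node's cache, then every referrer's and every ancestor's.
        self._matcher = None
        for ref in self.referrers:
            ref.invalidate_matcher()
        if self.parent is not None:
            self.parent.invalidate_matcher()

    def matches(self, speech):
        return self.matcher_element.fullmatch(speech) is not None


class Word(Expr):
    def __init__(self, text):
        super().__init__()
        self.text = text

    def set_text(self, text):
        self.text = text
        self.invalidate_matcher()

    def pattern(self):
        return re.escape(self.text)


class Seq(Expr):
    def pattern(self):
        return " ".join(c.pattern() for c in self.children)


class Ref(Expr):
    def __init__(self, target):
        super().__init__()
        self.target = target
        target.referrers.append(self)  # so changes to target reach this reference

    def pattern(self):
        return "(?:%s)" % self.target.pattern()


greeting = Seq(Word("hello"), Word("world"))
command = Seq(Word("say"), Ref(greeting))

print(command.matches("say hello world"))  # True (matcher built on first use)
print(command.matches("say hello world"))  # True (cached, no rebuild)
greeting.children[1].set_text("there")     # invalidates greeting, the Ref and command
print(command.matches("say hello there"))  # True (rebuilt lazily)
print(command.builds)                      # 2
```

Invalidating eagerly but rebuilding lazily means a burst of edits to a rule costs only one rebuild, on the next match.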

I'll close this now that version 1.5.0 is released. I think I've mentioned most of the above in the documentation.