Matching process overhaul
drmfinlay opened this issue · comments
The rule expansion matching process could use an overhaul. Sometimes it doesn't match when it should, and it still doesn't scale well. I'm going to try reverting to using regular expressions or the regex package internally. It needs to work in such a way that the `Expansion.current_match` values are still properly populated. Current match values are used for functionality such as the `Rule.matched_tags` property.
I'm planning to make these changes in v1.4.1.
The new matching implementation has been committed to the develop branch (commit 24ffaef). It will be released in v1.5.0 soon. I noted these changes in the commit message, but I'll list them here too:
- Fixed a few bugs where certain rules wouldn't match (including #12).
- The new implementation scales to large rules/grammars and has significantly improved performance.
- Added `Expansion.matcher_element` property and `Expansion.invalidate_matcher` method.
- The `jsgf.ext.Dictation` class works correctly in most circumstances now and also has improved performance. `invalidate_matcher` needs to be called manually if changing the rule expansion after matching.
- Successfully matching speech strings to ambiguous rules is no longer supported because it is not worth the performance hit. E.g. matching `"test"` to `public <test> = [test] test;`.
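To illustrate the last point, here is a minimal pure-Python sketch of how a greedy matcher that never backtracks would treat `[test] test`. The cause isn't spelled out above, so greedy, non-backtracking matching is an assumption here. `match_greedy` and the `(word, optional)` encoding are hypothetical and are not part of pyjsgf.

```python
def match_greedy(elements, words):
    """Match (word, optional) pairs against a list of words, left to right.

    An optional element consumes its word whenever it can, and that choice
    is never revisited (no backtracking).
    """
    pos = 0
    for word, optional in elements:
        if pos < len(words) and words[pos] == word:
            pos += 1
        elif not optional:
            return False
    return pos == len(words)


# Hypothetical encoding of the ambiguous expansion `[test] test`.
ambiguous = [("test", True), ("test", False)]
# The same language written unambiguously: `test [test]`.
reordered = [("test", False), ("test", True)]

print(match_greedy(ambiguous, "test".split()))       # False
print(match_greedy(ambiguous, "test test".split()))  # True
print(match_greedy(reordered, "test".split()))       # True
```

Under that assumption, the optional `[test]` swallows the only word and nothing is left for the required `test`. Putting the required part first accepts the same strings without needing to backtrack.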
For those interested in how it works, each `Expansion` object now has a lazily initialised `pyparsing.ParserElement` object for matching the expansion and any children appropriately. The elements are initialised recursively when either `Expansion.matches` or `Expansion.matcher_element` is used. If a rule expansion's parent or `ChildList` is modified, each ancestor's element is automatically invalidated, as well as the elements of any `NamedRuleRefs` or `RuleRefs` referencing the rule. As mentioned above, `Dictation` expansions are not invalidated automatically yet.
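The caching pattern described above can be sketched roughly as follows. This is a toy model, not pyjsgf's code: it uses the standard `re` module in place of pyparsing so it runs without extra dependencies, and the `Node` class, its `build_count` counter, and `add_child` are hypothetical names. Only `matcher_element`, `matches`, and `invalidate_matcher` echo the names mentioned in this issue.

```python
import re


class Node:
    """Toy expansion: a literal (text) or a sequence of child nodes."""

    def __init__(self, text=None, children=()):
        self.text = text
        self.parent = None
        self._children = []
        self._matcher = None   # built lazily on first use
        self.build_count = 0   # hypothetical counter, for demonstration only
        for child in children:
            self.add_child(child)

    def add_child(self, child):
        child.parent = self
        self._children.append(child)
        # Modifying the child list invalidates this node and its ancestors.
        self.invalidate_matcher()

    def invalidate_matcher(self):
        """Drop the cached matcher on this node and every ancestor."""
        node = self
        while node is not None:
            node._matcher = None
            node = node.parent

    def _pattern(self):
        # Build the pattern recursively from the children.
        if self.text is not None:
            return re.escape(self.text)
        return r"\s+".join(child._pattern() for child in self._children)

    @property
    def matcher_element(self):
        if self._matcher is None:
            self._matcher = re.compile(self._pattern())
            self.build_count += 1
        return self._matcher

    def matches(self, speech):
        return self.matcher_element.fullmatch(speech.strip()) is not None


inner = Node(children=[Node(text="hello")])
outer = Node(children=[inner, Node(text="world")])
print(outer.matches("hello world"))        # True, matcher built once
inner.add_child(Node(text="there"))        # invalidates inner and outer
print(outer.matches("hello there world"))  # True, matcher rebuilt
```

The trade-off in a design like this is that any change path which skips the invalidation step leaves a stale matcher behind, which is presumably why `invalidate_matcher` still has to be called by hand in the `Dictation` case noted above.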
I'll close this now that version 1.5.0 is released. I think I've mentioned most of the above in the documentation.