Returned span is nondeterministic in some cases
jelmerdus opened this issue · comments
Describe the bug
In some cases, the returned span changes when the program is started multiple times. Within one run, the results are always the same.
To Reproduce
Run this program 10 times. The result is sometimes [24,25] and sometimes [24,27].
from quantulum3 import parser
matches = parser.parse("CAVO TRECCIA MARRONE MM 3 x")
print(matches[0].span)
Expected behavior
The returned span should be the same on every run. When quantulum is used in a larger piece of software, nondeterministic behavior makes it almost impossible to debug. It is much better to be consistently wrong in some cases than to be nondeterministic.
Additional information:
- Python Version: Python 3.10, sklearn 1.2.2
- Classifier activated/ sklearn installed: no/yes
- OS: Windows 10
- Version 0.7.9 and 0.9.0
Thanks for reporting this! Nondeterministic behaviour is definitely not desirable. Based on what you reported, it's likely due to hash-based dictionary ordering not being fully deterministic.
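One way to check this hypothesis is sketched below. It is not quantulum3 code and does not confirm the root cause. It only shows that, in CPython, the iteration order of a hash-based container of strings (a set here; plain dicts keep insertion order since Python 3.7) is stable within one process but can change between processes unless `PYTHONHASHSEED` is fixed. The helper name `run_with_seed` and the sample tokens are made up for illustration.

```python
import os
import subprocess
import sys

# Child program: print the iteration order of a set of strings.
# The tokens are illustrative only; they are not taken from quantulum3 internals.
SNIPPET = "print(list({'mm', 'x', 'marrone', 'treccia', 'cavo'}))"


def run_with_seed(seed: str) -> str:
    """Run SNIPPET in a fresh interpreter with PYTHONHASHSEED set to `seed`."""
    env = dict(os.environ, PYTHONHASHSEED=seed)
    result = subprocess.run(
        [sys.executable, "-c", SNIPPET],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


if __name__ == "__main__":
    # The same seed always gives the same order; different seeds may not.
    for seed in ("0", "1", "2", "3"):
        print(seed, run_with_seed(seed))
```

If the hypothesis holds, running the reproduction script above with a fixed seed (for example with `PYTHONHASHSEED=0` set in the environment) should return the same span on every run, which would also make the behaviour easier to bisect.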