Returned span is nondeterministic in some cases

jelmerdus opened this issue a year ago · comments

Describe the bug
In some cases, the returned span changes when the program is started multiple times. Within one run, the results are always the same.

To Reproduce
Run this program 10 times. The result is sometimes [24,25] and sometimes [24,27].

from quantulum3 import parser
matches = parser.parse("CAVO TRECCIA MARRONE MM 3 x")
print(matches[0].span)
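
Because the results only differ between runs, each attempt has to start a fresh interpreter. A minimal sketch of a harness for that follows; the helper name `tally_outputs` is hypothetical and not part of quantulum3, and the actual call is left commented out because it requires quantulum3 to be installed.

```python
import subprocess
import sys
from collections import Counter


def tally_outputs(code: str, runs: int = 10) -> Counter:
    """Run `code` in `runs` fresh interpreters and count each distinct stdout."""
    counts = Counter()
    for _ in range(runs):
        result = subprocess.run(
            [sys.executable, "-c", code],
            capture_output=True,
            text=True,
            check=True,  # raise if the child process fails
        )
        counts[result.stdout.strip()] += 1
    return counts


# The snippet from this report, passed as a string to each fresh interpreter.
SNIPPET = (
    "from quantulum3 import parser\n"
    "matches = parser.parse('CAVO TRECCIA MARRONE MM 3 x')\n"
    "print(matches[0].span)\n"
)

# Example (requires quantulum3 to be installed):
# print(tally_outputs(SNIPPET))
```

More than one key in the returned counter means the output changed between runs.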

Expected behavior
The returned span should be the same on every run. When quantulum is used in a larger piece of software, nondeterministic behavior makes it almost impossible to debug. It is much better to be consistently wrong in some cases than to be nondeterministic.

Additional information:

Python Version: Python 3.10, sklearn 1.2.2
Classifier activated/ sklearn installed: no/yes
OS: Windows 10
quantulum3 version: 0.7.9 and 0.9.0

Niels Mündler · Answer 1 · Wed Jun 21 2023 17:24:06 GMT+0800 (China Standard Time)

Thanks for reporting this! Nondeterministic behaviour is definitely not desirable. Based on what you reported, it's likely due to hash-based dictionary ordering not being fully deterministic.
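
To illustrate the general mechanism (not quantulum3's actual internals, which are not shown in this thread): CPython randomizes string hashes per process unless `PYTHONHASHSEED` is set. Dicts have preserved insertion order since Python 3.7, so run-to-run variation of this kind usually comes from iterating over a set of strings, or from a dict built by iterating over one. The sketch below assumes that is what happens here; the unit strings and the helper name `order_with_seed` are made up for the example.

```python
import os
import subprocess
import sys

# Child program: print the iteration order of a small set of strings.
# The strings are illustrative only, not taken from quantulum3.
CHILD = "print(list({'mm', 'x', 'm', 'cm'}))"


def order_with_seed(seed: str) -> str:
    """Run CHILD in a fresh interpreter with PYTHONHASHSEED pinned to `seed`."""
    env = dict(os.environ, PYTHONHASHSEED=seed)
    result = subprocess.run(
        [sys.executable, "-c", CHILD],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


# Different seeds may yield different orders; the same seed always
# reproduces the same order.
for seed in ("0", "1", "2", "3"):
    print(seed, order_with_seed(seed))
```

If that assumption holds, launching the program with a fixed `PYTHONHASHSEED` should make the returned span reproducible while debugging. The library-side fix would be to avoid depending on set iteration order, for example by sorting before iterating.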