Different results on Windows and Linux? Is Linux not supported?

Question

OAE69 opened this issue 2 months ago

OAE69 commented 2 months ago

I ran the same code in PyCharm (on Windows) and on Linux, but I get different results.
Python:
from rapidfuzz import fuzz

score = fuzz.token_set_ratio("It is an apple", "It is an apple juice")
print(score)

In PyCharm I get 100.
On Linux I get 97.
The Python and rapidfuzz versions are the same on both machines.

Max Bachmann · Answer 1 · Tue Apr 23 2024 18:41:05 GMT+0800 (China Standard Time)

I can't reproduce this on my machine. For me this gives 100 on both Windows and Linux.

So to fix this I would need your help in running some tests on your machine:

1) I assume the result is reproducible for you?
2) Can you try:

git clone --recursive https://github.com/rapidfuzz/rapidfuzz.git
cd rapidfuzz
pip install . -v

and then try again. This is simply to check whether a locally built version shows the same problem.

3) If 2) still shows the problem, I can create a patched version of the library that includes debug prints to get to the bottom of the issue. If the problem doesn't occur in 2), I will have to think about what else we could do.

OAE69 · Answer 2 · Wed Apr 24 2024 12:14:18 GMT+0800 (China Standard Time)

Since my company cannot download packages from the internet, these are the installed versions:
thefuzz 0.20.0
rapidfuzz 3.4.0
The versions are the same on Windows and Linux, but I still get different results.
The PyCharm encoding is UTF-8; the Linux locale is en_US.UTF-8.
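
When comparing two machines like this, it can help to collect the same details on both with one script rather than by hand. The following is only a minimal sketch using the standard library; the helper names `package_version` and `environment_summary` are made up for illustration and are not part of rapidfuzz.

```python
import locale
import platform
import sys
from importlib.metadata import PackageNotFoundError, version


def package_version(name):
    """Return the installed version of a package, or 'not installed'."""
    try:
        return version(name)
    except PackageNotFoundError:
        return "not installed"


def environment_summary(packages=("rapidfuzz", "thefuzz")):
    """Collect the details that were compared by hand in this thread."""
    summary = {
        "platform": platform.platform(),
        "python": platform.python_version(),
        "default_encoding": sys.getdefaultencoding(),
        "preferred_encoding": locale.getpreferredencoding(False),
    }
    for name in packages:
        summary[name] = package_version(name)
    return summary


if __name__ == "__main__":
    for key, value in environment_summary().items():
        print(f"{key}: {value}")
```

Running it on both machines and diffing the output shows quickly whether the versions or encodings differ. Note that it does not show whether the compiled or the pure Python implementation is in use.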

Max Bachmann · Answer 3 · Wed Apr 24 2024 20:13:11 GMT+0800 (China Standard Time)

Ah, that explains your issue. There are two problems for you:

1) You are using the Python fallback version, probably because you installed the package from source without a C++ compiler present. You can see what's going wrong by increasing the verbosity of the build. The pure Python fallback version works, but is quite a bit slower.
2) There was a bug in the Python fallback implementation of fuzz.token_set_ratio that was fixed in version 3.6.0.
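
Both points can be checked from Python. The sketch below is a rough illustration, not an official rapidfuzz tool: the helper names are hypothetical, and the module name `rapidfuzz.fuzz_cpp` for the compiled implementation is an assumption about the package layout that may not hold for every release.

```python
import re
from importlib import import_module
from importlib.metadata import PackageNotFoundError, version

FIXED_IN = (3, 6, 0)  # release that fixed the fallback bug, per this thread


def parse_version(text):
    """Turn '3.4.0' into (3, 4, 0), ignoring suffixes such as 'rc1'."""
    numbers = []
    for piece in text.split(".")[:3]:
        match = re.match(r"\d+", piece)
        numbers.append(int(match.group()) if match else 0)
    while len(numbers) < 3:
        numbers.append(0)
    return tuple(numbers)


def has_token_set_ratio_fix(version_text):
    """True if the given version is 3.6.0 or newer."""
    return parse_version(version_text) >= FIXED_IN


def has_compiled_backend():
    """True/False, or None if rapidfuzz is not installed at all."""
    try:
        import_module("rapidfuzz")
    except ImportError:
        return None
    try:
        # Assumed module name for the C++ implementation.
        import_module("rapidfuzz.fuzz_cpp")
    except ImportError:
        return False
    return True


if __name__ == "__main__":
    try:
        installed = version("rapidfuzz")
    except PackageNotFoundError:
        print("rapidfuzz is not installed")
    else:
        print("rapidfuzz", installed)
        print("contains the 3.6.0 fix:", has_token_set_ratio_fix(installed))
        print("compiled backend importable:", has_compiled_backend())
```

With the versions reported above, `has_token_set_ratio_fix("3.4.0")` is false, which matches the diagnosis. If the compiled backend is reported as missing, rebuilding with a C++ compiler available, or installing a prebuilt wheel, should avoid the slower fallback.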

OAE69 · Answer 4 · Fri Apr 26 2024 09:05:23 GMT+0800 (China Standard Time)

Thank you very much!
rapidfuzz 3.6.0 fixed this problem.