rapidfuzz / RapidFuzz

Rapid fuzzy string matching in Python using various string metrics

Home Page: https://rapidfuzz.github.io/RapidFuzz/

Different results on Windows and Linux? Is Linux not supported?

OAE69 opened this issue

I run the same code in PyCharm (Windows) and on Linux, but I get different results.
Python:
from rapidfuzz import fuzz

score = fuzz.token_set_ratio("It is an apple", "It is an apple juice")
print(score)

In PyCharm, I get 100.
On Linux, I get 97.
The versions of Python and rapidfuzz are the same on both.

I can't reproduce this on my machine. For me this gives 100 both on Windows and Linux.
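For context on why 100 is the expected score: as far as I understand token_set_ratio, it returns 100 whenever the two strings share at least one token and one string's token set is fully contained in the other's. The sketch below only illustrates that shortcut in plain Python. The helper name `is_token_subset_match` is made up for this example and is not part of rapidfuzz, and the real function does more (preprocessing options and Indel-based scoring for the other cases).

```python
def is_token_subset_match(s1, s2):
    """Return True when one string's whitespace tokens are all found in the other.

    Hypothetical helper for illustration only; it mirrors the shortcut under
    which token_set_ratio is expected to score 100.
    """
    tokens_a = set(s1.split())
    tokens_b = set(s2.split())
    # Empty inputs never count as a match in this sketch.
    if not tokens_a or not tokens_b:
        return False
    shared = tokens_a & tokens_b
    # One token set must be a subset of the other, with a non-empty overlap.
    return bool(shared) and (tokens_a <= tokens_b or tokens_b <= tokens_a)


print(is_token_subset_match("It is an apple", "It is an apple juice"))
```

Every token of "It is an apple" also appears in "It is an apple juice", so the inputs from the report fall into this case.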

So to fix this I would need your help in running some tests on your machine:

  1. I assume the result is reproducible for you
  2. Can you try:
git clone --recursive https://github.com/rapidfuzz/rapidfuzz.git
cd rapidfuzz
pip install . -v

and then try again. This is simply to validate whether a locally built version shows the same problems.

  3. If 2) still shows the problem, I can create a patched version of the library that includes debug prints to get to the bottom of the issue. If it doesn't occur in 2), I will have to think about what we could do.

My company does not allow downloading packages from the internet.
These are the versions:
thefuzz 0.20.0
rapidfuzz 3.4.0
The versions are the same on Windows and Linux, but I still get different results.
The PyCharm encoding is UTF-8; the Linux locale is en_US.UTF-8.

Ah, that explains your issue. There are two problems:

  1. You are using the Python fallback version, probably because you installed the package from source without a C++ compiler present. You can see what's going wrong by increasing the verbosity of the build. The pure Python fallback version works, but is quite a bit slower.
  2. There was a bug in the Python fallback implementation of fuzz.token_set_ratio that was fixed in version 3.6.0.
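One rough way to check whether an installation contains compiled code at all is to look for extension-module files inside the installed package directory. This is only a sketch using the standard library. The helper `find_compiled_extensions` is a made-up name, not a rapidfuzz API, and an empty result is a hint that only the pure Python fallback is available, not proof.

```python
import importlib.machinery
import importlib.util
from pathlib import Path


def find_compiled_extensions(package_name):
    """List compiled extension files inside an installed package.

    Returns None if the package is not installed (or is not a package),
    and an empty list if it contains no compiled extension modules.
    Hypothetical helper for illustration only.
    """
    try:
        spec = importlib.util.find_spec(package_name)
    except (ImportError, ValueError):
        return None
    if spec is None or not spec.submodule_search_locations:
        return None
    # Platform-specific suffixes such as ".so" on Linux or ".pyd" on Windows.
    suffixes = tuple(importlib.machinery.EXTENSION_SUFFIXES)
    found = []
    for location in spec.submodule_search_locations:
        for path in Path(location).rglob("*"):
            if path.name.endswith(suffixes):
                found.append(path.name)
    return sorted(found)


# An empty list here would suggest a pure Python install of rapidfuzz.
print(find_compiled_extensions("rapidfuzz"))
```

Running this on both the Windows and the Linux machine should show whether one of them is missing the compiled modules.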

Thank you very much!
rapidfuzz 3.6.0 fixed this problem.
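For anyone else hitting this, a small check like the following can confirm whether the installed rapidfuzz is at least 3.6.0. It is a minimal sketch: both helper names are made up for this example, and the comparison only handles plain dotted release numbers such as 3.4.0, not pre-release or development versions.

```python
from importlib import metadata


def installed_version(package_name):
    """Return the installed version string, or None if the package is missing."""
    try:
        return metadata.version(package_name)
    except metadata.PackageNotFoundError:
        return None


def version_at_least(version, minimum):
    """Compare plain dotted release numbers such as '3.4.0' (no pre-release support)."""
    def parse(text):
        return tuple(int(part) for part in text.split("."))
    return parse(version) >= parse(minimum)


current = installed_version("rapidfuzz")
if current is None:
    print("rapidfuzz is not installed")
else:
    print(current, "includes the fix:", version_at_least(current, "3.6.0"))
```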