`process.extractBests` and usage of `__str__`
banagale opened this issue
I am trying to use rapidfuzz as a drop-in replacement in a project that depends on the last version of fuzzywuzzy prior to the name change. This is needed after hitting this issue.
The project uses `process.extractBests`. I noticed that `rapidfuzz` does not include `process.extractBests`.
Is `process.extract` a drop-in replacement for that old function?
I tried using `process.extract` and realized that the project relies on the `__str__` of the objects passed in the `choices` argument being read. Later in the code, each result is used as an object. (This let the dev work with the object directly while its string form was used for comparison.)
`rapidfuzz` does not seem to look at the `__str__` of a given object. Is this on purpose? Or perhaps FW should not have done this?
I mention the two points above because I believe the goal is for the FW API to be fully available in RF. I do not know whether this use of the FW API was unusual or an anti-pattern, though.
In `fuzzywuzzy` there is both `extract` and `extractBests`, with the difference that `extractBests` has an additional `score_cutoff` parameter. In `RapidFuzz` I only have the `extract` function, which does provide the `score_cutoff` argument and so is equivalent to `extractBests`.
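To make the role of `score_cutoff` concrete, here is a minimal, library-independent sketch. It does not call rapidfuzz or fuzzywuzzy: `ratio` is a stand-in scorer built on the standard library's `difflib`, and this `extract` is a hypothetical helper written only to show how one function with a cutoff covers both the plain and the "bests" use case.

```python
from difflib import SequenceMatcher


def ratio(a, b):
    """Stand-in scorer (not rapidfuzz): similarity in the range 0-100."""
    return 100 * SequenceMatcher(None, a, b).ratio()


def extract(query, choices, scorer=ratio, limit=5, score_cutoff=0):
    """Return (choice, score) pairs scoring at least score_cutoff, best first."""
    scored = [(choice, scorer(query, choice)) for choice in choices]
    kept = [pair for pair in scored if pair[1] >= score_cutoff]
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return kept if limit is None else kept[:limit]


choices = ["apple", "apply", "banana"]
everything = extract("apple", choices)                   # no cutoff: all choices kept
best_only = extract("apple", choices, score_cutoff=75)   # extractBests-style call
```

With the default cutoff every choice is returned, while the second call drops the weak match, which is the only behavioural difference between the two fuzzywuzzy functions described above.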
There are a couple of differences between `RapidFuzz` and `fuzzywuzzy`. In your specific case I assume you are using a function like `WRatio`, which defaults to `force_ascii=True`. So your strings are preprocessed using `utils.full_process(, force_ascii=True)`, which runs `str(sequence)`. This behaviour is not supported in `rapidfuzz`, so you will need to perform this conversion yourself. This can be done e.g. like this:
process.extract(query, choices, processor=str)
or in case you want to use the preprocessing function:
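from rapidfuzz import process, utils

def preprocess(seq):
    # Convert the object to its string form first, then apply the default preprocessing.
    return utils.default_process(str(seq))

process.extract(query, choices, processor=preprocess)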
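The idea behind passing a processor can be sketched without either library: the processor turns each choice into the string that gets scored, while the original object is what comes back in the result. The example below is only an illustration under stated assumptions: `City` and `extract_objects` are hypothetical names, and `difflib.SequenceMatcher` stands in for a real fuzzy scorer.

```python
from difflib import SequenceMatcher


class City:
    """Hypothetical choice object whose __str__ is the text to match on."""

    def __init__(self, name, population):
        self.name = name
        self.population = population

    def __str__(self):
        return self.name


def extract_objects(query, choices, processor=str):
    """Score processor(choice) against query, but return the original objects."""
    scored = []
    for choice in choices:
        text = processor(choice)  # the explicit str() conversion step
        score = 100 * SequenceMatcher(None, query, text).ratio()
        scored.append((choice, score))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


cities = [City("Berlin", 3_600_000), City("Bern", 130_000)]
best, score = extract_objects("Berlin", cities)[0]
print(best.name, best.population)  # the match is still a City, not a string
```

Because the conversion happens only for scoring, code further down can keep treating each match as an object, which is the pattern the original project relied on.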