Fix failing unit test for market name

Question

mre opened this issue 5 years ago · comments

In #23 (comment), @kiwita88 found the likely reason why our unit test for the market name 'p e n ny' fails. We should fix that.

Ray · Answer 1 · Sat Sep 07 2019 20:32:09 GMT+0800 (China Standard Time)

difflib is what this buggy feature currently uses. Its matching algorithm is based on Ratcliff/Obershelp ("gestalt pattern matching") and is generic with regard to data type: it compares any sequences of hashable elements, not just strings. For fuzzy string similarity there are better-suited algorithms, such as Levenshtein edit distance. I believe switching algorithms is the best solution here.
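For reference, here is a minimal sketch of how a difflib-based lookup scores candidates. It assumes the feature goes through `difflib.SequenceMatcher` or `difflib.get_close_matches`; the thread does not say which call the project actually uses, and the market names in `markets` are made up for illustration.

```python
import difflib


def similarity(a: str, b: str) -> float:
    """Return difflib's similarity ratio (0.0 to 1.0) for two strings."""
    # ratio() is 2 * M / T, where M is the number of matched elements
    # and T is the combined length of both sequences.
    return difflib.SequenceMatcher(None, a, b).ratio()


# Hypothetical market names, not taken from the project.
markets = ["penny", "rewe", "aldi"]
query = "p e n ny"

# Print the score of every candidate against the query.
for name in markets:
    print(f"{name}: {similarity(query, name):.3f}")

# get_close_matches ranks candidates by the same ratio; cutoff=0.0
# disables the default threshold of 0.6 so nothing is filtered out.
best = difflib.get_close_matches(query, markets, n=1, cutoff=0.0)
print(best)
```

Because the ratio depends on matching blocks rather than on the number of edits, inserted spaces can split a name into several short blocks, which is one plausible way the score ends up different from what a test expects.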

What do you think, @mre? I could be convinced to write up a PR if no other contributor can. If you're comfortable adding a dependency, it might make sense to lean on https://github.com/seatgeek/fuzzywuzzy for this too.

Ray · Answer 2 · Sat Sep 07 2019 22:24:20 GMT+0800 (China Standard Time)

Update: the existing packages I mentioned are GPLv2-licensed, which may not be desirable, so perhaps a direct implementation of the Levenshtein algorithm could be added for this feature instead. Plenty of inspiration is available.
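A dependency-free version could look roughly like the sketch below. It is the standard two-row dynamic-programming form of Levenshtein distance, not code from this project; the helper `closest` and the market names are hypothetical and only show how the distance might be used to pick a match.

```python
def levenshtein(a: str, b: str) -> int:
    """Return the minimum number of single-character insertions,
    deletions, and substitutions needed to turn a into b."""
    # Keep the shorter string in the inner loop to use less memory.
    if len(a) < len(b):
        a, b = b, a
    # previous[j] is the distance between the processed prefix of a
    # and the first j characters of b.
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute or keep
            ))
        previous = current
    return previous[-1]


def closest(query: str, candidates: list[str]) -> str:
    """Hypothetical helper: return the candidate with the smallest
    case-insensitive edit distance to query."""
    return min(candidates, key=lambda c: levenshtein(query.lower(), c.lower()))


# Hypothetical market names, not taken from the project.
print(closest("p e n ny", ["penny", "rewe", "aldi"]))
```

With this measure each stray space costs exactly one deletion, so the score degrades in a predictable way as the input gets noisier.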

Matthias Endler · Answer 3 · Sun Sep 08 2019 11:52:21 GMT+0800 (China Standard Time)

Hey @rayrr,
thanks for your input. Yes, switching to Levenshtein would be worth a try. Whether we use a library or not doesn't matter to me. Also GPLv2 is fine in my book.
So if you like and you find the time, please go ahead and whip up a PR for this. 👍