rapidfuzz / RapidFuzz

Rapid fuzzy string matching in Python using various string metrics

Home Page: https://rapidfuzz.github.io/RapidFuzz/

Fuzzy matching a dict with key to a list of keywords

ayushb3 opened this issue

I am trying to do fuzzy matching on a dictionary in which each key maps to multiple keywords.

  1. I tried combining each key's keywords into one large string and scoring it with partial_ratio; however, the results were not always as accurate as I would have liked.
    With a search of 'italy', the user would be looking for 'italian ancestry', but partial_ratio ranks it equal to 'hospital' because both contain 'ital'.

  2. I tried iterating through the dict and passing each key's keywords as a list to the choices parameter of the extract method. This let me use WRatio to compare each individual keyword with the query. While this produced somewhat more accurate results, it was much, much slower (I think about 10x slower). It also makes it much harder to sort the per-key results into a single, improved ranking.


Do you have any better implementations or ideas I could pursue to optimize both accuracy and efficiency? Thank you!

> Do you have any better implementations or ideas I could pursue to optimize both accuracy and efficiency?

No, I do not have any better ideas on how to implement this.