rapidfuzz / RapidFuzz

Rapid fuzzy string matching in Python using various string metrics

Home Page: https://rapidfuzz.github.io/RapidFuzz/


Feature Request: Computing the distance/similarity for 2 lists of same length

thomasryde opened this issue · comments

Would it be possible to write a function that computes the similarity between the corresponding elements of 2 lists of equal length?

I am hoping for a ridiculous speed-up of this operation compared to doing it element by element in Python. I guess this request would be like cdist(), but only for corresponding elements of the 2 lists instead of all combinations. Love the library.
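To make the request concrete, here is a minimal pure-Python sketch of the two behaviours being compared. It is only an illustration of the semantics, not RapidFuzz code: `levenshtein`, `pairwise`, and `all_pairs` are hypothetical helper names invented for this example, and the scorer is a plain textbook edit distance standing in for whatever scorer the library would use.

```python
from typing import Callable, List, Sequence


def levenshtein(a: str, b: str) -> int:
    """Textbook edit distance; insert, delete and substitute each cost 1.

    Stand-in scorer for this sketch only (hypothetical helper, not the
    RapidFuzz implementation).
    """
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute, or match at no cost
            ))
        previous = current
    return previous[-1]


def pairwise(
    queries: Sequence[str],
    choices: Sequence[str],
    scorer: Callable[[str, str], int] = levenshtein,
) -> List[int]:
    """Score only corresponding elements: N scorer calls for N pairs."""
    if len(queries) != len(choices):
        raise ValueError("queries and choices must have the same length")
    return [scorer(q, c) for q, c in zip(queries, choices)]


def all_pairs(
    queries: Sequence[str],
    choices: Sequence[str],
    scorer: Callable[[str, str], int] = levenshtein,
) -> List[List[int]]:
    """Score every combination (the cdist-style result): N * M scorer calls."""
    return [[scorer(q, c) for c in choices] for q in queries]


queries = ["kitten", "flaw", "abc"]
choices = ["sitting", "lawn", "abc"]

scores = pairwise(queries, choices)   # one score per position
matrix = all_pairs(queries, choices)  # full 3 x 3 matrix
diagonal = [matrix[i][i] for i in range(len(queries))]

print(scores)    # [3, 2, 0]
print(diagonal)  # same values: the pairwise result is the matrix diagonal
```

The pairwise result is just the diagonal of the all-combinations matrix, so computing the full matrix to get it wastes work that grows with the square of the list length. The request is for a native function that performs only the per-position work, without the Python-level loop shown here.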

Hi @maxbachmann

I am considering opening a PR for this and calling it pdist (for pairwise distance), placed under the process folder. Do you think this feature would benefit the project, and do you have an opinion on the location? /Thomas

With 34f1976 and 34f1976, the string conversion is down from around 130 ns to around 77 ns per string pair.

6c5b37f is another improvement, this time for the matching. It reduces the runtime of the second part from 90 ns to 59 ns for single-threaded execution.

I think this concludes the more obvious performance improvements.