hi-primus / optimus

:truck: Agile Data Preparation Workflows made easy with Pandas, Dask, cuDF, Dask-cuDF, Vaex and PySpark

Home Page: https://hi-optimus.com

Explore string_grouper for fast fuzzy matching

argenisleon opened this issue

This may need to be ported to cuDF:
https://github.com/Bergvca/string_grouper

This article makes a good point about Levenshtein distance and how it grows quadratically:
https://bergvca.github.io/2017/10/14/super-fast-string-matching.html
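
To make the scaling concern concrete, here is a minimal sketch in plain Python. It is not Optimus or string_grouper code; `levenshtein`, `all_pairs_distances`, and the sample `names` are hypothetical names chosen for illustration. It assumes the quadratic growth refers to comparing every string against every other one, on top of the per-pair cost of the classic dynamic-programming edit distance.

```python
from itertools import combinations


def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance, O(len(a) * len(b)) time."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # deletion
                current[j - 1] + 1,            # insertion
                previous[j - 1] + (ca != cb),  # substitution (free on a match)
            ))
        previous = current
    return previous[-1]


def all_pairs_distances(names):
    """Compare every string with every other: n * (n - 1) / 2 distance calls."""
    return {(x, y): levenshtein(x, y) for x, y in combinations(names, 2)}


# Hypothetical sample data, only to show how the number of comparisons grows.
names = ["optimus", "optimis", "primus", "pandas"]
distances = all_pairs_distances(names)
```

With 4 strings this already needs 6 comparisons; with 100,000 strings it would need roughly 5 billion, which is why an approach that avoids all-pairs comparison is worth exploring.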

Maybe we could consider removing it for 3.0.