hi-primus / optimus

🚚 Agile Data Preparation Workflows made easy with Pandas, Dask, cuDF, Dask-cuDF, Vaex and PySpark

Home Page: https://hi-optimus.com

[FEA] Implement string grouper

argenisleon opened this issue

String grouper can match Levenshtein performance and scale linearly (#919).
We need to implement this for:

  • Pandas
  • Dask
  • cuDF
  • Dask-cuDF
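For context, the two approaches being compared can be sketched in plain Python. This is a minimal illustration under our own assumptions. It is not Optimus code and not the string_grouper library's API: `levenshtein`, `tfidf_vectors`, `cosine` and `group_strings` are hypothetical helpers, and the defaults (character trigrams, a 0.8 similarity threshold, scikit-learn-style smoothed IDF) are illustrative choices. The grouping side follows the general idea behind String grouper as we understand it: represent each string as a TF-IDF vector of character n-grams and link strings whose cosine similarity reaches a threshold.

```python
import math
from collections import Counter


def levenshtein(a: str, b: str) -> int:
    """Edit distance: minimum number of single-character inserts, deletes or substitutions."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


def ngrams(s: str, n: int = 3) -> list[str]:
    """Character n-grams of the lower-cased string, padded with one space on each side."""
    s = f" {s.lower()} "
    return [s[i:i + n] for i in range(max(len(s) - n + 1, 1))]


def tfidf_vectors(strings: list[str], n: int = 3) -> list[dict[str, float]]:
    """Sparse TF-IDF vectors (dicts) over character n-grams, L2-normalised (assumed weighting)."""
    docs = [Counter(ngrams(s, n)) for s in strings]
    df = Counter(g for d in docs for g in d)  # document frequency of each n-gram
    total = len(docs)
    vecs = []
    for d in docs:
        # Smoothed IDF: ln((1 + N) / (1 + df)) + 1
        v = {g: tf * (math.log((1 + total) / (1 + df[g])) + 1.0) for g, tf in d.items()}
        norm = math.sqrt(sum(w * w for w in v.values()))
        vecs.append({g: w / norm for g, w in v.items()})
    return vecs


def cosine(u: dict[str, float], v: dict[str, float]) -> float:
    """Vectors are already L2-normalised, so the dot product is the cosine similarity."""
    if len(u) > len(v):
        u, v = v, u  # iterate over the smaller vector
    return sum(w * v.get(g, 0.0) for g, w in u.items())


def group_strings(strings: list[str], min_similarity: float = 0.8, n: int = 3) -> list[str]:
    """Map each string to the first string of its similarity group (union-find over similar pairs)."""
    vecs = tfidf_vectors(strings, n)
    parent = list(range(len(strings)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    # Naive all-pairs loop for clarity: this is quadratic in the number of strings.
    for i in range(len(strings)):
        for j in range(i + 1, len(strings)):
            if cosine(vecs[i], vecs[j]) >= min_similarity:
                ri, rj = find(i), find(j)
                if ri != rj:
                    parent[max(ri, rj)] = min(ri, rj)  # keep the earliest index as root
    return [strings[find(i)] for i in range(len(strings))]


if __name__ == "__main__":
    print(levenshtein("kitten", "sitting"))  # 3
    print(group_strings(["Apple Inc", "apple inc", "Banana"]))  # ['Apple Inc', 'Apple Inc', 'Banana']
```

The pair loop above is quadratic. Any linear scaling presumably comes from replacing it with sparse matrix multiplication plus top-n pruning, and that engine-specific machinery is what would have to be rebuilt for Pandas, Dask, cuDF and Dask-cuDF.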

Issue-Label Bot is automatically applying the label feature_request to this issue, with a confidence of 0.97. Please mark this comment with 👍 or 👎 to give our bot feedback!

String grouper seems a good alternative to other methods (e.g. Levenshtein), but it would need to be reimplemented for every engine, which seems like a lot of work. Closing.