π String Super-Issue
xhochy opened this issue Β· comments
We want to support strings (UTF-8 encoded) as fast as possible inside of pandas
. Therefore we need to implement several things. This will be split into many issues and hard to track just with the issue search, so we will list them all here.
We try to add the functionality in three stages:
- Implement the functionality using plain Python operations. This will be the same speed as with
pandas.StringDtype
but already provides the API tofletcher
users. This will allow us to add faster implementations bit-by-bit while already providing a fully usable library.
a) Also ensure that we have benchmarks setup to compare thepandas/object
implementation to ours. - Given the algorithm isn't too complicated, we try to make an efficient implementation with
numba
. This will allow us to provide a fast algorithm with less implementation overhead then adding it to Apache Arrow. - For all methods, add an efficient implementation to Apache Arrow if there is none yet.
This project has been archived as development has ceased around 2021.
With the support of Apache Arrow-backed extension arrays in pandas
, the major goal of this project has been fulfilled.