xhochy / fletcher

Pandas ExtensionDType/Array backed by Apache Arrow

Home Page: https://fletcher.readthedocs.io/

Use Knuth-Morris-Pratt algorithm for pattern matching functions

xhochy opened this issue

For even better performance, we should implement the Knuth-Morris-Pratt algorithm (https://en.wikipedia.org/wiki/Knuth%E2%80%93Morris%E2%80%93Pratt_algorithm). The benefit we have in columnar processing is that the state table is initialised only once per column, not once per string. Sadly, this will not yet be visible in the benchmarks, as they only use a single-character pattern.

Originally posted by @xhochy in #141 (comment)
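
To illustrate the point about initialising the state table once per column, here is a minimal pure-Python sketch. It is not the fletcher or Apache Arrow implementation; the names `build_failure_table`, `kmp_contains` and `column_contains` are hypothetical, and the column is assumed to be a plain list of `bytes` values rather than an Arrow array.

```python
from typing import List


def build_failure_table(pattern: bytes) -> List[int]:
    """Build the KMP failure table: table[i] is the length of the longest
    proper prefix of pattern[:i + 1] that is also a suffix of it."""
    table = [0] * len(pattern)
    k = 0  # length of the currently matched prefix
    for i in range(1, len(pattern)):
        # Fall back to shorter prefixes until one can be extended.
        while k > 0 and pattern[i] != pattern[k]:
            k = table[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        table[i] = k
    return table


def kmp_contains(text: bytes, pattern: bytes, table: List[int]) -> bool:
    """Return True if pattern occurs in text, using a precomputed table."""
    if not pattern:
        return True  # the empty pattern matches everywhere
    k = 0  # number of pattern bytes matched so far
    for ch in text:
        while k > 0 and ch != pattern[k]:
            k = table[k - 1]
        if ch == pattern[k]:
            k += 1
            if k == len(pattern):
                return True
    return False


def column_contains(column: List[bytes], pattern: bytes) -> List[bool]:
    """Apply the search to every value, building the table only once."""
    table = build_failure_table(pattern)  # shared across the whole column
    return [kmp_contains(value, pattern, table) for value in column]
```

With a multi-character pattern such as `b"abab"`, `column_contains` reuses the same table for every value in the column, which is where the saving over a per-string setup would come from.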

We implemented this in Apache Arrow in apache/arrow#7593, so there is no need to replicate it here.