str_concat benchmark
artemru opened this issue
Is there a benchmark for the str_concat operation?
On my local machine I tried a naive Python implementation and got better results than with NumbaStringArray:
import numpy as np
import pyarrow as pa
from fletcher._numba_compat import NumbaStringArray, buffers_as_arrays
from fletcher._algorithms import str_concat
a1 = pa.array(np.random.rand(10**6).astype(str).astype('O'))
a2 = pa.array(np.random.rand(10**6).astype(str).astype('O'))
%timeit pa.array([x + y for x, y in zip(a1.to_pandas(), a2.to_pandas())])
# 860 ms ± 6.9 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
%timeit str_concat(NumbaStringArray.make(a1), NumbaStringArray.make(a2))
# 1.11 s ± 14.5 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
Is this something you expect?
No, this is slower than expected. I have removed the cited code in #100 and provided a better implementation that gives a 5x speedup, at least on my machine.
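For readers wondering what element-wise concatenation looks like at the buffer level, here is a minimal pure-Python sketch. It is not the implementation from #100. It assumes an Arrow-style string layout (an offsets list plus one contiguous UTF-8 data buffer) and ignores nulls. The names concat_string_buffers, to_buffers and from_buffers are hypothetical helpers made up for this illustration.

```python
def to_buffers(strings):
    """Pack a list of str into (offsets, data), Arrow-style (hypothetical helper)."""
    offsets = [0]
    data = bytearray()
    for s in strings:
        data += s.encode("utf-8")
        offsets.append(len(data))
    return offsets, bytes(data)


def from_buffers(offsets, data):
    """Unpack (offsets, data) back into a list of str (hypothetical helper)."""
    return [
        data[offsets[i]:offsets[i + 1]].decode("utf-8")
        for i in range(len(offsets) - 1)
    ]


def concat_string_buffers(offsets_a, data_a, offsets_b, data_b):
    """Element-wise concatenation of two string columns of equal length.

    String i of a column is data[offsets[i]:offsets[i + 1]].
    Assumption: no null entries (validity bitmaps are not handled here).
    """
    n = len(offsets_a) - 1
    if len(offsets_b) - 1 != n:
        raise ValueError("both columns must have the same number of strings")

    # Pass 1: compute the output offsets so the output buffer is allocated once.
    out_offsets = [0] * (n + 1)
    for i in range(n):
        len_a = offsets_a[i + 1] - offsets_a[i]
        len_b = offsets_b[i + 1] - offsets_b[i]
        out_offsets[i + 1] = out_offsets[i] + len_a + len_b

    # Pass 2: copy the bytes of a[i] followed by b[i] into place.
    out_data = bytearray(out_offsets[n])
    for i in range(n):
        chunk_a = data_a[offsets_a[i]:offsets_a[i + 1]]
        chunk_b = data_b[offsets_b[i]:offsets_b[i + 1]]
        start = out_offsets[i]
        mid = start + len(chunk_a)
        out_data[start:mid] = chunk_a
        out_data[mid:mid + len(chunk_b)] = chunk_b

    return out_offsets, bytes(out_data)


a = to_buffers(["0.1", "ab", ""])
b = to_buffers(["x", "", "é"])
result = from_buffers(*concat_string_buffers(*a, *b))
```

Computing all output offsets first means the output buffer is allocated only once, which is the usual reason to work on the buffers directly instead of building one Python string per row. In plain Python this loop would not be fast. It only shows the data movement that a compiled version would perform.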