Remove the memcopies of the vectors
Kerollmops opened this issue
Copying the vectors is the expensive part: up to 22% goes to the copy itself, 5% to zeroing the vector in advance (bzero), and another 5% to dropping the allocated vectors. It seems the AVX implementation could be switched to operate on non-aligned f32 slices, which would make the copies unnecessary.
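To illustrate the idea, here is a minimal sketch of an AVX dot product that reads directly from borrowed, possibly unaligned `&[f32]` slices using `_mm256_loadu_ps`. It is not the project's actual code: the function names (`dot`, `dot_avx_unaligned`, `dot_scalar`) and the choice of a dot product as the operation are assumptions made for the example. Because nothing is copied into an aligned buffer, there is no allocation to zero beforehand or drop afterwards.

```rust
#[cfg(target_arch = "x86_64")]
use std::arch::x86_64::*;

/// Scalar fallback, used when AVX is unavailable (hypothetical helper).
fn dot_scalar(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

/// AVX dot product over borrowed slices, with no alignment requirement.
/// Safety: the caller must ensure the CPU supports AVX.
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx")]
unsafe fn dot_avx_unaligned(a: &[f32], b: &[f32]) -> f32 {
    let len = a.len().min(b.len());
    let chunks = len / 8;
    unsafe {
        let mut acc = _mm256_setzero_ps();
        for i in 0..chunks {
            // `loadu` accepts any address, so the slices are read in place
            // instead of being copied into an aligned vector first.
            let va = _mm256_loadu_ps(a.as_ptr().add(i * 8));
            let vb = _mm256_loadu_ps(b.as_ptr().add(i * 8));
            acc = _mm256_add_ps(acc, _mm256_mul_ps(va, vb));
        }
        // Horizontal sum of the eight accumulator lanes.
        let mut lanes = [0.0f32; 8];
        _mm256_storeu_ps(lanes.as_mut_ptr(), acc);
        let mut sum: f32 = lanes.iter().sum();
        // Handle the tail that does not fill a full 8-lane chunk.
        for i in chunks * 8..len {
            sum += a[i] * b[i];
        }
        sum
    }
}

/// Picks the AVX path at runtime when available, else the scalar one.
pub fn dot(a: &[f32], b: &[f32]) -> f32 {
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("avx") {
            // Safe to call: AVX support was just checked.
            return unsafe { dot_avx_unaligned(a, b) };
        }
    }
    dot_scalar(a, b)
}
```

A caller can pass any sub-slice, for example `dot(&a[1..], &b[1..])`, without first copying it into an aligned allocation. Whether the unaligned loads cost anything measurable compared with aligned ones would need to be checked with a benchmark on the target hardware.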