k_aligned & memory requirements
mpekalski opened this issue
-
It would be useful to mention in the README that memory allocation depends on k_aligned, not just k. So changing k from 4 to 5 actually doubles memory requirements.
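To make the doubling concrete, here is a rough sketch, assuming k is rounded up to the next power of two and that the model stores one latent vector per (feature, field) pair. The function names and the sizes used (1,000,000 features, 10 fields) are made up for illustration and are not taken from the library.

```python
def next_power_of_two(k: int) -> int:
    """Smallest power of two that is >= k (k must be positive)."""
    if k < 1:
        raise ValueError("k must be positive")
    return 1 << (k - 1).bit_length()


def weight_floats(n_features: int, n_fields: int, k: int) -> int:
    """Number of floats allocated if each latent vector is padded to k_aligned.

    Assumption: one latent vector per (feature, field) pair.
    """
    return n_features * n_fields * next_power_of_two(k)


if __name__ == "__main__":
    n_features, n_fields = 1_000_000, 10  # hypothetical sizes
    for k in (4, 5, 8, 9):
        floats = weight_floats(n_features, n_fields, k)
        mib = floats * 4 / 2**20  # assuming 4-byte floats
        print(f"k={k} k_aligned={next_power_of_two(k)} approx {mib:.0f} MiB")
```

Under these assumptions, k=4 and k=5 allocate 4 and 8 floats per vector respectively, which is where the factor of two comes from.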
-
Is there any particular reason why you align k to the power of 2?
Perhaps you already know: memory needs to be aligned on 16-byte boundaries to use the SSE instructions efficiently. The authors implemented the multidimensional array W_f_j_k as a flat 1-D array, so if you want every latent vector to start on a 16-byte boundary, the memory used by each vector has to be a multiple of 16 bytes. For floats, which are usually 4 bytes long, that means k_aligned has to be a multiple of 4.
I don't see any obvious advantage of using k=2**x ...
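A small sketch of the arithmetic above, assuming 4-byte floats, a base pointer that is itself 16-byte aligned, and a flat layout where the vector for (f, j) starts at index (f * n_fields + j) * k_aligned. The helper names and that exact layout are my assumptions, not the library's code.

```python
FLOAT_BYTES = 4       # assumed size of a float
SSE_ALIGN_BYTES = 16  # alignment wanted for aligned SSE loads


def align_up(k: int, multiple: int = 4) -> int:
    """Round k up to the nearest multiple of `multiple`."""
    return -(-k // multiple) * multiple  # ceiling division, then scale back


def vector_byte_offset(f: int, j: int, n_fields: int, k_stride: int) -> int:
    """Byte offset of the latent vector for (f, j) in the flat array.

    k_stride is the number of floats reserved per vector.
    """
    return (f * n_fields + j) * k_stride * FLOAT_BYTES


if __name__ == "__main__":
    n_fields = 3  # hypothetical
    for k in (4, 5, 9):
        padded = align_up(k)
        off_raw = vector_byte_offset(0, 1, n_fields, k)
        off_pad = vector_byte_offset(0, 1, n_fields, padded)
        print(k, padded,
              off_raw % SSE_ALIGN_BYTES == 0,
              off_pad % SSE_ALIGN_BYTES == 0)
```

With a stride of 5 floats the second vector starts at byte 20, which is not on a 16-byte boundary; padding to 8 floats moves it to byte 32. Note that k=9 only needs a stride of 12 to stay aligned, not 16.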
Yes, k_aligned needs to be a multiple of 4, and this is because of the use of SSE instructions. We will consider adding some documentation about this. Thanks for the feedback.