Is the fully connected layer faster than using the OpenBLAS equivalent?
RuABraun opened this issue.
No, unfortunately it's a lot slower when the input/output has multiple rows, and only slightly faster in the single-row case. It's probably better to just use the Arm Compute Library.
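The thread doesn't show the layer's actual code, so the following is only a rough illustration using a hypothetical `fc_forward` function, not the library's implementation. A fully connected layer computes Y = X·Wᵀ + b. With a single input row that is a matrix-vector product (GEMV), where each weight is used once. With several rows it becomes a matrix-matrix product (GEMM), where tuned BLAS `sgemm` kernels can block for cache and registers and reuse each weight across rows. A naive loop like this one re-reads the whole weight matrix for every input row, which is one plausible reason for the multi-row gap.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical naive fully connected layer: Y = X * W^T + b.
 * x: batch x in  (row-major input rows)
 * w: out x in    (row-major, one row of weights per output unit)
 * b: out         (bias, may be NULL)
 * y: batch x out (row-major output)
 * For batch == 1 this is a GEMV; for batch > 1 it is a GEMM done without
 * any cache blocking, so W is streamed from memory once per input row. */
void fc_forward(const float *x, const float *w, const float *b, float *y,
                size_t batch, size_t in, size_t out) {
    assert(x != NULL && w != NULL && y != NULL);
    for (size_t r = 0; r < batch; ++r) {
        const float *xr = x + r * in;           /* current input row */
        for (size_t o = 0; o < out; ++o) {
            const float *wo = w + o * in;       /* weights for output o */
            float acc = (b != NULL) ? b[o] : 0.0f;
            for (size_t i = 0; i < in; ++i)
                acc += xr[i] * wo[i];           /* dot product */
            y[r * out + o] = acc;
        }
    }
}
```

For comparison, the same multi-row computation in OpenBLAS would be a single `cblas_sgemm(CblasRowMajor, CblasNoTrans, CblasTrans, batch, out, in, 1.0f, x, in, w, in, 1.0f, y, out)` call with `y` pre-filled with the bias for each row.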
I suggest taking a look at the XNNPACK library, which is a successor to NNPACK.
Oh nice! Great, I'll try it out. :)