Matrix multiplication optimization examples:
- Cache
- OpenMP
- Intel AVX
- AVX + LoopUnroll
- AVX + LoopUnroll + Cache
- AVX + LoopUnroll + Cache + OpenMP
- NVIDIA CUDA
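
As a rough illustration of the cache and OpenMP items in the list above, here is a minimal sketch in C. It is not taken from this repository: the function names `matmul_naive` and `matmul_tiled`, the `TILE` constant, and the row-major `float` layout are all assumptions made for the example. The tiled version walks the matrices in small blocks and keeps the innermost loop running along rows, so the data being reused is more likely to stay in cache. The OpenMP pragma is guarded so the code still builds when OpenMP is not enabled.

```c
#include <stddef.h>
#include <string.h>

/* Assumed tile edge length; a real value should be tuned to the cache size. */
#define TILE 32

/* Naive reference: C = A * B for n x n row-major matrices. */
void matmul_naive(const float *A, const float *B, float *C, size_t n) {
    for (size_t i = 0; i < n; ++i) {
        for (size_t j = 0; j < n; ++j) {
            float sum = 0.0f;
            for (size_t k = 0; k < n; ++k)
                sum += A[i * n + k] * B[k * n + j];  /* B is read column-wise */
            C[i * n + j] = sum;
        }
    }
}

/* Cache-blocked version: processes TILE x TILE blocks and uses i-k-j order
 * inside a block so that B and C are both accessed along rows. */
void matmul_tiled(const float *A, const float *B, float *C, size_t n) {
    memset(C, 0, n * n * sizeof(float));
#ifdef _OPENMP
    /* Each thread owns a distinct band of rows of C, so writes do not overlap. */
#pragma omp parallel for
#endif
    for (size_t ii = 0; ii < n; ii += TILE) {
        size_t i_end = (ii + TILE < n) ? ii + TILE : n;
        for (size_t kk = 0; kk < n; kk += TILE) {
            size_t k_end = (kk + TILE < n) ? kk + TILE : n;
            for (size_t jj = 0; jj < n; jj += TILE) {
                size_t j_end = (jj + TILE < n) ? jj + TILE : n;
                for (size_t i = ii; i < i_end; ++i) {
                    for (size_t k = kk; k < k_end; ++k) {
                        float a = A[i * n + k];  /* reused across the j loop */
                        for (size_t j = jj; j < j_end; ++j)
                            C[i * n + j] += a * B[k * n + j];
                    }
                }
            }
        }
    }
}
```

Both functions compute the same product; the tiled one only changes the order in which elements are visited. The innermost `j` loop is also the natural place to apply AVX intrinsics or loop unrolling, which this sketch leaves out.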
Requirements:
- cuda-toolkit v8+
- OpenMP v4+
To build and run an example, e.g. the AVX version:
> cd avx
> make
> ./bin/matrix