chokkan / liblbfgs

libLBFGS: a library of Limited-memory Broyden-Fletcher-Goldfarb-Shanno (L-BFGS)

Home Page: http://www.chokkan.org/software/liblbfgs/

Other SIMD types

peastman opened this issue

Do you have interest in supporting other SIMD types? I'm particularly interested in AVX, for modern x86 processors, and NEON, for ARM processors.

I would be willing to implement this, if it's something you want. It would be useful to me, and the amount of code for each one looks small.

Another possibility is to create an implementation using the portable vector extensions in clang and gcc (though not Visual Studio). That way you write a single implementation that works on all architectures and uses whatever vector instructions are available.
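To illustrate the idea, here is a minimal sketch of a dot product written with the gcc/clang `vector_size` attribute. It is not the code from this issue or from liblbfgs: the name `vecdot_portable`, the `vec4d` typedef, and the choice of four doubles per vector are assumptions made for the example.

```c
#include <stddef.h>
#include <string.h>

/* Four doubles per vector. gcc and clang lower this type to whatever
   SIMD instructions the target offers (or to scalar code if none). */
typedef double vec4d __attribute__((vector_size(32)));

/* Hypothetical helper, not part of liblbfgs. */
static double vecdot_portable(const double *x, const double *y, size_t n)
{
    vec4d acc = {0.0, 0.0, 0.0, 0.0};
    size_t i = 0;

    /* Main loop: process four elements at a time. */
    for (; i + 4 <= n; i += 4) {
        vec4d a, b;
        memcpy(&a, x + i, sizeof a);   /* unaligned-safe load */
        memcpy(&b, y + i, sizeof b);
        acc += a * b;                  /* element-wise multiply and add */
    }

    /* Horizontal sum of the four partial sums. */
    double s = acc[0] + acc[1] + acc[2] + acc[3];

    /* Scalar tail for the remaining 0..3 elements. */
    for (; i < n; ++i)
        s += x[i] * y[i];
    return s;
}
```

Using `memcpy` for the loads avoids any alignment requirement on the input arrays; compilers normally turn it into a plain vector load.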

I wrote an implementation with clang/gcc portable vectors. It gives a nice speedup on my ARM Mac.

After benchmarking on a couple of computers, I concluded that the only routine that should be explicitly vectorized is vecdot(). All the others are simple enough that the compiler can vectorize them automatically. The handwritten routines never help performance, and sometimes hurt. On modern x86 processors, the compiler can generate AVX code that is faster than the handwritten SSE code.
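For reference, this is the kind of loop that compilers usually vectorize on their own at `-O2`/`-O3`. It is a sketch, and `axpy_plain` is a hypothetical name rather than a liblbfgs routine. One plausible reason the dot product behaves differently is that it is a floating-point reduction, which gcc and clang will not reorder into vector lanes unless something like `-ffast-math` is given; that explanation is my assumption and was not measured here.

```c
#include <stddef.h>

/* y[i] += c * x[i]: no dependency between iterations, so the
   compiler is free to process several elements per instruction.
   `restrict` promises that x and y do not overlap. */
static void axpy_plain(double *restrict y, const double *restrict x,
                       double c, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        y[i] += c * x[i];
}
```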

Try compiling with -DUSE_SSE -mavx and measure the speed. Now replace all the SSE routines except the dot product ones with the ANSI versions. It gets slightly faster.
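If you want to time an individual routine rather than a whole optimization run, a small harness along these lines may help. It is only a sketch: `seconds_per_call`, `dot_fn`, and `dot_plain` are hypothetical names for the example, and `clock()` measures CPU time at fairly coarse resolution, so use a large repetition count.

```c
#include <stddef.h>
#include <time.h>

/* Plain reference dot product to benchmark against. */
static double dot_plain(const double *x, const double *y, size_t n)
{
    double s = 0.0;
    for (size_t i = 0; i < n; ++i)
        s += x[i] * y[i];
    return s;
}

typedef double (*dot_fn)(const double *, const double *, size_t);

/* Average CPU seconds per call of f over `reps` repetitions. */
static double seconds_per_call(dot_fn f, const double *x, const double *y,
                               size_t n, int reps)
{
    volatile double sink = 0.0;   /* keeps the calls from being optimized away */
    clock_t t0 = clock();
    for (int r = 0; r < reps; ++r)
        sink += f(x, y, n);
    clock_t t1 = clock();
    (void)sink;
    return (double)(t1 - t0) / CLOCKS_PER_SEC / reps;
}
```

Calling `seconds_per_call` once per implementation, with the same arrays and the same compiler flags, gives a like-for-like comparison.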

I can create a PR with these changes, if you want.