Mozilla-Ocho / llamafile

Distribute and run LLMs with a single file.

Home Page: https://llamafile.ai

Non-AVX execution - SSE-only on Pentiums/Celerons?

ttsiodras opened this issue

On an N5095 Celeron machine, there are no AVX extensions. I believe the same applies to most Pentiums and Celerons...

$ grep ^flags /proc/cpuinfo  | head -1 | sed 's,^.*: ,,;s, ,\n,g'  | grep sse
sse
sse2
ssse3
sse4_1
sse4_2

Is it possible to run these models with SSE instructions?

I understand that AVX gives more speed because of wider registers, but is it just this, or some need for specific AVX functionality?

This was working on non-AVX CPUs prior to the recent update. I have the same issue, as I'm running on an older Xeon server. I would appreciate an updated branch for non-AVX if possible!

I can confirm the non-AVX issue with llamafile 0.7; the last working version for me was 0.6.2.

See cdd7458, where AVX became mandatory. llamafile 0.6.2 was the last version that supported SSSE3+. We currently lack the ability to runtime-dispatch anything but a few performance-critical routines (e.g. matmul), so the vast majority of the code runs at the baseline ISA, and if we set that baseline to SSE it makes things significantly slower on modern CPUs. Look at the benchmarks in my blog post https://justine.lol/matmul/ and notice how big the disparity was between llamafile 0.6.2 and llama.cpp. Our decision to mandate AVX played a big role in helping us catch up with llama.cpp and then surpass it.

Part of what changed with AVX is that it introduced a new method of encoding instructions (VEX encoding), and Intel decided to penalize code that uses the legacy SSE encodings. So in many respects, SSE-encoded code is radioactive to performance, and there's not much we can do about that.
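Given the AVX requirement in llamafile 0.7+, a quick way to check up front whether a machine can run a recent release is to look for the `avx` flag in /proc/cpuinfo, much like the grep shown earlier in this thread. This is an illustrative Linux-only check, not an official llamafile tool:

```shell
# Check whether this CPU advertises AVX before trying a recent llamafile.
if grep -qw avx /proc/cpuinfo; then
  echo "AVX available: llamafile 0.7+ should run"
else
  echo "No AVX: stick with llamafile 0.6.2 or earlier"
fi
```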

Thank you for the transparency! I do appreciate you keeping the older releases up as well :)