Altivec option is misleading, it should be VSX; also, why only ppc64le, but not ppc64 (for supported CPUs)?

Question

barracuda156 opened this issue 4 months ago · comments

What the CMakeLists calls Altivec is in fact VSX, which is a later ISA extension. Using the Altivec name is misleading.

This is on a system where Altivec is supported (but VSX is not):

/opt/local/var/macports/build/_opt_PPCSnowLeopardPorts_archivers_blosc2/blosc2/work/c-blosc2-2.13.2/blosc/shuffle-altivec.c:41:9: error: '__builtin_vsx_stxvw4x_v16qi' requires the '-mvsx' option
   41 |         vec_xst(xmm0[i], j + i * total_elements, dest);
      |         ^~~~~~~
/opt/local/var/macports/build/_opt_PPCSnowLeopardPorts_archivers_blosc2/blosc2/work/c-blosc2-2.13.2/blosc/shuffle-altivec.c:41:9: note: overloaded builtin '__builtin_vec_vsx_st' is implemented by builtin '__builtin_vsx_stxvw4x_v16qi'
/opt/local/var/macports/build/_opt_PPCSnowLeopardPorts_archivers_blosc2/blosc2/work/c-blosc2-2.13.2/blosc/shuffle-altivec.c: In function 'shuffle4_altivec':
/opt/local/var/macports/build/_opt_PPCSnowLeopardPorts_archivers_blosc2/blosc2/work/c-blosc2-2.13.2/blosc/shuffle-altivec.c:57:7: error: '__builtin_vsx_lxvw4x_v16qi' requires the '-mvsx' option
   57 |       xmm0[i] = vec_xl(bytesoftype * j + 16 * i, src);
      |       ^~~~
/opt/local/var/macports/build/_opt_PPCSnowLeopardPorts_archivers_blosc2/blosc2/work/c-blosc2-2.13.2/blosc/shuffle-altivec.c:57:7: note: overloaded builtin '__builtin_vec_vsx_ld' is implemented by builtin '__builtin_vsx_lxvw4x_v16qi'
/opt/local/var/macports/build/_opt_PPCSnowLeopardPorts_archivers_blosc2/blosc2/work/c-blosc2-2.13.2/blosc/shuffle-altivec.c:65:9: error: '__builtin_vsx_stxvw4x_v16qi' requires the '-mvsx' option
   65 |         vec_xst(xmm0[i], j + i*total_elements, dest);
      |         ^~~~~~~

The code explicitly uses VSX built-ins.

Is there some reason why VSX is allowed only for the little-endian version (ppc64le)? Big-endian ppc64 also supports VSX instructions, though only starting from ISA 2.06. (So it should not be enabled without a check, of course.)
If a check for the supported ISA is undesirable, it could be done via a non-default configure option, so that those who use VSX-capable POWER hardware in big-endian mode could benefit from the hardware capabilities.
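
One way such a check could look at compile time is sketched below. It relies on the `__VSX__` and `__ALTIVEC__` macros that GCC and Clang predefine when `-mvsx` or `-maltivec` (or a `-mcpu` value implying them) is in effect. The `SKETCH_*` and `sketch_*` names are hypothetical and are not part of c-blosc2. Note that the test does not mention endianness at all, so a big-endian ppc64 build with VSX enabled would take the VSX branch.

```c
/* SKETCH_* names are hypothetical, for illustration only.
 * __VSX__ and __ALTIVEC__ are predefined by the compiler when the
 * corresponding instruction set is enabled for the translation unit.
 * VSX is tested first because a VSX build also defines __ALTIVEC__. */
#if defined(__VSX__)
#  define SKETCH_SIMD_LEVEL 2   /* vec_xl / vec_xst are usable */
#elif defined(__ALTIVEC__)
#  define SKETCH_SIMD_LEVEL 1   /* only VMX built-ins such as vec_ld / vec_st */
#else
#  define SKETCH_SIMD_LEVEL 0   /* no PowerPC vector unit: pure C path */
#endif

/* Report which code path this build was compiled for. */
int sketch_simd_level(void) { return SKETCH_SIMD_LEVEL; }

const char *sketch_simd_name(void) {
    switch (SKETCH_SIMD_LEVEL) {
    case 2:  return "vsx";
    case 1:  return "altivec";
    default: return "generic";
    }
}
```

On the G4/G5 system from the log above, a guard like this would keep the `vec_xl`/`vec_xst` code out of the build instead of failing with "requires the '-mvsx' option".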

Sergey Fedorov · Answer 1 · Tue Mar 19 2024 06:41:26 GMT+0800 (China Standard Time)

By the way, is it possible to support actual Altivec as a fallback, i.e. ISA 2.02?
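
For illustration only, here is a rough sketch of how the unaligned load done by `vec_xl` could be expressed with plain AltiVec (VMX) built-ins on a big-endian target, using the classic `vec_ld` + `vec_lvsl` + `vec_perm` idiom. The function name `sketch_load16` is hypothetical, the code is not taken from c-blosc2, and it has not been tested on real AltiVec hardware. On any other target the sketch falls back to `memcpy`.

```c
#include <string.h>

#if defined(__ALTIVEC__) && defined(__BIG_ENDIAN__)
#include <altivec.h>

/* Load 16 possibly unaligned bytes using only VMX built-ins.
 * vec_ld ignores the low 4 address bits, so two aligned loads are
 * taken and the wanted bytes are selected with a permute mask. */
void sketch_load16(const unsigned char *src, unsigned char out[16]) {
    __vector unsigned char hi = vec_ld(0, src);    /* block containing src      */
    __vector unsigned char lo = vec_ld(15, src);   /* block containing src + 15 */
    __vector unsigned char mask = vec_lvsl(0, src);
    __vector unsigned char v = vec_perm(hi, lo, mask);
    memcpy(out, &v, 16);
}
#else
/* Portable path for targets without big-endian AltiVec. */
void sketch_load16(const unsigned char *src, unsigned char out[16]) {
    memcpy(out, src, 16);
}
#endif
```

Using offset 15 rather than 16 for the second load means an already aligned `src` never touches the following 16-byte block. An unaligned store would need a similar but more involved read-modify-write sequence, which is not shown here.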

Francesc Alted · Answer 2 · Tue Mar 19 2024 15:07:49 GMT+0800 (China Standard Time)

IIRC @kif is the author of the VSX code. He might shed some light on this.

Francesc Alted · Answer 3 · Sat Jun 08 2024 00:50:30 GMT+0800 (China Standard Time)

Closing due to inactivity.

Sergey Fedorov · Answer 4 · Sat Jun 08 2024 04:00:58 GMT+0800 (China Standard Time)

@kif Any update on this?

Jerome Kieffer · Answer 5 · Sat Jun 08 2024 14:18:05 GMT+0800 (China Standard Time)

Sorry for the delay. I was not aware that VSX had more instructions than Altivec (actually VMX). I thought it was just more registers. So I agree that the test should be on the presence of the VSX instructions and not on VMX. The names of the files should be changed as well.
One can get inspiration from:
https://bugzilla.mozilla.org/show_bug.cgi?id=1629414

While I have access to a Power9, I do not have access to older big-endian versions of those computers. This issue should be reopened.

Sergey Fedorov · Answer 6 · Sat Jun 08 2024 14:37:17 GMT+0800 (China Standard Time)

@kif If you or someone could propose AltiVec-compatible fallback, I can test it locally. (Unfortunately, I cannot write this kind of code myself.)

> While I have access to a Power9, I do not have access to older big-endian versions of those computers. This issue should be reopened.

All Power CPUs are in fact bi-endian, and perhaps you could also virtualize a big-endian system on a little-endian host without loss of speed.
This won't help on its own with compatibility with earlier ISAs, but it should allow testing the code on modern big-endian systems (OpenBSD and FreeBSD run on Power9, AFAIK).

Jerome Kieffer · Answer 7 · Sat Jun 08 2024 15:46:21 GMT+0800 (China Standard Time)

I did that a long time ago and debugged it on the architecture I had access to (Power9).
I guess the compilation would have gone through if the instructions had been available.
All this part of the code is re-shuffling bytes/bits. If those instructions are not available, the code should silently fall back on the pure C implementation.
Since it does not, one should just tidy up the code and check for the presence of this VSX variable.
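
As a rough illustration of what a pure C fallback for byte shuffling involves, here is a minimal sketch. The function name `sketch_shuffle_generic` and its signature are hypothetical and this is not the c-blosc2 implementation. It assumes `typesize` is greater than zero and simply copies any trailing bytes that do not make up a whole element.

```c
#include <stddef.h>
#include <string.h>

/* Byte shuffle in plain C: gather byte 0 of every element, then
 * byte 1 of every element, and so on. Hypothetical helper; assumes
 * typesize > 0 and that src and dest do not overlap. */
void sketch_shuffle_generic(size_t typesize, size_t blocksize,
                            const unsigned char *src, unsigned char *dest) {
    size_t nelem = blocksize / typesize;  /* whole elements in the block */
    size_t tail = blocksize % typesize;   /* leftover bytes, copied as-is */

    for (size_t j = 0; j < typesize; j++) {
        for (size_t i = 0; i < nelem; i++) {
            dest[j * nelem + i] = src[i * typesize + j];
        }
    }
    memcpy(dest + (blocksize - tail), src + (blocksize - tail), tail);
}
```

For two 4-byte elements `a0 a1 a2 a3 b0 b1 b2 b3` this produces `a0 b0 a1 b1 a2 b2 a3 b3`. A build that lacks VSX could route every shuffle call through a routine of this kind rather than failing to compile.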

About the emulation of BE on an LE machine: I don't think Power9 or ARM (both bi-endian) have any advantage compared to pure little-endian processors like x86 ... but maybe I am wrong.

Sergey Fedorov · Answer 8 · Sat Jun 08 2024 15:54:17 GMT+0800 (China Standard Time)

> About the emulation of BE on an LE machine: I don't think Power9 or ARM (both bi-endian) have any advantage compared to pure little-endian processors like x86

Power probably does, though it is not really relevant for me (nothing beyond G5 hardware is available here), so I am not too sure.
There is some info from the TFF developer: https://www.talospace.com/2018/08/making-your-talos-ii-into-power-mac.html