AMO tests fail with Ubuntu's OpenMPI 4.1.2
mrogowski opened this issue
The `I`, `L`, `Q`, `u4`, and `u8` data types cause errors in `testFetchBitwise` when using shmem4py with Ubuntu's OpenMPI (4.1.2):
```
File "/repo/test/test_amo.py", line 189, in testFetchBitwise
    self.assertEqual(val, 2**i-1)
```
See: https://github.com/mpi4py/shmem4py/actions/runs/3967285365/jobs/6799064824
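The body of `testFetchBitwise` is not quoted in the issue, so the following is only a hypothetical sketch of the invariant the failing assertion appears to check. It assumes the test sets one bit per iteration with an atomic fetch-OR and compares the fetched (previous) value against `2**i - 1`. `SimulatedAtomic` and `check_fetch_or` are illustrative names, not part of shmem4py, and a local lock stands in for the remote atomic.

```python
import threading


class SimulatedAtomic:
    """Hypothetical local stand-in for a symmetric integer supporting atomic fetch-OR."""

    def __init__(self, value=0):
        self._value = value
        self._lock = threading.Lock()

    def fetch_or(self, mask):
        # Atomically OR `mask` into the stored value and return the value
        # it held *before* the update, as a fetching AMO does.
        with self._lock:
            old = self._value
            self._value |= mask
            return old

    def load(self):
        with self._lock:
            return self._value


def check_fetch_or(nbits):
    """Set bits 0..nbits-1 one at a time and verify each fetched value."""
    target = SimulatedAtomic(0)
    for i in range(nbits):
        # Before bit i is set, bits 0..i-1 are already set, so the fetched
        # value is 1 + 2 + ... + 2**(i-1) == 2**i - 1.
        val = target.fetch_or(1 << i)
        assert val == 2**i - 1, (i, val)
    return target.load()
```

With a correct fetch-OR, `check_fetch_or(64)` returns `2**64 - 1`. A misbehaving AMO would instead trip the assertion at the first bit whose fetched value is wrong, analogous to the failure at line 189 of `test_amo.py`.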
I cannot reproduce the issue with OpenMPI 4.1.2 and UCX 1.12.1 built from source.
Could the issue have been introduced by my changes in 0f21f5c?
No, the same tests were already failing before that change.
This issue is reproducible in C and seems to depend on the GCC optimizations applied when building UCX:
- Fedora 35 rpm of OpenMPI 4.1.1 + UCX 1.11.2 works (GCC 11)
- Fedora 36 rpm of OpenMPI 4.1.4 + UCX 1.12.0 fails (GCC 12)
- Ubuntu 22.04 deb of OpenMPI 4.1.2 + UCX 1.12.1 fails (GCC 11)
- Ubuntu 23.04 deb of OpenMPI 4.1.4 + UCX 1.13.1 fails (GCC 12)
- Fedora 35 own build of OpenMPI 4.1.4 + UCX 1.13.1 works (GCC 11)
- Fedora 36 own build of OpenMPI 4.1.4 + release build of UCX 1.13.1 fails (GCC 12)
- Fedora 36 own build of OpenMPI 4.1.4 + release build of UCX (master/openucx/ucx@52a9394) fails (GCC 12)
- Fedora 36 own build of OpenMPI 4.1.4 + devel build of UCX (master/openucx/ucx@52a9394) works (GCC 12)
- Ubuntu 22.04 own build of OpenMPI 4.1.4 + UCX 1.13.1 works (GCC 11)
- Ubuntu 23.04 own build of OpenMPI 4.1.4 + UCX 1.13.1 fails (GCC 12)
- Ubuntu 23.04 own build of OpenMPI 4.1.4 + release build of UCX (master/openucx/ucx@52a9394) fails (GCC 12)
- Ubuntu 23.04 own build of OpenMPI 4.1.4 + devel build of UCX (master/openucx/ucx@52a9394) works (GCC 12)
I'm using the master branch of UCX because, as of the UCX 1.13.1 release, the devel build fails with GCC 12 (openucx/ucx#8186, openucx/ucx#8617).
I will use devel builds in CI/CD for now.
Update: It seems that the issue is somehow triggered by the `--disable-logging` flag used in the release build of UCX.