mpi4py / shmem4py

Python bindings for OpenSHMEM

AMO tests fail with Ubuntu's OpenMPI 4.1.2

mrogowski opened this issue

The I, L, Q, u4, and u8 data types cause an assertion failure in testFetchBitwise when shmem4py is used with Ubuntu's OpenMPI package (4.1.2):

File "/repo/test/test_amo.py", line 189, in testFetchBitwise
self.assertEqual(val, 2**i-1)

See: https://github.com/mpi4py/shmem4py/actions/runs/3967285365/jobs/6799064824
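For readers without the test source at hand, here is a minimal pure-Python model of the kind of check that assertion implies. It does not call shmem4py or OpenSHMEM. It assumes (the test body is not shown in this thread) that the test sets bit i with an atomic fetch-or and compares the fetched, pre-update value against 2**i-1. The names fetch_or and expected_fetch_or_sequence are hypothetical.

```python
def fetch_or(cell, mask):
    """Model of an atomic fetch-or: return the old value, then OR in the mask.

    `cell` is a one-element list standing in for a symmetric variable.
    This is an illustration only, not the shmem4py API.
    """
    old = cell[0]
    cell[0] = old | mask
    return old


def expected_fetch_or_sequence(nbits):
    """Set bits 0..nbits-1 one at a time and record each fetched value."""
    cell = [0]
    fetched = []
    for i in range(nbits):
        fetched.append(fetch_or(cell, 1 << i))
    return fetched, cell[0]


# For a 32-bit unsigned type (an assumption matching 'I'/'u4'), the value
# fetched before setting bit i should have exactly the low i bits set.
fetched, final = expected_fetch_or_sequence(32)
assert all(val == 2**i - 1 for i, val in enumerate(fetched))
assert final == 2**32 - 1
```

Under this model, a failing assertEqual(val, 2**i-1) would mean that the fetched value does not reflect all previously set bits.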

I cannot reproduce the issue with OpenMPI 4.1.2 and UCX 1.12.1 built from source.

Could the issue have been introduced by my changes in 0f21f5c?

No, the same tests fail before that change.

This issue is reproducible in C and seems to depend on how GCC optimizes UCX:

  • Fedora 35 rpm of OpenMPI 4.1.1 + UCX 1.11.2 works (GCC 11)
  • Fedora 36 rpm of OpenMPI 4.1.4 + UCX 1.12.0 fails (GCC 12)
  • Ubuntu 22.04 deb of OpenMPI 4.1.2 + UCX 1.12.1 fails (GCC 11)
  • Ubuntu 23.04 deb of OpenMPI 4.1.4 + UCX 1.13.1 fails (GCC 12)
  • Fedora 35 own build of OpenMPI 4.1.4 + UCX 1.13.1 works (GCC 11)
  • Fedora 36 own build of OpenMPI 4.1.4 + release build of UCX 1.13.1 fails (GCC 12)
  • Fedora 36 own build of OpenMPI 4.1.4 + release build of UCX (master/openucx/ucx@52a9394) fails (GCC 12)
  • Fedora 36 own build of OpenMPI 4.1.4 + devel build of UCX (master/openucx/ucx@52a9394) works (GCC 12)
  • Ubuntu 22.04 own build of OpenMPI 4.1.4 + UCX 1.13.1 works (GCC 11)
  • Ubuntu 23.04 own build of OpenMPI 4.1.4 + UCX 1.13.1 fails (GCC 12)
  • Ubuntu 23.04 own build of OpenMPI 4.1.4 + release build of UCX (master/openucx/ucx@52a9394) fails (GCC 12)
  • Ubuntu 23.04 own build of OpenMPI 4.1.4 + devel build of UCX (master/openucx/ucx@52a9394) works (GCC 12)

I'm using the master branch of UCX because, as of the UCX 1.13.1 release, the devel build fails with GCC 12 (openucx/ucx#8186, openucx/ucx#8617).

I will use devel builds in CI/CD for now.

Update: It seems like the issue is somehow caused by the --disable-logging flag of the release build of UCX.