sourceryinstitute / OpenCoarrays

A parallel application binary interface for Fortran 2018 compilers.

Home Page: http://www.opencoarrays.org

Defect: UCX warnings in CentOS

SineBell opened this issue

  • I am reporting a bug others will be able to reproduce and not asking a question or requesting a new feature.

System information including:

  • OpenCoarrays Version: 2.8.0

  • Fortran Compiler: gfortran 8.3.1

  • C compiler used for building lib: gcc 8.3.1

  • Installation method: CMake build from source using a git clone. All tests from make test passed.

  • All flags & options passed to the installer
    gfortran and gcc specifications FC=/path/to/gfortran8 CC=/path/to/gcc8

  • Output of uname -a: Linux debye4 4.18.0-193.14.2.el8_2.x86_64 #1 SMP Sun Jul 26 03:54:29 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux

  • MPI library being used: OpenMPI 4.1.1

  • Machine architecture and number of physical cores: x86_64, 64 cores

  • Version of CMake: cmake 3.20.2

To help us debug your issue please explain:

What you were trying to do (and why)

Running any Fortran code with more than one image.

What happened (include command output, screenshots, logs, etc.)

At the end of execution, numerous UCX warnings are printed to the screen, e.g.:

[1691070750.032589] [debye4:964906:0]      tag_match.c:61   UCX  WARN  unexpected tag-receive descriptor 0x7f5c5710dfc0 was not matched
[1691070750.032604] [debye4:964906:0]      tag_match.c:61   UCX  WARN  unexpected tag-receive descriptor 0x7f5c570fdf40 was not matched
[1691070750.032616] [debye4:964906:0]      tag_match.c:61   UCX  WARN  unexpected tag-receive descriptor 0x7f5c570edec0 was not matched
[1691070750.032650] [debye4:964905:0]      tag_match.c:61   UCX  WARN  unexpected tag-receive descriptor 0x7f5a84b00f40 was not matched

What you expected to happen

The execution appears to end successfully. The large number of warnings, however, clutters the output, making it difficult to read on screen.

Step-by-step reproduction instructions to reproduce the error/bug

Any code I tested with cafrun and -n > 1.

For example, this simple code:

program bugcheck
    write(*,*) "hello by ", this_image()
end program

Compiled with caf -o bugcheck bugcheck.f90
and run with cafrun -n 2 bugcheck, it outputs:

 hello by            1
 hello by            2
[1691071152.339731] [debye4:965700:0]      tag_match.c:61   UCX  WARN  unexpected tag-receive descriptor 0x7fb352acbfc0 was not matched
[1691071152.339731] [debye4:965701:0]      tag_match.c:61   UCX  WARN  unexpected tag-receive descriptor 0x7fb0ef50efc0 was not matched

We see the same issue with OpenCoarrays checked out today, built with GCC 11.3.0 or GCC 8.5.0 and OpenMPI 4.1.4.
When run with p images, p*(p-1) such warning messages are printed. AFAIK they are triggered in MPI_Finalize when an MPI_Send was never matched by an MPI_Recv.
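
A quick way to check the p*(p-1) count against a captured run might look like the sketch below. The helper names count_ucx_warnings and expected_ucx_warnings and the file name run.log are made up for illustration. The grep pattern is taken from the warning lines quoted in this issue, so it may need adjusting for other UCX versions.

```shell
#!/bin/sh
# Hypothetical helper: count UCX tag-match warnings in a captured log file.
# The pattern mirrors the "UCX  WARN  unexpected tag-receive descriptor"
# lines quoted in this issue (run of spaces matched with " +").
count_ucx_warnings() {
    grep -E -c 'UCX +WARN +unexpected tag-receive descriptor' "$1"
}

# Expected number of warnings for p images, per the p*(p-1) observation.
expected_ucx_warnings() {
    echo $(( $1 * ($1 - 1) ))
}

# Usage, assuming caf and cafrun are on PATH and bugcheck was built as above:
#   cafrun -n 4 bugcheck > run.log 2>&1
#   [ "$(count_ucx_warnings run.log)" -eq "$(expected_ucx_warnings 4)" ] \
#       && echo "warning count matches p*(p-1)"
```

For the two-image run quoted in the original report, the expected count is 2*(2-1) = 2, which agrees with the two warnings shown there.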