MPI communicator split failures
DJDavies2 opened this issue
What happened?
I am getting failures of this type:
Completed case 0: Test MPI Communicator Split
0 tests failed out of 1.
Completed case 0: Test MPI Communicator Split
0 tests failed out of 1.
Completed case 0: Test MPI Communicator Split
0 tests failed out of 1.
Completed case 0: Test MPI Communicator Split
0 tests failed out of 1.
===================================================================================
= BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
= PID 16084 RUNNING AT expspicesrv053
= EXIT CODE: 9
= CLEANING UP REMAINING PROCESSES
= YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
===================================================================================
YOUR APPLICATION TERMINATED WITH THE EXIT STRING: Killed (signal 9)
This typically refers to a problem with your application.
Please see the FAQ page for debugging suggestions
Tests that produce this error include eckit_test_mpi_splitcomm, eckit_test_mpi_group and eckit_test_mpi_internal_access.
What are the steps to reproduce the bug?
Build eckit and run the tests with ctest. The failures seem to occur with MPICH but not with Open MPI.
Version
develop
Platform (OS and architecture)
Linux
Relevant log output
No response
Accompanying data
No response
Organisation
Met Office
Probably also related to ecmwf/fckit#41
In that issue there's mention of explicit warnings like:
[WARNING] yaksa: 2 leaked handle pool objects
Yaksa is apparently the datatype engine used by MPICH, so the warning suggests that some of its handles were still allocated at shutdown.
My hunch is that the eckit approach of calling MPI_Finalize during the destruction of static objects (after main returns) does not play well with MPICH. @tlmquintino do you have any suggestions?
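To make the hunch above concrete, here is a minimal sketch of the two shutdown orders being contrasted. It is not eckit's actual code. `FakeMpi`, `FinalizeGuard`, `implicit_shutdown` and `explicit_shutdown` are hypothetical names, and the MPI calls are replaced by a stand-in that only records events, so the sketch runs without an MPI installation. Real code would call MPI_Comm_split, MPI_Comm_free and MPI_Finalize instead, and the guard would be a static object destroyed after main rather than a scoped one.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for an MPI library (assumption: it only logs calls).
struct FakeMpi {
    bool finalized = false;
    int live_comms = 0;                // communicators created but not freed
    std::vector<std::string> log;      // ordered record of what happened

    void comm_split() { ++live_comms; log.push_back("split"); }
    void comm_free()  { --live_comms; log.push_back("free"); }

    void finalize() {
        if (finalized) return;         // a second call is a no-op
        if (live_comms > 0) log.push_back("leak");  // resources outlived finalize
        log.push_back("finalize");
        finalized = true;
    }
};

// Hypothetical guard that finalizes in its destructor, standing in for a
// static object whose destructor runs after main returns.
class FinalizeGuard {
public:
    explicit FinalizeGuard(FakeMpi& mpi) : mpi_(mpi) {}
    ~FinalizeGuard() { mpi_.finalize(); }
    FinalizeGuard(const FinalizeGuard&) = delete;
    FinalizeGuard& operator=(const FinalizeGuard&) = delete;
private:
    FakeMpi& mpi_;
};

// Implicit shutdown: nothing is released by hand; the guard finalizes last.
std::vector<std::string> implicit_shutdown() {
    FakeMpi mpi;
    {
        FinalizeGuard guard(mpi);
        mpi.comm_split();
    }  // guard destroyed here: finalize runs with a communicator still live
    return mpi.log;
}

// Explicit shutdown: free the communicator and finalize before leaving scope.
std::vector<std::string> explicit_shutdown() {
    FakeMpi mpi;
    {
        FinalizeGuard guard(mpi);
        mpi.comm_split();
        mpi.comm_free();
        mpi.finalize();
    }  // the guard's finalize is now a no-op
    return mpi.log;
}
```

If the hunch is right, one possible workaround is the second pattern: release communicators and finalize explicitly before main returns, and keep the destructor-based call only as a fallback. Whether this resolves the MPICH failures reported here is untested.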