ecmwf / eckit

A C++ toolkit that supports development of tools and applications at ECMWF.

Home Page: https://confluence.ecmwf.int/display/eckit


MPI communicator split failures

DJDavies2 opened this issue

What happened?

I am getting failures of this type:

Completed case 0: Test MPI Communicator Split
0 tests failed out of 1.
Completed case 0: Test MPI Communicator Split
0 tests failed out of 1.
Completed case 0: Test MPI Communicator Split
0 tests failed out of 1.
Completed case 0: Test MPI Communicator Split
0 tests failed out of 1.

===================================================================================
=   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
=   PID 16084 RUNNING AT expspicesrv053
=   EXIT CODE: 9
=   CLEANING UP REMAINING PROCESSES
=   YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
===================================================================================
YOUR APPLICATION TERMINATED WITH THE EXIT STRING: Killed (signal 9)
This typically refers to a problem with your application.
Please see the FAQ page for debugging suggestions

Tests that produce this error include, for example, eckit_test_mpi_splitcomm, eckit_test_mpi_group, and eckit_test_mpi_internal_access.

What are the steps to reproduce the bug?

Build eckit and run the ctests. The problem appears with MPICH but not with OpenMPI.
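
For reference, here is a minimal standalone program (a sketch of mine, not one of the eckit tests) that exercises MPI_Comm_split and calls MPI_Finalize inside main. If this runs cleanly under the same MPICH installation, it would point to eckit's teardown rather than to MPI_Comm_split itself:

#include <mpi.h>
#include <cstdio>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);

    int rank = 0, size = 0;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    // Split the world communicator into two groups by rank parity,
    // analogous in spirit to what the eckit split tests do.
    MPI_Comm split;
    MPI_Comm_split(MPI_COMM_WORLD, rank % 2, rank, &split);

    int srank = 0;
    MPI_Comm_rank(split, &srank);
    std::printf("world rank %d/%d -> split rank %d\n", rank, size, srank);

    MPI_Comm_free(&split);
    MPI_Finalize();  // finalised inside main, before static destruction
    return 0;
}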

Version

develop

Platform (OS and architecture)

Linux

Relevant log output

No response

Accompanying data

No response

Organisation

Met Office

Probably also related to ecmwf/fckit#41
That issue mentions explicit warnings such as:

[WARNING] yaksa: 2 leaked handle pool objects

Yaksa is the datatype engine used internally by MPICH; the warning refers to its internal handle pools.
My hunch is that the eckit approach of calling MPI_Finalize during the destruction of static objects (after main) does not play nicely with MPICH. @tlmquintino do you have any suggestions?
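
To make the suspected pattern concrete, here is a minimal sketch (my own illustration, not eckit's actual code) of deferring MPI_Finalize to the destructor of a static object:

#include <mpi.h>

// Hypothetical illustration of the pattern described above: MPI_Finalize
// is deferred to the destructor of a static object, so it runs after
// main() returns, during static destruction. Not eckit's actual code.
struct MpiEnvironment {
    MpiEnvironment() { MPI_Init(nullptr, nullptr); }
    ~MpiEnvironment() {
        int finalized = 0;
        MPI_Finalized(&finalized);
        if (!finalized) {
            MPI_Finalize();  // runs after main(), which MPICH may not tolerate
        }
    }
};

static MpiEnvironment environment;  // destroyed only after main() exits

int main() {
    // ... tests run here; MPI_Finalize has not been called yet ...
    return 0;
}

If that is indeed what happens here, finalising explicitly before main returns would be one way to test the hunch.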