spectralDNS / shenfun

High performance computational platform in Python for the spectral Galerkin method

Home Page: http://shenfun.readthedocs.org

An error related to the MPI communicators

maniset opened this issue · comments

Hello,

Thanks for this great library.

I have a question. I have created a solver with shenfun inside a function that is called from another code in an iterative process. It works for a limited number of iterations, but when the number of iterations grows, shenfun raises the following error.

mpi4py/MPI/Comm.pyx in mpi4py.MPI.Cartcomm.Sub()

Exception: Other MPI error, error stack:
PMPI_Cart_sub(213)..................: MPI_Cart_sub(comm=0x84000000, remain_dims=0x7f1aedab1320, comm_new=0x7f1aed8c7da0) failed
PMPI_Cart_sub(152)..................: 
MPIR_Comm_split_impl(253)...........: 
MPIR_Get_contextid_sparse_group(602): Too many communicators (0/2048 free on this process; ignore_id=0)
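
Roughly, my code has the following structure (a simplified sketch with illustrative names and sizes, not the actual solver):

```python
from mpi4py import MPI
from shenfun import FunctionSpace, TensorProductSpace

def run_solver(N=32):
    # A new TensorProductSpace is built on every call; internally this
    # creates Cartesian sub-communicators that are never freed here.
    K0 = FunctionSpace(N, family='Fourier', dtype='D')
    K1 = FunctionSpace(N, family='Fourier', dtype='d')
    T = TensorProductSpace(MPI.COMM_WORLD, (K0, K1))
    # ... assemble and solve the actual problem on T ...

for step in range(10000):   # eventually exhausts the available communicators
    run_solver()
```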

I think I need to somehow release MPI resources or free the MPI communicators after each iteration (each time the solver is used). Is there a way to solve this problem?

Sincerely

Hi,

You are right. If you create a TensorProductSpace in an iterative process, then you need to be careful with garbage collection. This is probably not well documented, but the class has a destroy method that you can call at the end of each iteration, and that should take care of cleaning up. The method is part of mpi4py-fft, see here. It is used, for example, in the tests here.
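
For illustration, the sketch from the question could be adapted roughly like this; the names and sizes are still illustrative, and the destroy call is the essential part:

```python
from mpi4py import MPI
from shenfun import FunctionSpace, TensorProductSpace

def run_solver(N=32):
    K0 = FunctionSpace(N, family='Fourier', dtype='D')
    K1 = FunctionSpace(N, family='Fourier', dtype='d')
    T = TensorProductSpace(MPI.COMM_WORLD, (K0, K1))
    try:
        pass  # ... assemble and solve the actual problem on T ...
    finally:
        T.destroy()  # free the sub-communicators created for T

for step in range(10000):   # the communicator count now stays bounded
    run_solver()
```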

Thanks a lot. That works.