score-p / scorep_binding_python

Allows tracing of Python code using Score-P

simulation hangs when running with Score-P

Rigel-Alves opened this issue

Hi,

I can run my Python simulation in parallel without problems:

srun -n 24 python run.py

The simulation will output its normal stuff, which includes:

[0]	 redistributing cells ...
[0]	 redistributing cells: 0.0653 [s] (wall clock time)

[0]	 repartitioning mesh by RCB method ...
[0]	 recursive coordinate bisection: 0.109 [s] (wall clock time)

[0]	 redistributing nodes ...
[0]	 redistributing nodes: 0.0929 [s] (wall clock time)

But if I try adding Score-P onto it:

srun -n 24 python -m scorep --mpp=mpi run.py

The code will stay forever at:

[0]	 redistributing cells ...
[0]	 redistributing cells: 0.0697 [s] (wall clock time)

[0]	 repartitioning mesh by RCB method ...

Adding the --nocompiler flag makes no difference. Do you have any ideas about what could be going on / how to bypass this issue?

Thank you very much,

Thanks Ronny, but adding the flag --noinstrumenter made no difference (it was supposed to turn off instrumentation of the entire code)... Is there a flag specific for MPI instrumentation? I cannot turn off MPI instrumentation for the entire code, as I will need to analyse MPI calls later on, during the main solver loop.

I did. The flag --noinstrumenter was supposed to turn off instrumentation of the entire code, so why is it not working?

On the other hand, trying export SCOREP_MPI_ENABLE_GROUPS="" or even:

import scorep

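# attempt: suspend Python-level instrumentation around the repartitioning call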
with scorep.instrumenter.disable():
    fsmesh.RepartitionMeshRCB() or FSError.PrintAndExit()

also makes no difference.

I don't know which method Score-P is using to capture MPI events; I also don't know what LD_PRELOAD is. There is no manual instrumentation in the code. All I am doing is:

srun -n 24 python -m scorep --mpp=mpi --noinstrumenter run.py

but the simulation is still hanging at that same place.

Dear Rigel,

The documentation of the Python Bindings states:

--noinstrumenter disables the instrumentation of python code. Useful for user instrumentation and to trace only specific code regions using scorep.instrumenter.enable.

In other words: it only influences the Python instrumentation, not any other instrumentation (like MPI).
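
As a minimal sketch of that pattern (run_solver_loop is a placeholder, not a function from your code): keep --noinstrumenter on the command line and re-enable the Python instrumenter only around the region of interest:

import scorep

# Launched as: srun -n 24 python -m scorep --mpp=mpi --noinstrumenter run.py
# Python-level instrumentation is off globally; this block switches it back on.
# MPI instrumentation is unaffected either way.
with scorep.instrumenter.enable():
    run_solver_loop()  # placeholder for the region of interest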

Moreover, I'd like to emphasize the following from the documentation:

The usual Score-P environment Variables will be respected. Please have a look at:
score-p.org
and
Score-P Documentation

There you'll find all the details about MPI instrumentation and filtering.

Moreover, I'd like to mention that the flag --nocompiler does not remove compiler instrumentation; it only disables the compiler instrumentation subsystem of Score-P. I agree that the documentation is a bit misleading here. I'll fix that.

Best,

Andreas

Thanks Andreas, but as I said above, and according to the documentation at http://scorepci.pages.jsc.fz-juelich.de/scorep-pipelines/docs/scorep-6.0/html/measurement.html#mpi_groups, I tried deactivating instrumentation of MPI routines altogether (which is not an option for me, but just for the sake of trying), by means of:

export SCOREP_MPI_ENABLE_GROUPS=""

But it made no difference: the simulation still hangs at the same place.

(1) SCOREP_MPI_ENABLE_GROUPS is all I can find in the documentation about measurement control of MPI routines (see the table at http://scorepci.pages.jsc.fz-juelich.de/scorep-pipelines/docs/scorep-6.0/html/instrumentation.html). If it is not the right way to disable MPI instrumentation, what would be?

(2) Last time I waited, the simulation had spent more than 30 minutes at a point that takes a couple of seconds when running without Score-P... With such an overhead, it is irrelevant whether it actually hung or not: the Python bindings become unusable.

My code is DLR's FSDM and CODA. Nothing in it is built with Score-P. Score-P is applied solely at the Python layer (the top layer).

Andreas, how can I filter the problematic region from the MPI instrumentation?

The MPI overhead is not related to the Score-P Python bindings, but rather to Score-P itself. The Python bindings instrument Python code and use Score-P for everything else. There is no way to "filter" MPI calls other than the ways Score-P itself provides.
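
For instance, one of the ways Score-P itself provides is restricting the recorded MPI events to selected groups (a sketch; the group names p2p and coll are from the Score-P documentation linked above):

export SCOREP_MPI_ENABLE_GROUPS=p2p,coll
srun -n 24 python -m scorep --mpp=mpi run.py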

However, if you like, you can disable the MPI instrumentation simply by setting --mpp=none, as described in the Score-P documentation or in the Score-P help. If you do so, you need to ensure that each spawned process uses a different experiment directory.
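
For example (a sketch, assuming a Slurm environment: SCOREP_EXPERIMENT_DIRECTORY is the standard Score-P variable, and SLURM_PROCID is Slurm's per-task rank):

srun -n 24 bash -c 'SCOREP_EXPERIMENT_DIRECTORY=scorep-rank-$SLURM_PROCID python -m scorep --mpp=none run.py'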

Best,

Andreas

Hey,

have you been able to solve your issue?

Best,

Andreas

Unfortunately not, Andreas... But luckily DLR's code has other mesh partitioning schemes (like ParMETIS), which happen to work with Score-P.

I think this is an acceptable form of "solved", especially as this is not a Score-P Python bindings issue but, if anything, a Score-P issue.

Best,

Andreas