score-p / scorep_binding_python

Allows tracing of Python code using Score-P

simulation hangs when running with Score-P

Rigel-Alves opened this issue

Hi,

I can run my Python simulation in parallel without problems:

srun -n 24 python run.py

The simulation will output its normal stuff, which includes:

[0]	 redistributing cells ...
[0]	 redistributing cells: 0.0653 [s] (wall clock time)

[0]	 repartitioning mesh by RCB method ...
[0]	 recursive coordinate bisection: 0.109 [s] (wall clock time)

[0]	 redistributing nodes ...
[0]	 redistributing nodes: 0.0929 [s] (wall clock time)

But if I try adding Score-P onto it:

srun -n 24 python -m scorep --mpp=mpi run.py

The code will stay forever at:

[0]	 redistributing cells ...
[0]	 redistributing cells: 0.0697 [s] (wall clock time)

[0]	 repartitioning mesh by RCB method ...

Adding the --nocompiler flag makes no difference. Do you have any ideas about what could be going on / how to bypass this issue?

Thank you very much,

Thanks Ronny, but adding the flag --noinstrumenter made no difference (it was supposed to turn off instrumentation of the entire code)... Is there a flag specific for MPI instrumentation? I cannot turn off MPI instrumentation for the entire code, as I will need to analyse MPI calls later on, during the main solver loop.

I did. The flag --noinstrumenter was supposed to turn off instrumentation of the entire code, so why is it not working?

On the other hand, trying export SCOREP_MPI_ENABLE_GROUPS="" or even:

import scorep

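# attempt: suspend Python-level instrumentation around the repartitioning call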
with scorep.instrumenter.disable():
    fsmesh.RepartitionMeshRCB() or FSError.PrintAndExit()

also makes no difference.

I don't know which method Score-P is using to capture MPI events; I also don't know what LD_PRELOAD is. There is no manual instrumentation in the code. All I am doing is:

srun -n 24 python -m scorep --mpp=mpi --noinstrumenter run.py

but the simulation is still hanging at that same place.

Dear Rigel,

The documentation of the Python Bindings states:

--noinstrumenter disables the instrumentation of python code. Useful for user instrumentation and to trace only specific code regions using scorep.instrumenter.enable.

In other words: it only influences the Python instrumentation, not any other instrumentation (like MPI).
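
As a minimal sketch of that pattern (run_solver_loop is a placeholder, not a function from your code): keep --noinstrumenter on the command line and re-enable the Python instrumenter only around the region of interest:

import scorep

# Launched as: srun -n 24 python -m scorep --mpp=mpi --noinstrumenter run.py
# Python-level instrumentation is off globally; this block switches it back on.
# MPI instrumentation is unaffected either way.
with scorep.instrumenter.enable():
    run_solver_loop()  # placeholder for the region of interest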

Moreover, I'd like to emphasize the following from the documentation:

The usual Score-P environment Variables will be respected. Please have a look at:
score-p.org
and
Score-P Documentation

There you'll find all the details about MPI instrumentation and filtering.

Moreover, I'd like to mention that the flag --nocompiler does not remove compiler instrumentation; it only disables the compiler instrumentation subsystem of Score-P. I agree that the documentation is a bit misleading here. I'll fix that.

Best,

Andreas

Thanks Andreas, but as I said above, and according to the documentation at http://scorepci.pages.jsc.fz-juelich.de/scorep-pipelines/docs/scorep-6.0/html/measurement.html#mpi_groups, I tried deactivating instrumentation of MPI routines altogether (which is not an option for me, but just for the sake of trying), by means of:

export SCOREP_MPI_ENABLE_GROUPS=""

But it made no difference: the simulation still hangs at the same place.

(1) SCOREP_MPI_ENABLE_GROUPS is all I can find in the documentation about measurement control of MPI routines (see the table at http://scorepci.pages.jsc.fz-juelich.de/scorep-pipelines/docs/scorep-6.0/html/instrumentation.html). If it is not the right way to disable MPI instrumentation, what would be?

(2) Last time I waited, the simulation had spent more than 30 minutes at a point that takes a couple of seconds when running without Score-P... With such an overhead, it is irrelevant whether it actually hung or not: the Python bindings become unusable.

My code is DLR's FSDM and CODA. Nothing in it is built with Score-P. Score-P is applied solely at the Python layer (the top layer).

Andreas, how can I filter the problematic region from the MPI instrumentation?

The MPI overhead is not related to the Score-P Python bindings, but rather to Score-P itself. The Python bindings instrument Python code and use Score-P for everything else. There is no way to "filter" MPI calls other than the ways Score-P itself provides.
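
For instance, one of the ways Score-P itself provides is restricting the recorded MPI events to selected groups (a sketch; the group names p2p and coll are from the Score-P documentation linked above):

export SCOREP_MPI_ENABLE_GROUPS=p2p,coll
srun -n 24 python -m scorep --mpp=mpi run.py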

However, if you like, you can disable the MPI instrumentation simply by setting --mpp=none, as described in the Score-P documentation or in the Score-P help. If you do so, you need to ensure that each spawned process uses a different experiment directory.
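
For example (a sketch, assuming a Slurm environment: SCOREP_EXPERIMENT_DIRECTORY is the standard Score-P variable, and SLURM_PROCID is Slurm's per-task rank):

srun -n 24 bash -c 'SCOREP_EXPERIMENT_DIRECTORY=scorep-rank-$SLURM_PROCID python -m scorep --mpp=none run.py'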

Best,

Andreas

Hey,

have you been able to solve your issue?

Best,

Andreas

Unfortunately not, Andreas... But luckily DLR's code has other mesh partitioning schemes (like ParMETIS), which happen to work with Score-P.

I think this is an acceptable form of "solved", especially as this is not a Score-P Python bindings issue but, if anything, a Score-P issue.

Best,

Andreas