bsc-pm / nanox

Nanos++ is a runtime designed to serve as runtime support in parallel environments. It is mainly used to support OmpSs, a extension to OpenMP developed at BSC.

Home Page:https://pm.bsc.es/nanox

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

Feature Request: Add support for Valgrind

devreal opened this issue · comments

Valgrind offers an API [1] that allows marking memory regions as stacks. This helps to reduce the number of false positives when using Valgrind in combination with OmpSs.

Such an instrumentation could be added to a debug build if valgrind has been detected during configure. The change should be minimal (3 lines in smpdd.cpp plus integration into the build system).

[1] http://valgrind.org/docs/manual/manual-core-adv.html

That would be really nice. Passing ompss programs through valgrind is always a very painful process.

While I wouldn't mind to add such support, the Valgrind documentation states this warning about all the three client requests related to stacks:

Warning: Unfortunately, this client request is unreliable and best avoided.

Which makes me think whether this effort is worthwhile.

Have you considered to disable user-level threads when running Valgrind with OmpSs?
(export NX_ARGS+=' --disable-ut').

@vlopezh I didn't see these warnings. I have incorporated the Valgrind macros into my own UT runtime and hacked it into OmpSs (because the false positives made using Valgrind impossible).

Have you considered to disable user-level threads when running Valgrind with OmpSs?
(export NX_ARGS+=' --disable-ut').

How does that change the behavior of the runtime? Does that mean task execution is serialized? I cannot find documentation for it right of the hand...

Disabling User Threads (or ULT) does not mean to serialize task execution, but to avoid task migration among threads. It directly means that once a task is started in one thread, it will be tied to this thread and, after suspended (e.g., taskwait) it could not be resumed by another thread.

It corresponds, more or less, to the implementation of a tied task in OpenMP (which, by the way, is the default behaviour in this model).

I have incorporated the Valgrind macros into my own UT runtime and hacked it into OmpSs (because the false positives made using Valgrind impossible).

Could you circulate your implementation to have a clearer idea of the required changes and the parameters used.

It's by far not complete (misses deregistration) but it did the trick for me:

--- ompss-19.06/nanox-0.15/src/arch/smp/smpdd.cpp	2019-06-27 19:00:34.000000000 +0200
+++ ompss-19.06-valgrind/nanox-0.15/src/arch/smp/smpdd.cpp	2020-02-03 13:57:23.675151305 +0100
@@ -25,6 +25,7 @@
 #include "taskexecutionexception.hpp"
 #include "smpdevice.hpp"
 #include "schedule.hpp"
+#include <valgrind/valgrind.h>
 #include <string>
 
 using namespace nanos;
@@ -88,6 +89,7 @@
    if (isUserLevelThread) {
       if (previous == NULL) {
          _stack = (void *) NEW char[_stackSize];
+         VALGRIND_STACK_REGISTER(_stack, (void*)(((uintptr_t)_stack)+_stackSize));
          verbose0("   new stack created: " << _stackSize << " bytes");
       } else {
          verbose0("   reusing stacks");

I didn't bother wrapping the valgrind calls in some nanox macros to disable them if needed. It could be sufficient to add another configure flag --enable-valgrind that needs to be explicitly added.

Added feature in e4b5dbe.
Thank you.