bsc-pm / dlb

The DLB (Dynamic Load Balancing) library is a tool, transparent to the user, that dynamically reacts to application imbalance by modifying the number of resources at any given time.

Home Page: https://pm.bsc.es/dlb

DLB_TALP_Attach() creates the shared-memory segment if it does not exist yet

kingshuk00 opened this issue.

Calling DLB_TALP_Attach() from an external process calls shmem_cpuinfo_ext__init() and shmem_procinfo_ext__init(). Both call open_shmem() -> shmem_init() -> shm_open() + ftruncate(), which creates the segment if it does not exist yet.
Perhaps a check for its existence, similar to what DLB_DROM_PreInit() does (it calls shmem_procinfo_ext__preinit()), would be helpful. Alternatively, the check could be done by looking in /dev/shm.

I created a commit on my fork (link).
Could you please let me know whether this is the intended behaviour?

I'm not sure. The thing with DLB_TALP_Attach and the other mechanisms for attaching from a third-party process is that they are asynchronous.

Imagine you first start just a monitor program. If no other DLB program is running, the monitor program would exit with an error because there is no shared memory to attach to.

With the suggested change, an application that uses TALP must be started before a third-party program can attach to it. Currently, the third-party program may start first and sit idle, waiting for TALP processes to start so that it can monitor them.

Is calling DLB_TALP_Attach() and creating an empty shared memory causing any problem?

My monitoring code looks like:

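#include <signal.h>   /* kill() */
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>   /* usleep() */
#include "dlb_talp.h"

int main(void) {
    int nprocs = 0, nelems = 0, error;
    double mpi_time, useful_time;

    DLB_TALP_Attach();
    /* Use the node's CPU count as an upper bound on the number of processes */
    DLB_TALP_GetNumCPUs(&nprocs);
    int *pids = malloc(sizeof(int) * nprocs);
    DLB_TALP_GetPidList(pids, &nelems, nprocs);
    /* Poll every 0.5 s until at least one TALP process has registered */
    while (nelems == 0) {
        usleep(500000);
        DLB_TALP_GetPidList(pids, &nelems, nprocs);
    }

    /* Report times for the first process for as long as it is alive */
    while (!kill(pids[0], 0)) {
        error = DLB_TALP_GetTimes(pids[0], &mpi_time, &useful_time);
        if (error != DLB_SUCCESS) break;

        printf("%d, mpi time: %g; useful time: %g\n", pids[0], mpi_time, useful_time);
        usleep(500000);
    }

    DLB_TALP_Detach();
    free(pids);
    return 0;
}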

If I run the following commands:

$ export DLB_ARGS="--talp --talp-external-profiler --verbose=shmem"
$ mpirun -np 3 env LD_PRELOAD=<dlb-install-dir>/lib/libdlb_mpi.so ./executable executable-options
  • When DLB_TALP_Attach() is not called from an external monitor program (monitoring code is not running):
DLB SHMEM[laptop:"pid-0"]: Shared Memory Init: pid("pid-0"), module(procinfo)
DLB SHMEM[laptop:"pid-0"]: Initializing Shared Memory (procinfo)
DLB SHMEM[laptop:"pid-0"]: Checking shared memory consistency (procinfo)
DLB SHMEM[laptop:"pid-0"]: Shared Memory Init: pid("pid-0"), module(barrier)
DLB SHMEM[laptop:"pid-0"]: Initializing Shared Memory (barrier)
DLB SHMEM[laptop:"pid-0"]: Checking shared memory consistency (barrier)
DLB SHMEM[laptop:"pid-0"]: Shared Memory Init: pid("pid-0"), module(talp)
DLB SHMEM[laptop:"pid-0"]: Initializing Shared Memory (talp)
DLB SHMEM[laptop:"pid-0"]: Checking shared memory consistency (talp)
DLB[laptop:"pid-0"]: dlb 3.4a
DLB SHMEM[laptop:"pid-0"]: Enabled verbose mode for Shared Memory
Prog: built on Aug 29 2023 15:20:48
Rank-0: pid= "pid-0"
DLB SHMEM[laptop:"pid-1"]: Shared Memory Init: pid("pid-1"), module(procinfo)
DLB SHMEM[laptop:"pid-1"]: Attached to Shared Memory (procinfo)
DLB SHMEM[laptop:"pid-1"]: Checking shared memory consistency (procinfo)
DLB SHMEM[laptop:"pid-1"]: Shared Memory Init: pid("pid-1"), module(barrier)
DLB SHMEM[laptop:"pid-1"]: Attached to Shared Memory (barrier)
DLB SHMEM[laptop:"pid-1"]: Checking shared memory consistency (barrier)
DLB SHMEM[laptop:"pid-1"]: Shared Memory Init: pid("pid-1"), module(talp)
DLB SHMEM[laptop:"pid-1"]: Attached to Shared Memory (talp)
DLB SHMEM[laptop:"pid-1"]: Checking shared memory consistency (talp)
DLB SHMEM[laptop:"pid-1"]: Enabled verbose mode for Shared Memory
DLB SHMEM[laptop:"pid-2"]: Shared Memory Init: pid("pid-2"), module(procinfo)
DLB SHMEM[laptop:"pid-2"]: Attached to Shared Memory (procinfo)
DLB SHMEM[laptop:"pid-2"]: Checking shared memory consistency (procinfo)
DLB SHMEM[laptop:"pid-2"]: Shared Memory Init: pid("pid-2"), module(barrier)
DLB SHMEM[laptop:"pid-2"]: Attached to Shared Memory (barrier)
DLB SHMEM[laptop:"pid-2"]: Checking shared memory consistency (barrier)
DLB SHMEM[laptop:"pid-2"]: Shared Memory Init: pid("pid-2"), module(talp)
DLB SHMEM[laptop:"pid-2"]: Attached to Shared Memory (talp)
DLB SHMEM[laptop:"pid-2"]: Checking shared memory consistency (talp)
DLB SHMEM[laptop:"pid-2"]: Enabled verbose mode for Shared Memory

Running the monitoring code after this makes it fetch the PIDs successfully.

  • However, if DLB_TALP_Attach() is already called from an external monitor program before running the application:
DLB SHMEM[laptop:"pid-1"]: Shared Memory Init: pid("pid-1"), module(procinfo)
DLB SHMEM[laptop:"pid-1"]: Attached to Shared Memory (procinfo)
DLB SHMEM[laptop:"pid-1"]: Checking shared memory consistency (procinfo)
DLB WARNING[laptop:"pid-1"]: DLB could not initialize the shared memory due to incompatible options among processes, likely ones sharing CPUs and others not. Please, if you believe this is a bug contact us at pm-tools@bsc.es
DLB SHMEM[laptop:"pid-1"]: Shared Memory Init: pid("pid-1"), module(barrier)
DLB SHMEM[laptop:"pid-1"]: Initializing Shared Memory (barrier)
DLB SHMEM[laptop:"pid-1"]: Checking shared memory consistency (barrier)
DLB SHMEM[laptop:"pid-1"]: Shared Memory Init: pid("pid-1"), module(talp)
DLB SHMEM[laptop:"pid-1"]: Attached to Shared Memory (talp)
DLB SHMEM[laptop:"pid-1"]: Checking shared memory consistency (talp)
DLB SHMEM[laptop:"pid-1"]: Enabled verbose mode for Shared Memory
DLB SHMEM[laptop:"pid-2"]: Shared Memory Init: pid("pid-2"), module(procinfo)
DLB SHMEM[laptop:"pid-2"]: Attached to Shared Memory (procinfo)
DLB SHMEM[laptop:"pid-2"]: Checking shared memory consistency (procinfo)
DLB WARNING[laptop:"pid-2"]: DLB could not initialize the shared memory due to incompatible options among processes, likely ones sharing CPUs and others not. Please, if you believe this is a bug contact us at pm-tools@bsc.es
DLB SHMEM[laptop:"pid-2"]: Shared Memory Init: pid("pid-2"), module(barrier)
DLB SHMEM[laptop:"pid-2"]: Attached to Shared Memory (barrier)
DLB SHMEM[laptop:"pid-2"]: Checking shared memory consistency (barrier)
DLB SHMEM[laptop:"pid-2"]: Shared Memory Init: pid("pid-2"), module(talp)
DLB SHMEM[laptop:"pid-2"]: Attached to Shared Memory (talp)
DLB SHMEM[laptop:"pid-2"]: Checking shared memory consistency (talp)
DLB SHMEM[laptop:"pid-2"]: Enabled verbose mode for Shared Memory
DLB SHMEM[laptop:"pid-0"]: Shared Memory Init: pid("pid-0"), module(procinfo)
DLB SHMEM[laptop:"pid-0"]: Attached to Shared Memory (procinfo)
DLB SHMEM[laptop:"pid-0"]: Checking shared memory consistency (procinfo)
DLB WARNING[laptop:"pid-0"]: DLB could not initialize the shared memory due to incompatible options among processes, likely ones sharing CPUs and others not. Please, if you believe this is a bug contact us at pm-tools@bsc.es
DLB SHMEM[laptop:"pid-0"]: Shared Memory Init: pid("pid-0"), module(barrier)
DLB SHMEM[laptop:"pid-0"]: Attached to Shared Memory (barrier)
DLB SHMEM[laptop:"pid-0"]: Checking shared memory consistency (barrier)
DLB SHMEM[laptop:"pid-0"]: Shared Memory Init: pid("pid-0"), module(talp)
DLB SHMEM[laptop:"pid-0"]: Attached to Shared Memory (talp)
DLB SHMEM[laptop:"pid-0"]: Checking shared memory consistency (talp)
DLB[laptop:"pid-0"]: dlb 3.4a
DLB SHMEM[laptop:"pid-0"]: Enabled verbose mode for Shared Memory

The monitoring code loops forever waiting for PIDs, even after the program has started.

By the way, thank you for explaining the intended behaviour. This is helpful, and I now understand that creating the shared memory is not a problem in itself, as long as it works as intended.

Oh, I see. There's a bug: when DLB_TALP_Attach creates the shared memory, other processes expect a certain value that is not set. DLB warns about it, and I think TALP is never enabled in this shared memory, which is why the external process doesn't see any other process:

DLB could not initialize the shared memory due to incompatible options among processes, likely ones sharing CPUs and others not. Please, if you believe this is a bug contact us at pm-tools@bsc.es

If you need it to work right now, I can think of a workaround:

diff --git a/src/LB_comm/shmem_procinfo.c b/src/LB_comm/shmem_procinfo.c
index 04ab8e4..7bb9a59 100644
--- a/src/LB_comm/shmem_procinfo.c
+++ b/src/LB_comm/shmem_procinfo.c
@@ -244,7 +244,8 @@ static int shmem_procinfo__init_(pid_t pid, pid_t preinit_pid, const cpu_set_t *
             if (shdata->allow_cpu_sharing != allow_cpu_sharing) {
                 // For now we require all processes registering the procinfo
                 // to have the same value in 'allow_cpu_sharing'
-                error = DLB_ERR_NOCOMP;
+                // error = DLB_ERR_NOCOMP;
+                shdata->allow_cpu_sharing = allow_cpu_sharing;
             }
         }

In any case, in the following days I will try to upload a proper fix. Thanks.

Thanks, Victor, for the interim fix. I can confirm that it works.
However, the application must explicitly create a DLB monitoring region with the specific name "MPI Region" (as in the following line) for the external monitoring program to fetch meaningful MPI and useful times:
dlb_monitor_t *mon = DLB_MonitoringRegionRegister("MPI Region");
I figured this out by examining DLB_TALP_Attach().
Otherwise, calling DLB_TALP_Attach() registers a region called "MPI Region" in TALP, but not as a monitor, so it is never updated from talp_[into/out_of]_sync_call() (nregions is not updated in DLB_talp.c).
Could you please tell me whether this is related to this issue? Otherwise, I will open another one.

Right, I've done some tests with an external profiler calling DLB_TALP_Attach(), and obtaining metrics from the region does not work as it should.

Thanks for pointing it out, I will do a fix for all these things in this issue, no need for creating another for now.

I think it should be fixed now, but let us know if you find anything else. You can also undo the workaround in LB_comm/shmem_procinfo.c once you update your main branch.

We've also implemented a function, DLB_TALP_GetNodeTimes, that does DLB_TALP_GetPidList + DLB_TALP_GetTimes in a single call, should you find it useful. A small pseudo-code example of a profiler using each approach would be:

Using DLB_TALP_GetPidList + DLB_TALP_GetTimes:

DLB_TALP_Attach();
while (...) {    /* application-specific termination condition */
    int pidlist[MAX_PROCS], nelems;
    DLB_TALP_GetPidList(pidlist, &nelems, MAX_PROCS);
    for (int n = 0; n < nelems; ++n) {
        int pid = pidlist[n];
        double mpi_time, useful_time;
        if (DLB_TALP_GetTimes(pid, &mpi_time, &useful_time) == DLB_SUCCESS) {
            printf("Found pid: %d, mpi_time: %f s, useful_time: %f s\n",
                    pid, mpi_time, useful_time);
        }
    }
}
DLB_TALP_Detach();

Using DLB_TALP_GetNodeTimes (requires <inttypes.h> for PRId64):

DLB_TALP_Attach();
while (...) {    /* application-specific termination condition */
    dlb_node_times_t node_times[MAX_PROCS];
    int nelems;
    DLB_TALP_GetNodeTimes(DLB_MPI_REGION, node_times, &nelems, MAX_PROCS);
    for (int n = 0; n < nelems; ++n) {
        printf("Found pid: %d, mpi_time: %"PRId64" ns, useful_time: %"PRId64" ns\n",
                node_times[n].pid,
                node_times[n].mpi_time,
                node_times[n].useful_time);
    }
}
DLB_TALP_Detach();

You could also call DLB_TALP_QueryPOPNodeMetrics to obtain synthesized node metrics. Also, let us know if these features cover your use case. Thanks.