facebookincubator / dynolog

Dynolog is a telemetry daemon for performance monitoring and tracing. It exports metrics from system components such as the Linux kernel, CPU, disks, Intel PT, and GPUs. Dynolog also integrates with PyTorch and can trigger traces for distributed training applications.

DCGM: add pause and resume functionality through the dyno CLI

haowangludx opened this issue

The Dynolog DCGM module conflicts with other NVIDIA profiling tools and libraries (e.g. Nsight Compute, CUPTI). We want to add pause and resume functionality to the profiling module so that users can disable it when they need to run other NVIDIA tooling.
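
To make the request concrete, here is a rough sketch of the behavior being asked for, written in Python for brevity. Everything in it is hypothetical: `PausableSampler`, `pause`, `resume`, and `tick` are illustrative names, not Dynolog or DCGM APIs, and the timed auto-resume is an assumption about what would be convenient rather than something the project has agreed on. A real implementation would presumably also have to release whatever GPU profiling resources the module holds while paused, which this sketch does not model.

```python
import time
from typing import Callable, Optional


class PausableSampler:
    """Hypothetical wrapper that skips sampling while paused (not a Dynolog API)."""

    def __init__(
        self,
        sample_fn: Callable[[], dict],
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._sample_fn = sample_fn
        self._clock = clock
        self._paused_until: Optional[float] = None  # None means "not paused"

    def pause(self, duration_s: float) -> None:
        """Stop sampling for duration_s seconds; sampling resumes on its own afterwards."""
        self._paused_until = self._clock() + duration_s

    def resume(self) -> None:
        """Resume sampling immediately, cancelling any pending pause."""
        self._paused_until = None

    def is_paused(self) -> bool:
        if self._paused_until is None:
            return False
        if self._clock() >= self._paused_until:
            # The pause window has expired, so clear it.
            self._paused_until = None
            return False
        return True

    def tick(self) -> Optional[dict]:
        """Called once per reporting interval; returns None while paused."""
        if self.is_paused():
            return None
        return self._sample_fn()
```

In this sketch the daemon's reporting loop would call `tick()` every interval, and a CLI command would only need to reach `pause()` or `resume()` over the daemon's existing RPC channel. The timed pause is there so that a forgotten pause does not silently disable monitoring forever.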

@anupambhatnagar