SymbioticLab / Salus

Fine-grained GPU sharing primitives

How to find the GPU memory usage pattern in TensorFlow or PyTorch?

Xuyuanjia2014 opened this issue · comments

I have read your paper Fine-Grained GPU Sharing Primitives for Deep Learning Applications (MLSys 2020) and other deep learning scheduling papers, including Gandiva (OSDI 2018) and Tiresias (NSDI 2019).

Because of TensorFlow's and PyTorch's memory caching policy, when I use the nvidia-smi command to check a DL job's GPU memory usage, it always reports close to 100%.

Are there any tools or methods I can use to get a characterization similar to the one in Salus or Gandiva, maybe the TensorFlow profiler?

commented

Hi,

nvidia-smi indeed doesn't work in this case due to the memory pooling.
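For a quick look at what the pool actually holds from inside the framework, the allocator's own counters already distinguish bytes backing live tensors from bytes merely cached. A minimal PyTorch sketch (the tensor size is arbitrary, only there to make the gap visible):

```python
import torch

# Allocate a small tensor; the caching allocator may reserve a larger pool.
x = torch.empty(1024, 1024, device="cuda")  # ~4 MB of float32

print("in use   :", torch.cuda.memory_allocated())  # bytes backing live tensors
print("reserved :", torch.cuda.memory_reserved())   # bytes held by the caching pool
# nvidia-smi sees roughly the reserved pool plus CUDA context overhead,
# which is why it stays near 100% once the pool has grown.
```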

Can't say for Gandiva. In Salus, I got the data by modifying TensorFlow/PyTorch's allocator and adding customized logging for later post-processing.
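If rebuilding the framework is not an option, one coarser alternative is to sample PyTorch's allocator counters from a background thread and post-process the log afterwards. This is a rough sketch, not the instrumentation used for the paper; the file name, column layout, and sampling interval are arbitrary choices:

```python
import csv
import threading
import time

import torch


def log_gpu_memory(path="gpu_mem.csv", interval=0.01, stop_event=None):
    """Periodically sample allocator counters and write them to a CSV
    for later post-processing (time, bytes in use, bytes reserved)."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["time_s", "allocated_bytes", "reserved_bytes"])
        start = time.time()
        while stop_event is None or not stop_event.is_set():
            writer.writerow([
                time.time() - start,
                torch.cuda.memory_allocated(),
                torch.cuda.memory_reserved(),
            ])
            time.sleep(interval)


# Usage: start the sampler, run some training iterations, then stop it.
stop = threading.Event()
t = threading.Thread(target=log_gpu_memory,
                     args=("gpu_mem.csv", 0.01, stop), daemon=True)
t.start()
# ... run training here ...
stop.set()
t.join()
```

On the TensorFlow side, tf.config.experimental.get_memory_info can play a similar role in recent 2.x releases. Modifying the allocator itself, as described above, yields finer-grained events than a periodic sampler, so a sketch like this only approximates the usage pattern.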