fbcotter / py3nvml

Python 3 Bindings for NVML library. Get NVIDIA GPU status inside your program.

nvmlDeviceGetMemoryInfo has weird result

knsong opened this issue

Hi,
when I use nvmlDeviceGetMemoryInfo to get the GPU memory usage, it returns an incorrect number, e.g. 16MB, while watch -n 0.01 nvidia-smi shows ~11G used. Example code below:

from py3nvml import py3nvml

gpuid = 0  # index of the GPU to query (0 assumed here)
py3nvml.nvmlInit()
device_count = py3nvml.nvmlDeviceGetCount()
assert gpuid < device_count
handle = py3nvml.nvmlDeviceGetHandleByIndex(gpuid)
mem_info = py3nvml.nvmlDeviceGetBAR1MemoryInfo(handle)
if mem_info != 'N/A':
    print(mem_info)
    used = mem_info.bar1Used >> 20
    total = mem_info.bar1Total >> 20
else:
    used = 0
    total = 0

Hey @knsong, sorry for the slow reply.

You mention that nvmlDeviceGetMemoryInfo returns an incorrect result, but your code calls nvmlDeviceGetBAR1MemoryInfo, which reports BAR1 memory, not frame buffer (FB) memory. man nvidia-smi gives a nice description of the difference between FB memory and BAR1 memory.

Could you check that the following two give matching results (assuming GPU id 0 is the GPU you are querying)?

  1. nvidia-smi -i 0 -q -d MEMORY (shows frame buffer and BAR1 memory)
  2. The following Python snippet:

from py3nvml import py3nvml
py3nvml.nvmlInit()
handle = py3nvml.nvmlDeviceGetHandleByIndex(0)
fb_mem_info = py3nvml.nvmlDeviceGetMemoryInfo(handle)
bar1_mem_info = py3nvml.nvmlDeviceGetBAR1MemoryInfo(handle)
print(fb_mem_info.used >> 20)        # frame buffer memory used, in MiB
print(bar1_mem_info.bar1Used >> 20)  # BAR1 memory used, in MiB

@fbcotter thanks a lot, that solved the problem.
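
For anyone landing here later, this is a minimal sketch of the original snippet with the frame buffer query swapped in for the BAR1 one. The helper names fb_memory_mib and query_fb_memory_mib are made up for illustration and are not part of py3nvml. The 'N/A' fallback is carried over from the original post rather than taken from the library docs. The conversion helper is kept separate so it can be tried without a GPU.

```python
from types import SimpleNamespace

MIB_SHIFT = 20  # shifting a byte count right by 20 bits converts it to MiB


def fb_memory_mib(mem_info):
    """Return (used, total) in MiB from a frame buffer memory info object.

    Falls back to (0, 0) for the 'N/A' string, mirroring the check in the
    original post (an assumption carried over, not documented behaviour).
    """
    if mem_info == 'N/A':
        return 0, 0
    return mem_info.used >> MIB_SHIFT, mem_info.total >> MIB_SHIFT


def query_fb_memory_mib(gpuid=0):
    """Query a real GPU. Requires py3nvml and an NVIDIA driver."""
    # Imported lazily so fb_memory_mib can be used on machines without NVML.
    from py3nvml import py3nvml

    py3nvml.nvmlInit()
    try:
        device_count = py3nvml.nvmlDeviceGetCount()
        if gpuid >= device_count:
            raise ValueError(f"gpuid {gpuid} out of range ({device_count} devices)")
        handle = py3nvml.nvmlDeviceGetHandleByIndex(gpuid)
        # Frame buffer memory (what nvidia-smi's main table shows), not BAR1.
        return fb_memory_mib(py3nvml.nvmlDeviceGetMemoryInfo(handle))
    finally:
        py3nvml.nvmlShutdown()


# Stand-in object with the same .used / .total fields, values in bytes.
fake_info = SimpleNamespace(used=11 * 1024**3, total=12 * 1024**3)
print(fb_memory_mib(fake_info))  # (11264, 12288)
```

On a machine with a GPU, query_fb_memory_mib(0) should roughly match the FB Memory Usage section of nvidia-smi -i 0 -q -d MEMORY.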