nvmlDeviceGetMemoryInfo has weird result
knsong opened this issue
Hi,
when I use nvmlDeviceGetMemoryInfo to get the used GPU memory, it returns a wrong number, e.g. 16MB, while watch -n 0.01 nvidia-smi shows ~11G used. Here is the example code:
py3nvml.nvmlInit()
device_count = py3nvml.nvmlDeviceGetCount()
assert gpuid < device_count
handle = py3nvml.nvmlDeviceGetHandleByIndex(gpuid)
mem_info = py3nvml.nvmlDeviceGetBAR1MemoryInfo(handle)
if mem_info != 'N/A':
    print(mem_info)
    used = mem_info.bar1Used >> 20
    total = mem_info.bar1Total >> 20
else:
    used = 0
    total = 0
Hey @knsong, sorry for the slow reply.
So you've mentioned that nvmlDeviceGetMemoryInfo returns an incorrect result, but your code calls nvmlDeviceGetBAR1MemoryInfo, which reports BAR1 memory, not the frame buffer memory. man nvidia-smi gives a nice description of the difference between FB memory and BAR1 memory.
Could you check that the following give matching results (assuming GPU id 0 is the GPU you are querying):
nvidia-smi -i 0 -q -d MEMORY
(shows frame buffer and BAR1 memory)
from py3nvml import py3nvml
py3nvml.nvmlInit()
handle = py3nvml.nvmlDeviceGetHandleByIndex(0)
fb_mem_info = py3nvml.nvmlDeviceGetMemoryInfo(handle)
bar1_mem_info = py3nvml.nvmlDeviceGetBAR1MemoryInfo(handle)
print(fb_mem_info.used >> 20)
print(bar1_mem_info.bar1Used >> 20)
@fbcotter thanks a lot, that solved the problem.
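For anyone landing here with the same symptom, the original snippet can be adapted to read frame buffer memory instead of BAR1 memory. The following is only a minimal sketch, not code posted in the thread: the helper names to_mib, summarize_fb_memory and query_fb_memory, and the FakeMemInfo stand-in, are hypothetical and introduced purely for illustration. It assumes the object returned by nvmlDeviceGetMemoryInfo exposes used and total in bytes, as the reply above relies on.

```python
from collections import namedtuple

# Hypothetical stand-in with the fields the sketch reads from the object
# returned by nvmlDeviceGetMemoryInfo (values in bytes). Lets the conversion
# be tried on a machine without a GPU.
FakeMemInfo = namedtuple("FakeMemInfo", ["total", "free", "used"])


def to_mib(num_bytes):
    """Convert bytes to whole MiB, as the thread does with `>> 20`."""
    return num_bytes >> 20


def summarize_fb_memory(mem_info):
    """Return (used_mib, total_mib) from a frame buffer memory info object."""
    return to_mib(mem_info.used), to_mib(mem_info.total)


def query_fb_memory(gpuid=0):
    """Query frame buffer (not BAR1) memory for one GPU.

    Needs py3nvml and an NVIDIA driver, so the import is done lazily here.
    """
    from py3nvml import py3nvml

    py3nvml.nvmlInit()
    try:
        device_count = py3nvml.nvmlDeviceGetCount()
        if gpuid >= device_count:
            raise ValueError("gpuid %d out of range (%d devices)" % (gpuid, device_count))
        handle = py3nvml.nvmlDeviceGetHandleByIndex(gpuid)
        # Frame buffer memory: this is the figure nvidia-smi reports as used.
        fb_mem_info = py3nvml.nvmlDeviceGetMemoryInfo(handle)
        return summarize_fb_memory(fb_mem_info)
    finally:
        py3nvml.nvmlShutdown()
```

On a machine with a GPU, used, total = query_fb_memory(0) should then line up with the FB memory figures from nvidia-smi -i 0 -q -d MEMORY. The conversion itself can be checked without hardware by passing a FakeMemInfo to summarize_fb_memory.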