utkuozdemir / nvidia_gpu_exporter

Nvidia GPU exporter for prometheus using nvidia-smi binary

Running nvidia-gpu-exporter in k8s produces an error

hrwang123 opened this issue

1. I am using the nvidia-gpu-exporter 0.3.0 image.

Running nvidia-smi inside the nvidia-gpu-exporter container prints the following error:
/tmp/cuda-control/src/register.c: 66 can't register to manager, error No such file or directory
/tmp/cuda-control/src/register.c: 87 rpc client exit with 255

Please try the latest release and make sure you follow the steps described here: https://github.com/utkuozdemir/nvidia_gpu_exporter/blob/master/INSTALL.md#running-in-kubernetes

I'll close this issue since it is really about getting nvidia-smi to work in a k8s container, which is out of scope for the exporter: every k8s setup and Linux distro could potentially require a different configuration.

My suggestion is to first get nvidia-smi --query-gpu ... working in a pod in your k8s cluster. Once that works, adapt your deployment using the official Helm chart as a reference.
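
As a rough illustration of that first step, here is a minimal Python sketch of a check you could run inside the pod. It is not part of the exporter: the helper names (check_nvidia_smi, parse_query_output) and the chosen fields (name, driver_version) are assumptions made for this example, and the only thing it relies on is that nvidia-smi accepts --query-gpu=... with --format=csv,noheader.

```python
import csv
import io
import subprocess

# Fields to request; chosen for illustration only.
FIELDS = ["name", "driver_version"]


def parse_query_output(text, fields=FIELDS):
    """Turn nvidia-smi CSV output (no header) into a list of dicts."""
    rows = csv.reader(io.StringIO(text), skipinitialspace=True)
    return [dict(zip(fields, (c.strip() for c in row))) for row in rows if row]


def check_nvidia_smi(binary="nvidia-smi", fields=FIELDS):
    """Return (ok, result): GPU dicts on success, an error string otherwise."""
    cmd = [binary, "--query-gpu=" + ",".join(fields), "--format=csv,noheader"]
    try:
        proc = subprocess.run(cmd, capture_output=True, text=True, timeout=30)
    except FileNotFoundError:
        return False, f"{binary} not found in PATH"
    except subprocess.TimeoutExpired:
        return False, f"{binary} timed out"
    if proc.returncode != 0:
        # Surface whatever the binary printed, e.g. a driver or runtime error.
        return False, (proc.stderr or proc.stdout).strip()
    return True, parse_query_output(proc.stdout, fields)


if __name__ == "__main__":
    ok, result = check_nvidia_smi()
    print("OK" if ok else "FAILED", result)
```

If this reports a failure, the problem lies in the container's GPU setup (runtime, driver mounts, device plugin) rather than in the exporter, since the exporter depends on the same nvidia-smi call succeeding.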

If you have any further findings, please feel free to share.