nvml library is not getting initialized on ubuntu22.04
sujithapallapothu opened this issue · comments
package main

import (
	"fmt"
	"log"

	"github.com/NVIDIA/go-nvml/pkg/nvml"
)

func main() {
	// Load and initialize the NVML library.
	ret := nvml.Init()
	if ret != nvml.SUCCESS {
		log.Fatalf("Unable to initialize NVML: %v", nvml.ErrorString(ret))
	}
	defer func() {
		ret := nvml.Shutdown()
		if ret != nvml.SUCCESS {
			log.Fatalf("Unable to shutdown NVML: %v", nvml.ErrorString(ret))
		}
	}()

	count, ret := nvml.DeviceGetCount()
	if ret != nvml.SUCCESS {
		log.Fatalf("Unable to get device count: %v", nvml.ErrorString(ret))
	}
	fmt.Println("count", count)

	// Print the UUID and the running compute processes of every device.
	for i := 0; i < count; i++ {
		device, ret := nvml.DeviceGetHandleByIndex(i)
		if ret != nvml.SUCCESS {
			log.Fatalf("Unable to get device at index %d: %v", i, nvml.ErrorString(ret))
		}

		uuid, ret := device.GetUUID()
		if ret != nvml.SUCCESS {
			log.Fatalf("Unable to get uuid of device at index %d: %v", i, nvml.ErrorString(ret))
		}
		fmt.Printf("%v\n", uuid)

		processInfos, ret := device.GetComputeRunningProcesses()
		if ret != nvml.SUCCESS {
			log.Fatalf("Unable to get process info for device at index %d: %v", i, nvml.ErrorString(ret))
		}
		fmt.Printf("Found %d processes on device %d\n", len(processInfos), i)
		for pi, processInfo := range processInfos {
			fmt.Printf("\t[%2d] ProcessInfo: %+v\n", pi, processInfo)
		}
	}
}
When I execute the Go code above, I get the following error on my Linux machine:
Error initializing NVML:ERROR_LIBRARY_NOT_FOUND
Can someone please suggest why NVML is not being initialized, even though the nvml package is imported in the Go file above?
The specs of my Linux machine are as follows:
Ubuntu version: 22.04
Graphics card: 61:00.0 3D controller: NVIDIA Corporation GA100 [A100 PCIe 40GB] (rev a1)
Nvidia Driver version: 550.54.14
CUDA Version: 12.4
Go version: go1.21.9 linux/amd64
Do you have the NVIDIA driver installed? Where is the libnvidia-ml.so.1 library located on your system?
Yes @klueska
I have libnvidia-ml.so.1 on my Linux machine (Ubuntu 22.04):
root@ubuntu2204:/tmp# locate libnvidia-ml.so.1
/usr/lib/x86_64-linux-gnu/libnvidia-ml.so.1
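For readers without locate installed, a similar check can be sketched in plain Go. This is only an illustration: the directory list in candidateDirs and the helper findLibrary are hypothetical names chosen for this example, and the list is not the dynamic loader's actual search path, so a hit here does not prove that the loader will find the library.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// candidateDirs is an illustrative guess at common library directories.
// It is an assumption for this sketch, not the loader's real search path.
var candidateDirs = []string{
	"/usr/lib/x86_64-linux-gnu",
	"/usr/lib64",
	"/usr/lib",
}

// findLibrary returns every dir/name path that exists and is not a directory.
func findLibrary(name string, dirs []string) []string {
	var found []string
	for _, dir := range dirs {
		p := filepath.Join(dir, name)
		if info, err := os.Stat(p); err == nil && !info.IsDir() {
			found = append(found, p)
		}
	}
	return found
}

func main() {
	paths := findLibrary("libnvidia-ml.so.1", candidateDirs)
	if len(paths) == 0 {
		fmt.Println("libnvidia-ml.so.1 not found in the candidate directories")
		return
	}
	for _, p := range paths {
		fmt.Println(p)
	}
}
```

The check needs no cgo and no NVML, so it still runs on a machine where nvml.Init() fails.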
@sujithapallapothu the error message "Error initializing NVML" does not seem to exist in the go-nvml code base and is also not present in the snippet that you pasted above.
Could you give more information about your environment, including the output of nvidia-smi?
The code you show seems to come from one of the examples included in the repository. Could you check out the latest version of main and run make examples in the root folder? You should then be able to run these examples.
@elezar yes, you are right. I took the code from the examples and adapted it in my sample.go file, which looks like this:
if hasNvidiaGPUs() {
	err := nvml.Init()
	if err != nvml.SUCCESS {
		fmt.Println("Error initializing NVML:", err)
		//return err
	}
	defer nvml.Shutdown()
	deviceCount, err := nvml.DeviceGetCount()
	if err != nvml.SUCCESS {
		fmt.Println("Error getting device count:", err)
		//log.Fatalf("Unable to get device count: %v", nvml.ErrorString(err))
	}
	fmt.Println("Number of NVIDIA GPUs:", deviceCount)
} else {
	fmt.Println("No NVIDIA GPUs")
}
where the hasNvidiaGPUs() function checks whether an NVIDIA graphics card is present. I built the code above using go build -tags netgo -ldflags '-s -extldflags "-static"' sample.go and then executed the resulting binary, which results in Error initializing NVML:ERROR_LIBRARY_NOT_FOUND
More details about my environment are as follows:
Ubuntu version: 22.04
Graphics card: 61:00.0 3D controller: NVIDIA Corporation GA100 [A100 PCIe 40GB] (rev a1)
Nvidia Driver version: 550.54.14
CUDA Version: 12.4
Go version: go1.21.9 linux/amd64
Please help me further with this.
Thank you
Note that when we build applications on Linux that use this library we specify:
-ldflags "-s -w '-extldflags=-Wl,--export-dynamic -Wl,--unresolved-symbols=ignore-in-object-files'"
It could be that the static flag is causing the libnvidia-ml.so.1 library to not be loaded.
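To check whether a built binary was in fact linked statically, one option is to inspect its ELF program headers. The sketch below is a heuristic and not part of go-nvml: it assumes that a binary without a PT_INTERP header requests no dynamic loader, in which case loading a shared library such as libnvidia-ml.so.1 at run time is unlikely to work. The helper name hasInterpreter and the default path ./sample are hypothetical.

```go
package main

import (
	"debug/elf"
	"fmt"
	"os"
)

// hasInterpreter reports whether the ELF file at path contains a PT_INTERP
// program header, i.e. whether it requests a dynamic loader.
func hasInterpreter(path string) (bool, error) {
	f, err := elf.Open(path)
	if err != nil {
		return false, err
	}
	defer f.Close()

	for _, p := range f.Progs {
		if p.Type == elf.PT_INTERP {
			return true, nil
		}
	}
	return false, nil
}

func main() {
	path := "./sample" // hypothetical default; pass the real binary as the first argument
	if len(os.Args) > 1 {
		path = os.Args[1]
	}

	dynamic, err := hasInterpreter(path)
	if err != nil {
		fmt.Fprintf(os.Stderr, "cannot inspect %s: %v\n", path, err)
		os.Exit(1)
	}
	if dynamic {
		fmt.Println("dynamically linked: a loader is requested, so shared libraries can be loaded")
	} else {
		fmt.Println("no PT_INTERP header: likely a static binary, loading libnvidia-ml.so.1 may fail")
	}
}
```

If the binary built with -extldflags "-static" reports no PT_INTERP header while one built with the flags above does, that would be consistent with the static flag being the cause.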