NVIDIA / go-nvml

Go Bindings for the NVIDIA Management Library (NVML)

nvml library is not getting initialized on ubuntu22.04

sujithapallapothu opened this issue

package main

import (
        "fmt"
        "log"

        "github.com/NVIDIA/go-nvml/pkg/nvml"
)

func main() {
        ret := nvml.Init()
        if ret != nvml.SUCCESS {
                log.Fatalf("Unable to initialize NVML: %v", nvml.ErrorString(ret))
        }
        defer func() {
                ret := nvml.Shutdown()
                if ret != nvml.SUCCESS {
                        log.Fatalf("Unable to shutdown NVML: %v", nvml.ErrorString(ret))
                }
        }()

        count, ret := nvml.DeviceGetCount()
        if ret != nvml.SUCCESS {
                log.Fatalf("Unable to get device count: %v", nvml.ErrorString(ret))
        }
        // Print the count only after the call is known to have succeeded.
        fmt.Println("count", count)

        for i := 0; i < count; i++ {
                device, ret := nvml.DeviceGetHandleByIndex(i)
                if ret != nvml.SUCCESS {
                        log.Fatalf("Unable to get device at index %d: %v", i, nvml.ErrorString(ret))
                }

                uuid, ret := device.GetUUID()
                if ret != nvml.SUCCESS {
                        log.Fatalf("Unable to get uuid of device at index %d: %v", i, nvml.ErrorString(ret))
                }


                fmt.Printf("%v\n", uuid)

                processInfos, ret := device.GetComputeRunningProcesses()
                if ret != nvml.SUCCESS {
                        log.Fatalf("Unable to get process info for device at index %d: %v", i, nvml.ErrorString(ret))
                }
                fmt.Printf("Found %d processes on device %d\n", len(processInfos), i)
                for pi, processInfo := range processInfos {
                        fmt.Printf("\t[%2d] ProcessInfo: %+v\n", pi, processInfo)
                }
        }
}

When I execute the Go code above, I get the following error on my Linux machine:

Error initializing NVML:ERROR_LIBRARY_NOT_FOUND

Can someone please suggest why NVML fails to initialize even though the nvml package is imported in the Go file above?

The specs of my Linux machine are as follows:

Ubuntu version: 22.04
Graphics card: 61:00.0 3D controller: NVIDIA Corporation GA100 [A100 PCIe 40GB] (rev a1)
Nvidia Driver version: 550.54.14
CUDA Version: 12.4
Go version: go1.21.9 linux/amd64

Do you have the NVIDIA driver installed? Where is the libnvidia-ml.so.1 library located on your system?

Yes, @klueska.

I have libnvidia-ml.so.1 on my Linux machine (Ubuntu 22.04):

root@ubuntu2204:/tmp# locate libnvidia-ml.so.1

/usr/lib/x86_64-linux-gnu/libnvidia-ml.so.1

@sujithapallapothu the error message: "Error initializing NVML" does not seem to exist in the go-nvml code base and is also not present in the snippet that you pasted above.

Could you give more information about your environment -- including the output of nvidia-smi?

The code you show seems to come from one of the examples included in the repository. Could you check out the latest version of main and run make examples in the root folder? You should then be able to run these examples.

@elezar Yes, you are right. I took the code from the examples and adapted it in my sample.go file, which looks like this:

	if hasNvidiaGPUs() {
		err := nvml.Init()
		if err != nvml.SUCCESS {
			fmt.Println("Error initializing NVML:", err)
			//return err

		}
		defer nvml.Shutdown()

		deviceCount, err := nvml.DeviceGetCount()
		if err != nvml.SUCCESS {
			fmt.Println("Error getting device count:", err)
			//log.Fatalf("Unable to get device count: %v", nvml.ErrorString(err))
		}
		fmt.Println("Number of NVIDIA GPUs:", deviceCount)
	} else {
		fmt.Println("No NVIDIA GPUs")
	}

Here, the hasNvidiaGPUs() function checks whether an NVIDIA graphics card exists. I built the code above using go build  -tags netgo -ldflags '-s -extldflags "-static"' sample.go and then executed the resulting binary, which prints Error initializing NVML:ERROR_LIBRARY_NOT_FOUND

image

More details about my environment are as follows:

Ubuntu version: 22.04
Graphics card: 61:00.0 3D controller: NVIDIA Corporation GA100 [A100 PCIe 40GB] (rev a1)
Nvidia Driver version: 550.54.14
CUDA Version: 12.4
Go version: go1.21.9 linux/amd64

Please help me further with this.

Thank you

image

I'm getting the error shown above, which comes from the go-nvml code. It seems that loading the library is failing. Do I need to set any Go environment flags when building the binary?

Please advise, @klueska @elezar

Note that when we build applications on linux that use this library we specify:

-ldflags "-s -w '-extldflags=-Wl,--export-dynamic -Wl,--unresolved-symbols=ignore-in-object-files'"

It could be that the -static flag is causing the libnvidia-ml.so.1 library to not be loaded.
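One way to test that hypothesis is to check whether the built binary is dynamically linked at all. As far as I understand, go-nvml loads libnvidia-ml.so.1 at run time through the dynamic loader, and loading shared libraries from a fully static binary is limited or unsupported in common libc setups. The sketch below uses the standard debug/elf package to look for a PT_INTERP program header, which a dynamically linked executable normally has and a fully static one normally lacks. The helper name hasInterpreter is hypothetical, and treating a missing PT_INTERP as "static" is a heuristic, not a definitive test.

```go
package main

import (
	"debug/elf"
	"fmt"
	"os"
)

// hasInterpreter is a hypothetical helper: it reports whether the ELF file
// at path contains a PT_INTERP program header, meaning it requests a
// dynamic loader at startup.
func hasInterpreter(path string) (bool, error) {
	f, err := elf.Open(path)
	if err != nil {
		return false, err
	}
	defer f.Close()

	for _, prog := range f.Progs {
		if prog.Type == elf.PT_INTERP {
			return true, nil
		}
	}
	return false, nil
}

func main() {
	// Inspect the binary given as the first argument, or this program itself.
	path, err := os.Executable()
	if len(os.Args) > 1 {
		path, err = os.Args[1], nil
	}
	if err != nil {
		fmt.Println("cannot determine binary path:", err)
		return
	}

	dynamic, err := hasInterpreter(path)
	if err != nil {
		fmt.Println("cannot inspect binary:", err)
		return
	}
	if dynamic {
		fmt.Println(path, "requests a dynamic loader")
	} else {
		fmt.Println(path, "has no PT_INTERP header and is probably statically linked")
	}
}
```

Running this against the sample binary built with -extldflags "-static" and against one built without it should show whether the static build is the difference between the two.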