NVIDIA / go-nvml

Go Bindings for the NVIDIA Management Library (NVML)

can't get `SampleValue.uiVal`

qisikai opened this issue · comments

Hello, I am back again.

I want to call this func:

func DeviceGetSamples(Device Device, _type SamplingType, LastSeenTimeStamp uint64) (ValueType, []Sample, Return) {
	var SampleValType ValueType
	var SampleCount uint32
	ret := nvmlDeviceGetSamples(Device, _type, LastSeenTimeStamp, &SampleValType, &SampleCount, nil)
	if ret != SUCCESS {
		return SampleValType, nil, ret
	}
	if SampleCount == 0 {
		return SampleValType, []Sample{}, ret
	}
	Samples := make([]Sample, SampleCount)
	ret = nvmlDeviceGetSamples(Device, _type, LastSeenTimeStamp, &SampleValType, &SampleCount, &Samples[0])
	return SampleValType, Samples, ret
}

func (Device Device) GetSamples(_type SamplingType, LastSeenTimeStamp uint64) (ValueType, []Sample, Return) {
	return DeviceGetSamples(Device, _type, LastSeenTimeStamp)
}

I want to read the value stored in each Sample, which is defined in NVML as:

typedef union nvmlValue_st
{
    double dVal;                    //!< If the value is double
    unsigned int uiVal;             //!< If the value is unsigned int
    unsigned long ulVal;            //!< If the value is unsigned long
    unsigned long long ullVal;      //!< If the value is unsigned long long
    signed long long sllVal;        //!< If the value is signed long long
}nvmlValue_t;

I tried uiVal, UiVal, and UIvVal, but none of them work.

Hi @qisikai ,

A Sample is defined as here:
https://github.com/NVIDIA/go-nvml/blob/master/pkg/nvml/types_gen.go#L92

The value embedded inside of it is the field SampleValue, of type [8]byte.

It is defined that way because Go doesn't support unions.

To cast this byte array to the correct type, you will need to use the ValueType returned by the GetSamples() call, as defined here: https://github.com/NVIDIA/go-nvml/blob/master/pkg/nvml/const.go#L624

From this you can construct something like:

valueType, samples, ret := device.GetSamples(...)
for _, sample := range samples {
    switch valueType {
    case nvml.VALUE_TYPE_DOUBLE:
        var dVal float64 = *(*float64)(unsafe.Pointer(&sample.SampleValue[0]))
        ...
    case nvml.VALUE_TYPE_UNSIGNED_INT:
        var uiVal uint32 = *(*uint32)(unsafe.Pointer(&sample.SampleValue[0]))
        ...
    case nvml.VALUE_TYPE_UNSIGNED_LONG:
        var ulVal uint64 = *(*uint64)(unsafe.Pointer(&sample.SampleValue[0]))
        ...
    case nvml.VALUE_TYPE_UNSIGNED_LONG_LONG:
        var ullVal uint64 = *(*uint64)(unsafe.Pointer(&sample.SampleValue[0]))
        ...
    case nvml.VALUE_TYPE_SIGNED_LONG_LONG:
        var sllVal int64 = *(*int64)(unsafe.Pointer(&sample.SampleValue[0]))
        ...
    }
}
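For anyone who would rather avoid unsafe.Pointer, the same decoding can probably be done with encoding/binary. This is only a sketch under stated assumptions: valueType and its constants are local stand-ins for nvml.ValueType (not the library's own names), decodeSampleValue is a hypothetical helper, and the host is assumed to be little-endian (x86-64, arm64) with an 8-byte unsigned long.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"math"
)

// valueType is a local stand-in for nvml.ValueType so the sketch
// compiles and runs without a GPU or the go-nvml package.
type valueType int

const (
	typeDouble valueType = iota
	typeUnsignedInt
	typeUnsignedLong
	typeUnsignedLongLong
	typeSignedLongLong
)

// decodeSampleValue (hypothetical helper) interprets the 8 raw union
// bytes according to vt. It assumes a little-endian host and returns
// every variant as float64 for convenience, which loses precision
// for integers above 2^53.
func decodeSampleValue(raw [8]byte, vt valueType) (float64, error) {
	switch vt {
	case typeDouble:
		return math.Float64frombits(binary.LittleEndian.Uint64(raw[:])), nil
	case typeUnsignedInt:
		// Only the first 4 bytes of the union carry the unsigned int.
		return float64(binary.LittleEndian.Uint32(raw[:4])), nil
	case typeUnsignedLong, typeUnsignedLongLong:
		return float64(binary.LittleEndian.Uint64(raw[:])), nil
	case typeSignedLongLong:
		return float64(int64(binary.LittleEndian.Uint64(raw[:]))), nil
	}
	return 0, fmt.Errorf("unknown value type %d", vt)
}

func main() {
	raw := [8]byte{42, 0, 0, 0, 0, 0, 0, 0}
	v, err := decodeSampleValue(raw, typeUnsignedInt)
	fmt.Println(v, err)
}
```

In real code you would pass sample.SampleValue and switch on the nvml.VALUE_TYPE_* constants instead of the stand-ins.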

It works, thank you.

@klueska Hi, I found another case:
GetSamples always returns 100 samples, and I can't tell the valid records from the invalid ones.

I solved it by checking the timestamp.
A bad record's timestamp equals 0.
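The timestamp check described above could look roughly like this. It is a sketch, not library code: the sample struct is a local stand-in that assumes the layout of nvml.Sample (a TimeStamp uint64 plus the SampleValue [8]byte), and validSamples is a hypothetical helper name.

```go
package main

import "fmt"

// sample is a local stand-in assumed to mirror nvml.Sample.
type sample struct {
	TimeStamp   uint64
	SampleValue [8]byte
}

// validSamples (hypothetical helper) keeps only the entries whose
// timestamp is non-zero, dropping the unfilled slots of the buffer.
func validSamples(in []sample) []sample {
	out := make([]sample, 0, len(in))
	for _, s := range in {
		if s.TimeStamp != 0 {
			out = append(out, s)
		}
	}
	return out
}

func main() {
	buf := []sample{{TimeStamp: 1001}, {TimeStamp: 1002}, {}, {}}
	fmt.Println(len(validSamples(buf))) // prints 2
}
```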

Hi @klueska, I have a question: how do I get the type of a process with go-nvml?

For example:

with nvidia-smi: nvidia-smi -q

....
    Max Customer Boost Clocks
        Graphics                          : 1590 MHz
    Clock Policy
        Auto Boost                        : N/A
        Auto Boost Default                : N/A
    Processes
        GPU instance ID                   : N/A
        Compute instance ID               : N/A
        Process ID                        : 2298824
            Type                          : C
            Name                          : nvidia-cuda-mps-server
            Used GPU Memory               : 25 MiB
        GPU instance ID                   : N/A
        Compute instance ID               : N/A
        Process ID                        : 2302001
            Type                          : M+C
            Name                          : tf_serving_1_15
            Used GPU Memory               : 519 MiB

How can I get this value (Type : M+C) with go-nvml?

I'm not sure what the M there is. Normally you see either C, G, or C+G.

Where C means that the process is contained in the results of:

DeviceGetComputeRunningProcesses()

And G means that the process is contained in the results of:

DeviceGetGraphicsRunningProcesses()

If the process is returned by both calls, a C+G would be listed.

@klueska M+C means the process is running on an MPS server and is in Compute mode.
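Putting the answers together, one way to rebuild the Type column is to check which process lists a PID appears in. The sketch below is a guess rather than what nvidia-smi actually does: processType and toSet are hypothetical helpers, the PID slices stand in for the Pid fields returned by the compute and graphics process queries, and the assumption that an MPS client shows up in both an MPS process list and the compute list has not been verified.

```go
package main

import (
	"fmt"
	"strings"
)

// toSet (hypothetical helper) turns a PID slice into a lookup set.
func toSet(pids []uint32) map[uint32]bool {
	set := make(map[uint32]bool, len(pids))
	for _, p := range pids {
		set[p] = true
	}
	return set
}

// processType (hypothetical helper) builds a label such as "C", "G",
// "C+G" or "M+C" from list membership. It returns "" when the PID is
// in none of the lists. The meaning of "M" here is an assumption.
func processType(pid uint32, compute, graphics, mps map[uint32]bool) string {
	var parts []string
	if mps[pid] {
		parts = append(parts, "M")
	}
	if compute[pid] {
		parts = append(parts, "C")
	}
	if graphics[pid] {
		parts = append(parts, "G")
	}
	return strings.Join(parts, "+")
}

func main() {
	// PIDs taken from the nvidia-smi output quoted earlier in the thread.
	compute := toSet([]uint32{2298824, 2302001})
	graphics := toSet(nil)
	mps := toSet([]uint32{2302001})

	fmt.Println(processType(2298824, compute, graphics, mps))
	fmt.Println(processType(2302001, compute, graphics, mps))
}
```

With real data, the three sets would be filled from the corresponding device calls after checking each Return value against SUCCESS.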