NVIDIA / go-nvml

Go Bindings for the NVIDIA Management Library (NVML)

Update `nvml.h` from `anaconda.org/nvidia` rather than CUDA docker images

XuehaiPan opened this issue.

Motivation:

  1. The Docker images on Docker Hub lag behind the latest CUDA release: the newest available tag is 11.4.2 (updated two months ago), whereas CUDA 11.5.1 has already been released.
  2. The packages on anaconda.org/nvidia are indexed in a more structured, machine-readable way (`repodata.json`).
  3. Smaller download size: we only need `nvml.h`, not a full CUDA devel image.
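To make points 2 and 3 concrete, here is a minimal sketch of how an update script might locate the newest header version on the `nvidia` channel. It assumes the standard conda channel layout (`https://conda.anaconda.org/<channel>/<subdir>/repodata.json`); the helper names are hypothetical, and the package name `cuda-nvml-dev` in the usage comment is an assumption, not something the current Makefile uses.

```shell
#!/bin/sh
# Sketch only: these helper names are hypothetical, not part of the go-nvml Makefile.

# Print the repodata.json URL for a conda channel and platform subdir,
# following the standard layout https://conda.anaconda.org/<channel>/<subdir>/.
repodata_url() {
    printf 'https://conda.anaconda.org/%s/%s/repodata.json\n' "$1" "$2"
}

# Read version strings (one per line) on stdin and print the newest one.
# `sort -V` (GNU coreutils) compares dotted versions numerically,
# so 11.10.0 sorts after 11.5.1.
newest_version() {
    sort -u -V | tail -n 1
}

# Example (assumes curl and jq are installed and that the header ships in a
# package named cuda-nvml-dev; neither assumption is verified here):
#   curl -fsSL "$(repodata_url nvidia linux-64)" \
#     | jq -r '.packages[] | select(.name == "cuda-nvml-dev") | .version' \
#     | newest_version
```

Sorting with `-V` rather than plain lexical order matters once a minor version reaches two digits, since a lexical sort would place `11.10.0` before `11.5.1`.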

The current Makefile cannot update `nvml.h` with `CUDA_VERSION=11.4.2`:

```
$ make CUDA_VERSION=11.4.2 update-nvml-h
if [[ 11.4.2 == "" ]]; then echo "define CUDA_VERSION to update"; exit 1; fi
/bin/sh: 1: [[: not found
docker run \
        --rm \
        -v /home/panxuehai/Projects/go-nvml:/home/panxuehai/Projects/go-nvml \
        -w /home/panxuehai/Projects/go-nvml \
        --user $(id -u):$(id -g) \
        nvidia/cuda:11.4.2-devel \
                cp /usr/local/cuda-11.4/targets/x86_64-linux/include/nvml.h /home/panxuehai/Projects/go-nvml/gen/nvml
Unable to find image 'nvidia/cuda:11.4.2-devel' locally
docker: Error response from daemon: manifest for nvidia/cuda:11.4.2-devel not found: manifest unknown: manifest unknown.
See 'docker run --help'.
make: *** [Makefile:100: .copy-nvml-h] Error 125
```
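The first two lines of the log point to a second, independent problem: `[[ ... ]]` is a Bash extension, while `make` runs recipes with `/bin/sh` by default, which here appears to be `dash` (judging by the error format). The failed `[[` leaves the `if` condition false, so the guard is silently skipped and would not stop the recipe even if `CUDA_VERSION` were empty. A POSIX-compatible form of the same check, written as a standalone function with a hypothetical name, might look like this:

```shell
#!/bin/sh
# Hypothetical POSIX-sh replacement for the Makefile's `[[ ... ]]` guard.
# `[ -z ... ]` is understood by dash and bash alike.
require_cuda_version() {
    if [ -z "$1" ]; then
        echo "define CUDA_VERSION to update" >&2
        return 1
    fi
}
```

Alternatively, setting `SHELL := /bin/bash` in the Makefile makes GNU make run recipes with Bash, so the existing `[[ ... ]]` test would work unchanged.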

The tag `nvidia/cuda:11.4.2-devel` does not exist; a distribution suffix such as `-ubuntu20.04` is required. Ref: https://hub.docker.com/r/nvidia/cuda/tags
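With a suffix in place, both the image tag and the in-image header directory can be derived from `CUDA_VERSION`. The sketch below assumes the `<version>-devel-<distro>` tag pattern from the Docker Hub tags page and the `/usr/local/cuda-<major.minor>/targets/x86_64-linux/include` layout visible in the `cp` command of the log; the function names are hypothetical.

```shell
#!/bin/sh
# Hypothetical helpers; names are illustrative only.

# nvidia/cuda devel image tag, including the distribution suffix.
cuda_devel_image() {
    printf 'nvidia/cuda:%s-devel-%s\n' "$1" "$2"
}

# Directory holding nvml.h inside the image. The install prefix uses only
# major.minor, so "11.4.2" maps to /usr/local/cuda-11.4.
cuda_include_dir() {
    printf '/usr/local/cuda-%s/targets/x86_64-linux/include\n' "${1%.*}"
}
```

`${1%.*}` strips the shortest trailing `.<something>`, which drops only the patch component of the version.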

@XuehaiPan, you are correct that the distribution suffix is required. It seems that the published tags have changed since we last updated the `nvml.h` definition.

Note that as a workaround you could pull and retag the image locally:

```
docker pull nvidia/cuda:11.4.2-devel-ubuntu20.04
docker tag nvidia/cuda:11.4.2-devel-ubuntu20.04 nvidia/cuda:11.4.2-devel
```

and then re-run the `make` command. Because the image then exists locally under the unsuffixed tag, `docker run` uses it instead of trying to pull that tag from Docker Hub.
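The same workaround can be wrapped in a small function so that it works for any CUDA version and distribution suffix. This is a sketch: the function name and the `DOCKER` override (useful for a dry run with `DOCKER=echo`) are assumptions, and `ubuntu20.04` is only one of the published suffixes.

```shell
#!/bin/sh
# Hypothetical helper: pull the suffixed devel image and retag it under the
# unsuffixed name that the Makefile expects, e.g.
#   retag_cuda_devel 11.4.2 ubuntu20.04
# Set DOCKER=echo to print the commands instead of running them.
retag_cuda_devel() {
    local docker_cmd src dst
    docker_cmd="${DOCKER:-docker}"
    src="nvidia/cuda:$1-devel-$2"
    dst="nvidia/cuda:$1-devel"
    "$docker_cmd" pull "$src" || return 1
    "$docker_cmd" tag "$src" "$dst"
}
```

Stopping after a failed `pull` avoids a confusing second error from `docker tag` when the suffixed tag itself does not exist.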

Your motivation for pulling the file from anaconda.org seems reasonable. Update: please feel free to submit a PR with the changes.