NVIDIA / libnvidia-container

NVIDIA container runtime library

Demo in README does not work

idreamerhx opened this issue

My OS: Arch Linux (latest). Installed packages:

extra/egl-wayland 2:1.1.11-2 [installed]
extra/libvdpau 1.5-1 [installed]
extra/libxnvctrl 520.56.06-1 [installed]
extra/nvidia-dkms 520.56.06-2 [installed]
extra/nvidia-prime 1.0-4 [installed]
extra/nvidia-settings 520.56.06-1 [installed]
extra/nvidia-utils 520.56.06-2 [installed]
extra/opencl-nvidia 520.56.06-2 [installed]
community/cuda 11.8.0-1 [installed]
community/cuda-tools 11.8.0-1 [installed]
community/cudnn 8.5.0.96-1 [installed]
community/nccl 2.14.3-1 [installed]
community/nvtop 3.0.0-1 [installed]
archlinuxcn/libnvidia-container 1.9.0-1 [installed]
archlinuxcn/libnvidia-container-tools 1.9.0-1 [installed]
archlinuxcn/nvidia-container-toolkit 1.9.0-1 [installed]

NVIDIA-SMI 520.56.06 Driver Version: 520.56.06 CUDA Version: 11.8

Running the demo from README.md failed:

Setup a new set of namespaces

cd $(mktemp -d) && mkdir rootfs
sudo unshare --mount --pid --fork

Setup a rootfs based on Ubuntu 16.04 inside the new namespaces

curl http://cdimage.ubuntu.com/ubuntu-base/releases/16.04/release/ubuntu-base-16.04.6-base-amd64.tar.gz | tar -C rootfs -xz
useradd -R $(realpath rootfs) -U -u 1000 -s /bin/bash nvidia
mount --bind rootfs rootfs
mount --make-private rootfs
cd rootfs

Mount standard filesystems

mount -t proc none proc
mount -t sysfs none sys
mount -t tmpfs none tmp
mount -t tmpfs none run

Isolate the first GPU device along with basic utilities

nvidia-container-cli --load-kmods configure --ldconfig=@/sbin/ldconfig.real --no-cgroups --utility --device 0 $(pwd)

(I removed the --ldconfig=@/sbin/ldconfig.real option from the command above.)

Change into the new rootfs

pivot_root . mnt # this does not work
umount -l mnt
exec chroot --userspec 1000:1000 . env -i bash

Run nvidia-smi from within the container

nvidia-smi -L

In the chrooted rootfs, nvidia-smi shows nothing.

I tried:

arch-chroot ubu22x86-base

and, in another shell, changed into ubu22x86-base and ran:

nvidia-container-cli --load-kmods configure --no-cgroups --utility --device 0 $(pwd)

With that, nvidia-smi works, but a simple program that calls cudaGetDeviceCount returns error code 35.

In the chrooted rootfs, nvidia-smi reports:

NVIDIA-SMI 520.56.06 Driver Version: 520.56.06 CUDA Version: N/A

I installed the CUDA toolkit from cuda_11.8.0_520.61.05_linux.run.
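
For context, error code 35 in the CUDA runtime API is cudaErrorInsufficientDriver, and nvidia-smi printing "CUDA Version: N/A" typically means it could not load libcuda.so. Installing the toolkit does not provide that library, since it ships with the driver. A minimal sketch for checking whether the driver library is visible inside a rootfs follows; the function name has_libcuda and the directories it searches are assumptions for illustration, not part of libnvidia-container.

```shell
#!/bin/sh
# Hypothetical helper: succeed if any libcuda.so* file exists under the
# usual library directories of ROOT (defaults to / when run inside the chroot).
has_libcuda() {
    root=${1:-/}
    # Errors for directories that do not exist are ignored on purpose.
    found=$(find "$root/usr/lib" "$root/usr/lib64" "$root/lib" \
                 -name 'libcuda.so*' 2>/dev/null | head -n 1)
    [ -n "$found" ]
}

# Example use, from the host:
#   has_libcuda /path/to/rootfs && echo "libcuda present" || echo "libcuda missing"
```

If the library is missing, the rootfs was most likely configured without the compute capability.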

Is there any documentation for nvidia-container-cli?

In the chrooted Ubuntu 22 rootfs:
apt install nvidia-cuda-toolkit

NVIDIA-SMI 520.56.06 Driver Version: 520.56.06 CUDA Version: 11.6

Hey guys, would you please add some really simple documentation, such as:
nvidia-container-cli --load-kmods configure --no-cgroups --utility --compute --device 0 $(pwd)

This works and shows the CUDA version.

Or just mention --utility and --compute in --usage or --help.
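
The difference between the two invocations in this issue can be sketched as a small dry-run helper. It only prints the command line, and it uses only flags already quoted above; the function name configure_cmd is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print (do not run) the configure command for a rootfs
# and a list of driver capabilities, each passed as --<capability>.
configure_cmd() {
    rootfs=$1
    shift
    caps=""
    for cap in "$@"; do
        caps="$caps --$cap"
    done
    echo "nvidia-container-cli --load-kmods configure --no-cgroups$caps --device 0 $rootfs"
}

# Reported above: nvidia-smi runs, but CUDA Version shows N/A.
configure_cmd "$PWD" utility
# Reported above: the CUDA version is shown as well.
configure_cmd "$PWD" utility compute
```

Printing the command first makes it easy to review before running it as root inside the new namespaces.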

There is another issue. When I exit the chrooted rootfs:

umount: /mnt/data1/chroots/ubu22x86-cuda118/dev: target is busy.
umount: /mnt/data1/chroots/ubu22x86-cuda118/proc: target is busy.

Should I clean up myself?
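
A "target is busy" error from umount generally means something is still mounted below that path or a process still holds it open. One possible manual cleanup, as a sketch only: list every mount point at or below the rootfs, deepest first, and unmount them in that order. The helper name mounts_under is hypothetical, and it assumes mount paths without spaces (which /proc/self/mounts escapes as \040).

```shell
#!/bin/sh
# Hypothetical helper: print mount points at or below ROOT, deepest first.
# TABLE is a file in /proc/self/mounts format (second field = mount point).
mounts_under() {
    root=${1%/}
    table=${2:-/proc/self/mounts}
    awk -v root="$root" '
        $2 == root || index($2, root "/") == 1 { print $2 }
    ' "$table" | LC_ALL=C sort -r   # reverse byte order puts children before parents
}

# Example use (as root, after leaving the chroot):
#   mounts_under /mnt/data1/chroots/ubu22x86-cuda118 | while read -r m; do
#       umount "$m"
#   done
```

If a mount is still busy after that, fuser -vm on the path shows which processes are using it; umount -R or umount -l are blunter alternatives.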