docker / docker-py

A Python library for the Docker Engine API

Home Page: https://docker-py.readthedocs.io/

device_requests with gpu not working with podman runtime

htjain opened this issue · comments

I am trying the following code in an NVIDIA GPU environment:

import docker
import os

os.environ['DOCKER_HOST'] = "unix:///run/user/1000/podman/podman.sock"
# os.environ['DOCKER_HOST'] = "unix:///run/podman/podman.sock"
client = docker.from_env()
logs = client.containers.run('nvidia/cuda:12.2.0-devel-ubuntu20.04',
                             "nvidia-smi",
                             device_requests=[docker.types.DeviceRequest(count=-1, capabilities=[['gpu']])])

but I am getting the following error:

Traceback (most recent call last):
  File "/home/user/podman_gpu.py", line 7, in <module>
    logs = client.containers.run('nvidia/cuda:12.2.0-devel-ubuntu20.04',
  File "/usr/local/lib/python3.9/site-packages/docker/models/containers.py", line 887, in run
    raise ContainerError(
docker.errors.ContainerError: Command 'nvidia-smi' in image 'nvidia/cuda:12.2.0-devel-ubuntu20.04' returned non-zero exit status 127: b'/opt/nvidia/nvidia_entrypoint.sh: line 67: exec: nvidia-smi: not found\n'

On the same machine, the podman CLI is able to access the GPU:

[user@rh91-bay7 ~]$ podman run --rm --device nvidia.com/gpu=all nvidia/cuda:12.2.0-devel-ubuntu20.04 nvidia-smi -L

==========
== CUDA ==
==========

CUDA Version 12.2.0

Container image Copyright (c) 2016-2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.

This container image and its contents are governed by the NVIDIA Deep Learning Container License.
By pulling and using the container, you accept the terms and conditions of this license:
https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license

A copy of this license is made available in this container at /NGC-DL-CONTAINER-LICENSE for your convenience.

GPU 0: Tesla T4 (UUID: GPU-f7c1d1ba-7a85-537a-65ae-462ce7d7eca8)
[user@rh91-bay7 ~]$

Any ideas on how to get this working?

podman version: 4.4.1
host OS: RHEL 9.2
docker-py version: 6.1.3

Sorry, I can't provide specific help here. The Python library is creating your container successfully, but it looks like something is wrong with the entrypoint wrapper script or PATH:

/opt/nvidia/nvidia_entrypoint.sh: line 67: exec: nvidia-smi: not found
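
To help narrow that down, one quick check (a sketch only, not tested against a podman compat socket) is to run a throwaway shell in the same image with the same device_requests and print whether the binary and any /dev/nvidia* nodes are actually visible inside the container:

import docker

client = docker.from_env()  # DOCKER_HOST pointing at the podman socket, as above
# Run a shell instead of nvidia-smi so the container exits 0 either way
# and reports what it actually sees.
out = client.containers.run(
    'nvidia/cuda:12.2.0-devel-ubuntu20.04',
    ["sh", "-c",
     "command -v nvidia-smi || echo 'nvidia-smi not on PATH'; "
     "ls /dev | grep -i nvidia || echo 'no /dev/nvidia* nodes'"],
    device_requests=[docker.types.DeviceRequest(count=-1, capabilities=[['gpu']])],
    remove=True,
)
print(out.decode())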

The same image is used with both the podman CLI and the Python library, and it works fine with the podman CLI, which rules out an issue with the container image or its entrypoint wrapper script.
I believe proper handling of device_requests is missing in the Python library for the podman runtime; the same code works with the Docker runtime. @milas
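
One way to check where the request is getting lost (a sketch, assuming podman's compat API reports HostConfig the same way Docker does) is to create the container without starting it and inspect what the daemon actually recorded:

import docker

client = docker.from_env()
container = client.containers.create(
    'nvidia/cuda:12.2.0-devel-ubuntu20.04',
    "nvidia-smi",
    device_requests=[docker.types.DeviceRequest(count=-1, capabilities=[['gpu']])],
)
# If the library serialized the request, it shows up here; if this prints
# None or [], the request was dropped somewhere between docker-py and the daemon.
print(container.attrs['HostConfig'].get('DeviceRequests'))
container.remove()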

I'm happy to accept a PR if there's a straightforward fix here, but in general this library only targets the Docker/Moby runtime, so Podman compatibility really depends on how well it emulates the Moby API.
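
If someone wants to experiment in the meantime, one untested idea (an assumption about podman's compat endpoint, not something this library guarantees) is to mirror the CLI's --device nvidia.com/gpu=all flag by passing the CDI device name through the devices parameter; docker-py forwards it as HostConfig.Devices, and whether podman resolves it as a CDI device depends on the podman version:

import docker
import os

os.environ['DOCKER_HOST'] = "unix:///run/user/1000/podman/podman.sock"
client = docker.from_env()
# Hand podman the same CDI device name the CLI uses, instead of a DeviceRequest.
logs = client.containers.run(
    'nvidia/cuda:12.2.0-devel-ubuntu20.04',
    "nvidia-smi",
    devices=["nvidia.com/gpu=all"],
    remove=True,
)
print(logs.decode())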