docker / compose

Define and run multi-container applications with Docker

Home Page: https://docs.docker.com/compose/


Support for NVIDIA GPUs under Docker Compose

collabnix opened this issue · comments

Under Docker 19.03.0 Beta 2, support for NVIDIA GPUs has been introduced in the form of the new CLI option --gpus. docker/cli#1714 talks about this enablement.

Now one can simply pass the --gpus option for GPU-accelerated Docker-based applications.

$ docker run -it --rm --gpus all ubuntu nvidia-smi
Unable to find image 'ubuntu:latest' locally
latest: Pulling from library/ubuntu
f476d66f5408: Pull complete 
8882c27f669e: Pull complete 
d9af21273955: Pull complete 
f5029279ec12: Pull complete 
Digest: sha256:d26d529daa4d8567167181d9d569f2a85da3c5ecaf539cace2c6223355d69981
Status: Downloaded newer image for ubuntu:latest
Tue May  7 15:52:15 2019       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 390.116                Driver Version: 390.116                   |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla P4            Off  | 00000000:00:04.0 Off |                    0 |
| N/A   39C    P0    22W /  75W |      0MiB /  7611MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
:~$ 

As of today, Compose doesn't support this. This is a feature request to enable Compose to support NVIDIA GPUs.

This is of increased importance now that the legacy 'nvidia runtime' appears broken with Docker 19.03.0 and nvidia-container-toolkit-1.0.0-2: NVIDIA/nvidia-docker#1017

$ cat docker-compose.yml 
version: '2.3'

services:
  nvidia-smi-test:
    runtime: nvidia
    image: nvidia/cuda:9.2-runtime-centos7

$ docker-compose run nvidia-smi-test
Cannot create container for service nvidia-smi-test: Unknown runtime specified nvidia

This works: docker run --gpus all nvidia/cudagl:9.2-runtime-centos7 nvidia-smi

This does not: docker run --runtime=nvidia nvidia/cudagl:9.2-runtime-centos7 nvidia-smi

Any work happening on this?

I got the new Docker CE 19.03.0 on a new Ubuntu 18.04 LTS machine, have the current and matching NVIDIA Container Toolkit (née nvidia-docker2) version, but cannot use it because docker-compose.yml 3.7 doesn't support the --gpus flag.

Is there a workaround for this?

This works: docker run --gpus all nvidia/cudagl:9.2-runtime-centos7 nvidia-smi

This does not: docker run --runtime=nvidia nvidia/cudagl:9.2-runtime-centos7 nvidia-smi

You need to have

{
    "runtimes": {
        "nvidia": {
            "path": "/usr/bin/nvidia-container-runtime",
            "runtimeArgs": []
        }
    }
}

in your /etc/docker/daemon.json for --runtime=nvidia to continue working. More info here.
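For anyone applying that change, a quick sanity check might look like this (a sketch, assuming the NVIDIA container packages are installed and the runtime path matches your distro):

# restart the daemon so it picks up the new runtimes entry
sudo systemctl restart docker

# this should print the nvidia-smi table again
docker run --rm --runtime=nvidia nvidia/cudagl:9.2-runtime-centos7 nvidia-smi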

It is an urgent need. Thank you for your effort!

Is it intended to have users manually populate /etc/docker/daemon.json after migrating to docker >= 19.03 and removing nvidia-docker2 to use nvidia-container-toolkit instead?

It seems that this breaks a lot of installations, especially since --gpus is not available in compose.

No, this is a workaround until compose supports the --gpus flag.

install nvidia-docker-runtime:
https://github.com/NVIDIA/nvidia-container-runtime#docker-engine-setup

add to /etc/docker/daemon.json:

{
    "runtimes": {
        "nvidia": {
            "path": "/usr/bin/nvidia-container-runtime",
            "runtimeArgs": []
        }
    }
}

docker-compose:

runtime: nvidia
environment:
  - NVIDIA_VISIBLE_DEVICES=all
There is no such thing as /usr/bin/nvidia-container-runtime anymore. The issue is still critical.

This will help run an NVIDIA environment with docker-compose until docker-compose itself is fixed: install nvidia-docker-runtime (https://github.com/NVIDIA/nvidia-container-runtime#docker-engine-setup), add the runtimes entry shown above to /etc/docker/daemon.json, and set runtime: nvidia plus NVIDIA_VISIBLE_DEVICES=all on your docker-compose service.

This is not working for me, still getting the Unsupported config option for services.myservice: 'runtime' when trying to run docker-compose up

any ideas?


After modifying /etc/docker/daemon.json, restart the docker service:
systemctl restart docker

Use Compose format 2.3 and add runtime: nvidia to your GPU service. Docker Compose must be version 1.19.0 or higher.

docker-compose file:

version: '2.3'

services:
  nvsmi:
    image: ubuntu:16.04
    runtime: nvidia
    environment:
      - NVIDIA_VISIBLE_DEVICES=all
    command: nvidia-smi

@cheperuiz, you can set nvidia as the default runtime in daemon.json and you will not be dependent on docker-compose. But then all your docker containers will use the nvidia runtime - I have no issues so far.

{
    "default-runtime": "nvidia",
    "runtimes": {
        "nvidia": {
            "path": "/usr/bin/nvidia-container-runtime",
            "runtimeArgs": []
        }
    }
}

Ah! thank you @Kwull, I missed that default-runtime part... Everything working now :)

@uderik, runtime is no longer present in the current 3.7 compose file format schema, nor in the pending 3.8 version that should eventually align with Docker 19.03: https://github.com/docker/compose/blob/5e587d574a94e011b029c2fb491fb0f4bdeef71c/compose/config/config_schema_v3.8.json

@johncolby runtime has never been a 3.x flag. It's only present in the 2.x track (2.3 and 2.4).

Yeah, I know, and even though my docker-compose.yml file includes version: '2.3' (which has worked in the past), it seems to be ignored by the latest versions...
For future projects, what would be the correct way to enable/disable access to the GPU? Just making it the default + env variables? Or will there be support for the --gpus flag?

@johncolby what is the replacement for runtime in 3.X?

@Daniel451 I've just been following along peripherally, but it looks like it will be under the generic_resources key, something like:

services:
  my_app:
    deploy:
      resources:
        reservations:
          generic_resources:
            - discrete_resource_spec:
                kind: 'gpu'
                value: 2

(from https://github.com/docker/cli/blob/9a39a1/cli/compose/loader/full-example.yml#L71-L74)
Design document here: https://github.com/docker/swarmkit/blob/master/design/generic_resources.md

Here is the compose issue regarding compose 3.8 schema support, which is already merged in: #6530

On the daemon side the gpu capability can get registered by including it in the daemon.json or dockerd CLI (like the previous hard-coded runtime workaround), something like

/usr/bin/dockerd --node-generic-resource gpu=2

which then gets registered by hooking into the NVIDIA docker utility:
https://github.com/moby/moby/blob/09d0f9/daemon/nvidia_linux.go

It looks like the machinery is basically in place, probably just needs to get documented...
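If I'm reading the dockerd docs right, the daemon.json equivalent of that CLI flag would be the node-generic-resources key, something like the sketch below (whether plain gpu=N values suffice or GPU UUIDs are required may depend on the setup; treat this as an assumption):

{
    "node-generic-resources": [
        "gpu=0",
        "gpu=1"
    ]
}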

Any update?

Also waiting on updates, using bash with docker run --gpus until the official fix...

Waiting for updates as well.

Also waiting for updates :)

Ok... I don't understand why this is still open. These 3 additional lines make it work with schema version 3.7. Glad to know docker is responsive to trivial community issues. So clone this repo, add these three lines, run python3 setup.py build && install, and make sure your docker-compose.yml is version 3.7.

[ruckc@omnilap compose]$ git diff
diff --git a/compose/config/config_schema_v3.7.json b/compose/config/config_schema_v3.7.json
index cd7882f5..d25d404c 100644
--- a/compose/config/config_schema_v3.7.json
+++ b/compose/config/config_schema_v3.7.json
@@ -151,6 +151,7 @@

         "external_links": {"type": "array", "items": {"type": "string"}, "uniqueItems": true},
         "extra_hosts": {"$ref": "#/definitions/list_or_dict"},
+        "gpus": {"type": ["number", "string"]},
         "healthcheck": {"$ref": "#/definitions/healthcheck"},
         "hostname": {"type": "string"},
         "image": {"type": "string"},
diff --git a/compose/service.py b/compose/service.py
index 55d2e9cd..71188b67 100644
--- a/compose/service.py
+++ b/compose/service.py
@@ -89,6 +89,7 @@ HOST_CONFIG_KEYS = [
     'dns_opt',
     'env_file',
     'extra_hosts',
+    'gpus',
     'group_add',
     'init',
     'ipc',
@@ -996,6 +997,7 @@ class Service(object):
             dns_opt=options.get('dns_opt'),
             dns_search=options.get('dns_search'),
             restart_policy=options.get('restart'),
+            gpus=options.get('gpus'),
             runtime=options.get('runtime'),
             cap_add=options.get('cap_add'),
             cap_drop=options.get('cap_drop'),

I just added an internal issue to track that.
Remember that PRs are welcome 😃

Ok... I don't understand why this is still open. These 3 additional lines make it work with schema version 3.7. [...]

I tried your solution but I get a lot of errors about that flag:

ERROR: for <SERVICE_NAME>  __init__() got an unexpected keyword argument 'gpus'
Traceback (most recent call last):
  File "/usr/local/bin/docker-compose", line 11, in <module>
    load_entry_point('docker-compose==1.25.0.dev0', 'console_scripts', 'docker-compose')()
  File "/usr/local/lib/python3.6/dist-packages/docker_compose-1.25.0.dev0-py3.6.egg/compose/cli/main.py", line 71, in main
    command()
  File "/usr/local/lib/python3.6/dist-packages/docker_compose-1.25.0.dev0-py3.6.egg/compose/cli/main.py", line 127, in perform_command
    handler(command, command_options)
  File "/usr/local/lib/python3.6/dist-packages/docker_compose-1.25.0.dev0-py3.6.egg/compose/cli/main.py", line 1106, in up
    to_attach = up(False)
  File "/usr/local/lib/python3.6/dist-packages/docker_compose-1.25.0.dev0-py3.6.egg/compose/cli/main.py", line 1102, in up
    cli=native_builder,
  File "/usr/local/lib/python3.6/dist-packages/docker_compose-1.25.0.dev0-py3.6.egg/compose/project.py", line 569, in up
    get_deps,
  File "/usr/local/lib/python3.6/dist-packages/docker_compose-1.25.0.dev0-py3.6.egg/compose/parallel.py", line 112, in parallel_execute
    raise error_to_reraise
  File "/usr/local/lib/python3.6/dist-packages/docker_compose-1.25.0.dev0-py3.6.egg/compose/parallel.py", line 210, in producer
    result = func(obj)
  File "/usr/local/lib/python3.6/dist-packages/docker_compose-1.25.0.dev0-py3.6.egg/compose/project.py", line 555, in do
    renew_anonymous_volumes=renew_anonymous_volumes,
  File "/usr/local/lib/python3.6/dist-packages/docker_compose-1.25.0.dev0-py3.6.egg/compose/service.py", line 546, in execute_convergence_plan
    scale, detached, start
  File "/usr/local/lib/python3.6/dist-packages/docker_compose-1.25.0.dev0-py3.6.egg/compose/service.py", line 468, in _execute_convergence_create
    "Creating"
  File "/usr/local/lib/python3.6/dist-packages/docker_compose-1.25.0.dev0-py3.6.egg/compose/parallel.py", line 112, in parallel_execute
    raise error_to_reraise
  File "/usr/local/lib/python3.6/dist-packages/docker_compose-1.25.0.dev0-py3.6.egg/compose/parallel.py", line 210, in producer
    result = func(obj)
  File "/usr/local/lib/python3.6/dist-packages/docker_compose-1.25.0.dev0-py3.6.egg/compose/service.py", line 466, in <lambda>
    lambda service_name: create_and_start(self, service_name.number),
  File "/usr/local/lib/python3.6/dist-packages/docker_compose-1.25.0.dev0-py3.6.egg/compose/service.py", line 454, in create_and_start
    container = service.create_container(number=n, quiet=True)
  File "/usr/local/lib/python3.6/dist-packages/docker_compose-1.25.0.dev0-py3.6.egg/compose/service.py", line 337, in create_container
    previous_container=previous_container,
  File "/usr/local/lib/python3.6/dist-packages/docker_compose-1.25.0.dev0-py3.6.egg/compose/service.py", line 913, in _get_container_create_options
    one_off=one_off)
  File "/usr/local/lib/python3.6/dist-packages/docker_compose-1.25.0.dev0-py3.6.egg/compose/service.py", line 1045, in _get_container_host_config
    cpu_rt_runtime=options.get('cpu_rt_runtime'),
  File "/usr/local/lib/python3.6/dist-packages/docker-4.0.2-py3.6.egg/docker/api/container.py", line 590, in create_host_config
    return HostConfig(*args, **kwargs)
TypeError: __init__() got an unexpected keyword argument 'gpus'

Do I need a specific python docker package?

@DarioTurchi Yeah, I met the exact issue. Seems the type of HostConfig needs to be updated also.

I don't believe the change described by @ruckc is sufficient, because docker-py will also need a change. And it looks like the necessary docker-py change is still being worked on. See here:
docker/docker-py#2419

Here is the branch with the changes:
https://github.com/sigurdkb/docker-py/tree/gpus_parameter

So if you wish to patch this in now you'll have to build docker-compose against a modified docker-py from https://github.com/sigurdkb/docker-py/tree/gpus_parameter

I don't get what is going on here:

  1. I have in /etc/docker/daemon.json
{
    "runtimes": {
        "nvidia": {
            "path": "nvidia-container-runtime",
            "runtimeArgs": []
        }
    }
}

but the runtime key cannot be used anymore in v3.x, as per #6239

I have tried also:

{
	"default-runtime": "nvidia",
	"runtimes": {
		"nvidia": {
			"path": "/usr/bin/nvidia-container-runtime",
			"runtimeArgs": []
		}
	}
}

So I cannot start my containers with GPU support via docker-compose anymore:

bertserving_1    | I:VENTILATOR:[__i:_ge:222]:get devices
bertserving_1    | W:VENTILATOR:[__i:_ge:246]:no GPU available, fall back to CPU

Before those changes it worked, so what can I do now?

+1 it will be very useful to have such feature in docker-compose!


Any eta?

+1 would be useful feature for docker-compose

This feature would be an awesome addition to docker-compose

Right now my solution for this is using version 2.3 of the docker-compose file, which supports runtime, and manually installing nvidia-container-runtime (since it is no longer installed with nvidia-docker).
I'm also setting the runtime config in /etc/docker/daemon.json (not as default, just as an available runtime).
With this I can use a compose file like this:

version: '2.3'
services:
  test:
    image: nvidia/cuda:9.0-base
    runtime: nvidia


@arruda Would you mind sharing your daemon.json please?


Yeah, no problem, here it is:

{
    "runtimes": {
        "nvidia": {
            "path": "nvidia-container-runtime",
            "runtimeArgs": []
        }
    }
}

Hi

I have an application which requires NVIDIA drivers. I have built a docker image based on (FROM)
nvidia/cudagl:10.1-runtime-ubuntu18.04.

Using the approach recommended above, does it mean my image does not need to be derived from nvidia/cudagl:10.1-runtime-ubuntu18.04? I.e. I could simply derive from (FROM) python:3.7.3-stretch
and add runtime: nvidia to the service in docker-compose?

Thanks

@rfsch No, that's a different thing. runtime: nvidia in docker-compose refers to the Docker runtime. This makes the GPU available to the container. But you still need some way to use them once they're made available. runtime in nvidia/cudagl:10.1-runtime-ubuntu18.04 refers to the CUDA runtime components. This lets you use the GPUs (made available in a container by Docker) using CUDA.

In this image:

[Docker architecture diagram]

runtime: nvidia replaces the runc/containerd part. nvidia/cudagl:10.1-runtime-ubuntu18.04 is completely outside the picture.
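A rough way to see the distinction on the command line (a sketch; the image tags are just examples):

# the Docker runtime part (--gpus / runtime: nvidia) exposes the device and
# injects driver-level tools, so even a plain image can run nvidia-smi:
docker run --rm --gpus all ubuntu:18.04 nvidia-smi

# the CUDA runtime libraries come from the image itself, so an application
# linked against CUDA needs a CUDA base image:
docker run --rm --gpus all nvidia/cuda:10.1-runtime-ubuntu18.04 nvidia-smi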

we need this feature

@Daniel451 I've just been following along peripherally, but it looks like it will be under the generic_resources key [...]

Hey, @johncolby, I tried this, but failed:

ERROR: The Compose file './docker-compose.yml' is invalid because:
services.nvidia-smi-test.deploy.resources.reservations value Additional properties are not allowed ('generic_resources' was unexpected)

any suggestions?

Thanks
David

Installing nvidia-container-runtime 3.1.4.1 from https://github.com/NVIDIA/nvidia-container-runtime and putting

runtime: nvidia

in the compose file works fine here with docker-compose 1.23.1 and 1.24.1, as installed from https://docs.docker.com/compose/install/ using this dodgy-looking command:

sudo curl -L "https://github.com/docker/compose/releases/download/1.24.1/docker-compose-$(uname -s)-$(uname -m)" -o /usr/local/bin/docker-compose

and e.g. the nvidia/cudagl:10.1-base container from Docker Hub. I've tried CUDA and OpenGL rendering and it's all near-native performance.
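(If you install docker-compose with that curl command, note the install page also has you mark the binary executable, i.e.:

sudo chmod +x /usr/local/bin/docker-compose

otherwise the shell will refuse to execute it.)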

Internally tracked as COMPOSE-82
Please note that such a change also needs to be implemented in docker stack (https://github.com/docker/cli/blob/master/cli/compose/types/types.go#L156) for consistency

@jdr-face can you share your docker-compose.yml?

hey, @jdr-face,

here is my test following your suggestion, by install nvidia-container-runtime at host machine.

version: '3.0'

services:
  nvidia-smi-test:
    runtime: nvidia
    volumes:
      - /tmp/.X11-unix:/tmp/.X11-unix 
    environment:
     - NVIDIA_VISIBLE_DEVICES=0 
     - DISPLAY
    image: vkcube

it still gives the error:

       Unsupported config option for services.nvidia-smi-test: 'runtime'

@david-gwa as noted by andyneff earlier:

runtime has never been a 3.x flag. It's only present in the 2.x track, (2.3 and 2.4).

@david-gwa

can you share your docker-compose.yml ?

version: '2.3'

services:
    container:
        image: "nvidia/cudagl/10.1-base"

        runtime: "nvidia" 

        security_opt:
            - seccomp:unconfined
        privileged: true

        volumes:
            - $HOME/.Xauthority:/root/.Xauthority:rw
            - /tmp/.X11-unix:/tmp/.X11-unix:rw
        
        environment:
          - NVIDIA_VISIBLE_DEVICES=all

Depending on your needs some of those options may be unnecessary. As @muru predicted, the trick is to specify an old version. At least for my use case this isn't a problem, but I only offer this config as a workaround, really it should be made possible using the latest version.

thanks guys, @jdr-face, @muru, compose v2 does work.
I misunderstood; I thought your solution was for v3 compose.

For the record, traditionally speaking: compose v2 is not older than compose v3. They are different use cases. v3 is geared towards swarm while v2 is not. v1 is old.

Is there any discussion about the support of Docker-compose for Docker's native GPU support?

Supporting the runtime option is not the solution for GPU support in the future. NVIDIA describes the future of nvidia-docker2 in https://github.com/NVIDIA/nvidia-docker as follows.

Note that with the release of Docker 19.03, usage of nvidia-docker2 packages are deprecated since NVIDIA GPUs are now natively supported as devices in the Docker runtime.

Currently, GPU support can be realized by changing the runtime, but it is highly possible that this will not work in the future.


To be frank, this may not be the best practice, but somehow we made it work.

The tricky part is that we have to stick with docker-compose v3.x since we use Docker Swarm, while we also want to use the NVIDIA runtime to support GPU/CUDA in the containers.

To avoid explicitly specifying the NVIDIA runtime inside the docker-compose file, we set NVIDIA as the default runtime in /etc/docker/daemon.json, which looks like:

{
    "default-runtime":"nvidia",
    "runtimes": {
        "nvidia": {
            "path": "nvidia-container-runtime",
            "runtimeArgs": []
        }
    }
}

With that, all the containers running on the GPU machines enable the NVIDIA runtime by default.

Hope this can help someone facing a similar blocker
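With that in place, a restart plus a flag-free smoke test should confirm the default runtime works (a sketch):

sudo systemctl restart docker

# no --gpus or runtime: needed; nvidia is now the default runtime
docker run --rm nvidia/cuda:10.1-base nvidia-smi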


This is indeed what we do as well. It works for now, but it feels a little hacky to me. Hoping for full compose-v3 support soon. :)


Since --gpus is not available in compose, I cannot use PyCharm with Docker to run tensorflow-gpu.

Any updates on this issue? Is there a chance that the --gpus will be supported in docker-compose soon?

For those of you looking for a workaround, this is what we ended up doing:

Run COMPOSE_API_VERSION=auto docker-compose run gpu with the following file:

version: '3.7'

services:
    gpu:
        image: 'nvidia/cuda:9.0-base'
        command: 'nvidia-smi'
        device_requests:
            - capabilities:
               - "gpu"


I have solved this problem; you can try the following. My CSDN blog address: https://blog.csdn.net/u010420283/article/details/104055046

~$ sudo apt-get install nvidia-container-runtime
~$ sudo vim /etc/docker/daemon.json

then, in this daemon.json file, add this content:

{
    "default-runtime": "nvidia",
    "runtimes": {
        "nvidia": {
            "path": "/usr/bin/nvidia-container-runtime",
            "runtimeArgs": []
        }
    }
}

~$ sudo systemctl daemon-reload
~$ sudo systemctl restart docker
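To confirm the daemon actually picked up the default runtime, something like this should show it (a sketch):

$ docker info | grep -i runtime
 Runtimes: nvidia runc
 Default Runtime: nvidia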

For the ansible users who want to set up the workaround described before, there is a role to install nvidia-container-runtime and configure /etc/docker/daemon.json to use runtime: nvidia:

https://github.com/NVIDIA/ansible-role-nvidia-docker

(for some reason it runs only on Ubuntu and RHEL, but it's quite easy to modify. I run it on Debian)

Then in your docker-compose.yml:

version: "2.4"
services:
  test:
    image: "nvidia/cuda:10.2-runtime-ubuntu18.04"
    command: "nvidia-smi"

any update on an official 3.x version with gpu support? We need it on swarm :)

Is there any plan to add this feature?

This feature depends on docker-py implementing the device_requests parameter, which is what --gpus translates to. There have been multiple pull requests to add this feature (docker/docker-py#2419, docker/docker-py#2465, docker/docker-py#2471) but there are no reactions from any maintainer. #7124 uses docker/docker-py#2471 to provide it in Compose, but there is still no reply from anyone.

As I mentioned in #7124 I'm more than happy to make the PR more compliant, but since it's gotten very little attention I don't want to waste my time on something that's not going to be merged...

Please add this feature, will be awesome!

Please, add this feature! I was more than happy with the old nvidia-docker2, which allowed me to change the runtime in the daemon.json. Would be extremely nice to have this back.

Need it, please. Really need it :/

I'd like to pile on as well... we need this feature!


I need to run both CPU and GPU containers on the same machine, so the default runtime hack doesn't work for me. Do we have any idea when this will work in compose? Given that we don't have the runtime flag in compose, this represents a serious functionality regression, does it not? I'm having to write scripts in order to make this work - yuck!


You can do it via the docker CLI (docker run --gpus ...); I have this kind of trick (adding a proxy to be able to communicate with other containers running on other nodes in the swarm). We are all waiting for the ability to run it on swarm, because it doesn't work via the docker service command (as far as I know) nor via compose.


@dottgonzo. Well, yes ;-). I am aware of this and hence the reference to scripts. But this is a pretty awful and non-portable way of doing it, so I'd like to do it in a more dynamic way. As I said, I think that this represents a regression, not a feature ask.

COMPOSE_API_VERSION=auto docker-compose run gpu

@ggregoire where do we run: COMPOSE_API_VERSION=auto docker-compose run gpu ?

@joehoeller from your shell, just as you would do for any other command.

Right now we are deciding for every project whether we need 3.x features or whether we can use docker-compose 2.x, where the GPU option is still supported. Features like running multi-stage targets from a Dockerfile sadly cannot be used if a GPU is necessary. Please add this back in!

I'd like to recommend something like an "additional options" field for docker-compose where we can just add flags like --gpus=all to the docker start/run command that are not yet (or no longer) supported in docker-compose but are in the latest docker version. This way, compose users won't have to wait for docker-compose to catch up if they need a new, not-yet-supported docker feature. A sketch of what that might look like follows.
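To illustrate the suggestion (purely hypothetical syntax; extra_docker_args is not a real compose option):

services:
  train:
    image: tensorflow/tensorflow:latest-gpu
    # hypothetical pass-through of raw CLI flags
    extra_docker_args: ["--gpus", "all"]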

It is still necessary to run this on Docker Swarm for production environments. Will this be useful for Docker Swarm?

@sebastianfelipe It's very useful if you want to deploy to your swarm using compose.
Compare:
docker service create --generic-resource "gpu=1" --replicas 10 \
  --name sparkWorker <image_name> \
  "service ssh start && /opt/spark/bin/spark-class org.apache.spark.deploy.worker.Worker spark://<spark_master_ip>:7077"

to something like this

docker stack deploy --compose-file docker-compose.yml stackdemo
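For reference, a sketch of the compose file such a stack deploy could consume, reusing the swarm generic_resources syntax quoted earlier in this thread (an untested assumption):

version: '3.8'
services:
  sparkWorker:
    image: <image_name>
    command: "service ssh start && /opt/spark/bin/spark-class org.apache.spark.deploy.worker.Worker spark://<spark_master_ip>:7077"
    deploy:
      replicas: 10
      resources:
        reservations:
          generic_resources:
            - discrete_resource_spec:
                kind: 'gpu'
                value: 1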

Sorry, so is it already working with Docker Swarm using the docker-compose yaml file? Just to be sure :O. Thanks!

only for docker compose 2.x

The entire point of this issue is to request nvidia-docker gpu support for docker-compose 3+

It's been almost a year since the original request!! Why the delay?? Can we move this forward??

For those of you looking for a workaround, this is what we ended up doing: [...]

For those of you who are as impatient as I am, here's an easy pip install version of the above workaround:

pip install git+https://github.com/docker/docker-py.git@refs/pull/2471/merge
pip install git+https://github.com/docker/compose.git@refs/pull/7124/merge
pip install python-dotenv

Huge kudos to @yoanisgil!
Still anxiously waiting for an official patch. With all the PRs in place, it doesn't seem difficult by any standard.

ping @KlaasH @ulyssessouza @Goryudyuma @chris-crone . Any update on this?

No, I don't know why I was called.
I want you to tell me what to do?

I hope there is an update on this.

Yeah, it's been more than a year now... why are they not merging it in docker-py...

I'm not sure that the proposed implementations are the right ones for the Compose format. The good news is that we've opened up the Compose format specification with the intention of adding things like this. You can find the spec at https://github.com/compose-spec.

What I'd suggest we do is add an issue on the spec and then discuss it at one of the upcoming Compose community meetings (link to invite at the bottom of this page).

You need to have [the runtimes entry shown earlier] in your /etc/docker/daemon.json for --runtime=nvidia to continue working. [...]

Dockerd doesn't start with this daemon.json

Christ, this is going to take years :@


This works: docker run --gpus all nvidia/cudagl:9.2-runtime-centos7 nvidia-smi
@deniswal: Yes, we know this, but we are asking about compose functionality.

@chris-crone: I'm confused: this represents a regression from former behavior, so why does it need a new feature specification? Isn't it reasonable to run containers, some of which use the GPU and some of which use the CPU, on the same physical box?

Thanks for the consideration.

@vk1z AFAIK Docker Compose has never had GPU support, so this is not a regression. The part that needs design is how to declare a service's need for a GPU (or other device) in the Compose format, specifically changes like this. After that, it should just be plumbing to the backend.
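For a concrete picture of that direction, this is the shape the Compose specification eventually adopted for declaring device needs (deploy.resources.reservations.devices); shown here as a sketch of where the discussion was heading, not something the compose releases in this thread support:

services:
  test:
    image: nvidia/cuda:10.2-base
    command: nvidia-smi
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]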

Hi guys, I've tried some solutions proposed here and nothing worked for me; for example, @miriaford's did not work in my case. Also, is there some way to use the GPU to run my existing docker containers?
I have an i7 with 16GB of RAM, but the build for some projects takes too long to complete; my goal is to also use GPU power to speed up the process. Is that possible? Thanks!


@chris-crone: Again, I'm willing to be corrected, but wasn't that because the runtime: parameter disappeared from compose after the 2.4 config? That is why I felt that it was a regression. But no matter now, since we all should be on 3.x anyway.

I'd be glad to file an issue; we do that against the spec in the spec repo, correct?

but wasn't that because the runtime: parameter disappeared from compose after 2.4 config? That is why I felt that it was a regression.

Yes, exactly. I have a couple of projects where we rely on using runtime: nvidia in our docker-compose files, and this issue blocks us from upgrading to 3.x because we haven't found a way to use GPUs there.

Hi, please, please, please fix this.
This should be marked mission critical priority -20

Again, I will be willing to be corrected, but wasn't that because the runtime: parameter disappeared from compose after 2.4 config? That is why I felt that it was a regression. But no, matter now since we all should be on 3.x anyway.

I wasn't here when the change was made so I'm not 100% sure why it was dropped. I know that you do not need the NVIDIA runtime to use GPUs any more and that we are evolving the Compose v3 spec in the open here with the intention of making a single version of the spec. This may mean moving some v2 functionality into v3.

In terms of the runtime field, I don't think this is how it should be added to the Compose spec, as it is very specific to running on a single node. Ideally we'd want something that allows you to specify that your workload has a device need (e.g.: GPU, TPU, whatever comes next) and then let the orchestrator assign the workload to a node that provides that capability.

This discussion should be had on the specification though as it's not Python Docker Compose specific.


@chris-crone: I mostly concur with your statement. Adding short term hacks is probably the incorrect way to do this since we have a proliferation of edge devices each with their own runtimes. For example, as you point out, TPU (Google), VPU(Intel) and ARM GPU on the Pi. So we do need a more complete story.

I'll file an issue against the specification today and update this thread once I have done so. However, I do think that the orchestrator should be independent - such as if I want to use Kube, I should be able to do so. I'm assuming that will be in scope.

I do, however, disagree with the "using GPUs" statement, since that doesn't work with compose - which is what this is all about. But I think we all understand what problem we would like solved.


@chris-crone : Please see the docker-compose spec issue filed. I'll follow updates against that issue from now on.

Can we simply add an option (something like extra_docker_run_args) to pass arguments directly to the underlying docker run? This will not only solve the current problem, but also be future-proof: what if docker adds support for whatever "XPU", "YPU", or any other new features that might come in the future?

If we need a long back-and-forth discussion every time docker adds a new feature, it will be extremely inefficient and cause inevitable delay (and unnecessary confusion) between docker-compose and docker updates. Supporting argument delegation can provide temporary relief for this recurrent issue for all future features.


@miriaford I'm not sure that passing an uninterpreted blob supports the compose notion of being declarative. The old runtime tag at least indicated that it was something to do with the runtime. Given the direction in which docker is trending (docker-apps), it seems to me that doing this would make declarative deployment harder since an orchestrator would have to parse arbitrary blobs.

But I agree that compose and docker should be synchronized and zapping working features that people depend on (even though it was a major release) isn't quite kosher.

@vk1z I agree - there should be a much better sync mechanism between compose and docker. However, I don't expect such a mechanism to be designed any time soon. Meanwhile we also need a temporary way to do our own synchronization without hacking deep into the source code.

If the argument delegation proposal isn't an option, what do you suggest we do? I agree it isn't a pretty solution, but it's at least much better than this workaround, isn't it? #6691 (comment)

@miriaford docker-compose does not call the docker executable with arguments; it actually uses docker_py, which uses the HTTP API to talk to the docker daemon. So there is no "underlying docker run" command. The docker CLI is not an API; the socket connection is the API point of contact. This is why it is not always that easy.

To oversimplify things, in the process of running a docker container there are two main calls: one that creates the container, and one that starts it. Each ingests different pieces of information, and knowing which is which takes API knowledge, which most of us don't have the way we tend to know the docker CLI. I do not think being able to add extra args to docker_py calls is going to be as useful as you think, except in select use cases.

To make things even more difficult, sometimes the docker_py library is behind the API and doesn't have everything you need right away, and you have to wait for it to be updated. All that being said, extra_docker_run_args isn't a simple solution.
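To make the two-call flow concrete, here is roughly what it looks like against docker_py's low-level APIClient (a sketch; this is the layer where a gpus/device_requests option has to be plumbed through):

import docker

api = docker.APIClient(base_url="unix:///var/run/docker.sock")

# call 1: create the container; host-level options ride along in host_config
host_config = api.create_host_config(runtime="nvidia")  # the old workaround's knob
container = api.create_container(
    image="nvidia/cuda:9.0-base",
    command="nvidia-smi",
    host_config=host_config,
)

# call 2: start it
api.start(container["Id"])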

@andyneff Thanks for your explanation. Indeed, I'm not too familiar with the inner workings of Docker. If I understand correctly, there are 4 APIs that need to be manually synced for any new feature update:

  1. the Docker socket API
  2. docker_py, which provides a Python frontend to the socket API
  3. the Docker CLI (our familiar entry point to the docker toolchain)
  4. the docker-compose interface, which calls the Docker socket API (via docker_py)

This begs the question: why is there no automatic (or at least semi-automatic) syncing mechanism? Manually propagating new feature updates across 4 APIs seems doomed to be error-prone, delay-prone, and confusing ...

P.S. I'm not saying that it's a simple task to have automatic syncing, but I really think there should be one to make life easier in the future.

I'm kinda getting into pedantics now... but I would describe it as follows...

  • The docker socket is THE official API for docker. It is often a file socket, but can also be TCP (or any other, I imagine using socat)
  • The docker CLI uses that API to give us users an awesome tool
    • Docker writes the API and the CLI, so they are always synced at release time. (I think that's safe to say; the CLI is a first-class citizen of the docker ecosystem)
  • The docker_py library, takes that API and puts it in an awesome library that other python libraries can use. Without this you would be making all these HTTP calls yourself, and pulling your hair out.
    • However docker_py was started as a third party library, and thus it has traditionally trailed the docker API, and has things added later or as needed (limited resources).
  • compose uses a version of the docker_py and then add all these awesome features, again as needed (based on issues just like this)
    • However, compose can't do much until docker_py supports a feature (and I'm not saying docker_py is holding up this issue; I don't know, I'm just talking in general)

So yes, it goes:

  • "compose yaml+compose args" -> "docker_py" -> "docker_api"
  • And the CLI isn't any part of this, (and believe me, that's the right way to do things)

I can't speak for docker_py or compose, but I imagine they have limited man-hours contributed to them, so it's harder to keep up with ALL the crazy insane features that docker is CONSTANTLY adding. And docker itself is a Go library; my understanding is that Python support is not (currently) a first-class citizen. Although it is nice that both projects are under the docker umbrella, at least from a GitHub organization standpoint.


So that all being said... I too am waiting for an equivalent --gpus support, and have to use the old runtime: nvidia method instead, which will at least give me "a" path to move forward in docker-compose 2.x.