docker / compose

Define and run multi-container applications with Docker

Home Page: https://docs.docker.com/compose/


Support for NVIDIA GPUs under Docker Compose

collabnix opened this issue · comments

Under Docker 19.03.0 Beta 2, support for NVIDIA GPUs has been introduced in the form of the new CLI option --gpus. docker/cli#1714 talks about this enablement.

Now one can simply pass the --gpus option for GPU-accelerated Docker-based applications.

$ docker run -it --rm --gpus all ubuntu nvidia-smi
Unable to find image 'ubuntu:latest' locally
latest: Pulling from library/ubuntu
f476d66f5408: Pull complete 
8882c27f669e: Pull complete 
d9af21273955: Pull complete 
f5029279ec12: Pull complete 
Digest: sha256:d26d529daa4d8567167181d9d569f2a85da3c5ecaf539cace2c6223355d69981
Status: Downloaded newer image for ubuntu:latest
Tue May  7 15:52:15 2019       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 390.116                Driver Version: 390.116                   |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla P4            Off  | 00000000:00:04.0 Off |                    0 |
| N/A   39C    P0    22W /  75W |      0MiB /  7611MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
:~$ 

As of today, Compose doesn't support this. This is a feature request to enable Compose to support NVIDIA GPUs.

This is of increased importance now that the legacy 'nvidia runtime' appears broken with Docker 19.03.0 and nvidia-container-toolkit-1.0.0-2: NVIDIA/nvidia-docker#1017

$ cat docker-compose.yml 
version: '2.3'

services:
  nvidia-smi-test:
    runtime: nvidia
    image: nvidia/cuda:9.2-runtime-centos7

$ docker-compose run nvidia-smi-test
Cannot create container for service nvidia-smi-test: Unknown runtime specified nvidia

This works: docker run --gpus all nvidia/cudagl:9.2-runtime-centos7 nvidia-smi

This does not: docker run --runtime=nvidia nvidia/cudagl:9.2-runtime-centos7 nvidia-smi

Any work happening on this?

I got the new Docker CE 19.03.0 on a new Ubuntu 18.04 LTS machine, have the current and matching NVIDIA Container Toolkit (née nvidia-docker2) version, but cannot use it because docker-compose.yml 3.7 doesn't support the --gpus flag.

Is there a workaround for this?

This works: docker run --gpus all nvidia/cudagl:9.2-runtime-centos7 nvidia-smi

This does not: docker run --runtime=nvidia nvidia/cudagl:9.2-runtime-centos7 nvidia-smi

You need to have

{
    "runtimes": {
        "nvidia": {
            "path": "/usr/bin/nvidia-container-runtime",
            "runtimeArgs": []
        }
    }
}

in your /etc/docker/daemon.json for --runtime=nvidia to continue working. More info here.
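For anyone applying that change, a quick sanity check might look like this (a sketch, assuming the NVIDIA container packages are installed and the runtime path matches your distro):

# restart the daemon so it picks up the new runtimes entry
sudo systemctl restart docker

# this should print the nvidia-smi table again
docker run --rm --runtime=nvidia nvidia/cudagl:9.2-runtime-centos7 nvidia-smi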

It is an urgent need. Thank you for your effort!

Is it intended to have users manually populate /etc/docker/daemon.json after migrating to docker >= 19.03 and removing nvidia-docker2 to use nvidia-container-toolkit instead?

It seems that this breaks a lot of installations, especially since --gpus is not available in compose.

No, this is a workaround until compose supports the --gpus flag.

install nvidia-docker-runtime:
https://github.com/NVIDIA/nvidia-container-runtime#docker-engine-setup

add to /etc/docker/daemon.json:

{
    "runtimes": {
        "nvidia": {
            "path": "/usr/bin/nvidia-container-runtime",
            "runtimeArgs": []
        }
    }
}

docker-compose:

runtime: nvidia
environment:
  - NVIDIA_VISIBLE_DEVICES=all
There is no such thing as /usr/bin/nvidia-container-runtime anymore. The issue is still critical.

This will help run an NVIDIA environment with docker-compose until docker-compose itself is fixed: install nvidia-docker-runtime (https://github.com/NVIDIA/nvidia-container-runtime#docker-engine-setup), add the runtimes entry shown above to /etc/docker/daemon.json, and set runtime: nvidia plus NVIDIA_VISIBLE_DEVICES=all on your docker-compose service.

This is not working for me, still getting the Unsupported config option for services.myservice: 'runtime' when trying to run docker-compose up

any ideas?


After modifying /etc/docker/daemon.json, restart the docker service:
systemctl restart docker

Use Compose format 2.3 and add runtime: nvidia to your GPU service. Docker Compose must be version 1.19.0 or higher.

docker-compose file:

version: '2.3'

services:
  nvsmi:
    image: ubuntu:16.04
    runtime: nvidia
    environment:
      - NVIDIA_VISIBLE_DEVICES=all
    command: nvidia-smi

@cheperuiz, you can set nvidia as the default runtime in daemon.json and you will not be dependent on docker-compose. But then all your docker containers will use the nvidia runtime - I have no issues so far.

{
    "default-runtime": "nvidia",
    "runtimes": {
        "nvidia": {
            "path": "/usr/bin/nvidia-container-runtime",
            "runtimeArgs": []
        }
    }
}

Ah! thank you @Kwull, I missed that default-runtime part... Everything working now :)

@uderik, runtime is no longer present in the current 3.7 compose file format schema, nor in the pending 3.8 version that should eventually align with Docker 19.03: https://github.com/docker/compose/blob/5e587d574a94e011b029c2fb491fb0f4bdeef71c/compose/config/config_schema_v3.8.json

@johncolby runtime has never been a 3.x flag. It's only present in the 2.x track (2.3 and 2.4).

Yeah, I know, and even though my docker-compose.yml file includes version: '2.3' (which has worked in the past), it seems to be ignored by the latest versions...
For future projects, what would be the correct way to enable/disable access to the GPU? Just making it the default + env variables? Or will there be support for the --gpus flag?

@johncolby what is the replacement for runtime in 3.X?

@Daniel451 I've just been following along peripherally, but it looks like it will be under the generic_resources key, something like:

services:
  my_app:
    deploy:
      resources:
        reservations:
          generic_resources:
            - discrete_resource_spec:
                kind: 'gpu'
                value: 2

(from https://github.com/docker/cli/blob/9a39a1/cli/compose/loader/full-example.yml#L71-L74)
Design document here: https://github.com/docker/swarmkit/blob/master/design/generic_resources.md

Here is the compose issue regarding compose 3.8 schema support, which is already merged in: #6530

On the daemon side the gpu capability can get registered by including it in the daemon.json or dockerd CLI (like the previous hard-coded runtime workaround), something like

/usr/bin/dockerd --node-generic-resource gpu=2

which then gets registered by hooking into the NVIDIA docker utility:
https://github.com/moby/moby/blob/09d0f9/daemon/nvidia_linux.go

It looks like the machinery is basically in place, probably just needs to get documented...
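If I'm reading the dockerd docs right, the daemon.json equivalent of that CLI flag would be the node-generic-resources key, something like the sketch below (whether plain gpu=N values suffice or GPU UUIDs are required may depend on the setup; treat this as an assumption):

{
    "node-generic-resources": [
        "gpu=0",
        "gpu=1"
    ]
}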

Any update?

Also waiting on updates, using bash with docker run --gpus until the official fix...

Waiting for updates as well.

Also waiting for updates :)

Ok... I don't understand why this is still open. These 3 additional lines make it work with schema version 3.7. Glad to know docker is responsive to trivial community issues. So clone this repo, add these three lines, run python3 setup.py build && install, and make sure your docker-compose.yml is version 3.7.

[ruckc@omnilap compose]$ git diff
diff --git a/compose/config/config_schema_v3.7.json b/compose/config/config_schema_v3.7.json
index cd7882f5..d25d404c 100644
--- a/compose/config/config_schema_v3.7.json
+++ b/compose/config/config_schema_v3.7.json
@@ -151,6 +151,7 @@

         "external_links": {"type": "array", "items": {"type": "string"}, "uniqueItems": true},
         "extra_hosts": {"$ref": "#/definitions/list_or_dict"},
+        "gpus": {"type": ["number", "string"]},
         "healthcheck": {"$ref": "#/definitions/healthcheck"},
         "hostname": {"type": "string"},
         "image": {"type": "string"},
diff --git a/compose/service.py b/compose/service.py
index 55d2e9cd..71188b67 100644
--- a/compose/service.py
+++ b/compose/service.py
@@ -89,6 +89,7 @@ HOST_CONFIG_KEYS = [
     'dns_opt',
     'env_file',
     'extra_hosts',
+    'gpus',
     'group_add',
     'init',
     'ipc',
@@ -996,6 +997,7 @@ class Service(object):
             dns_opt=options.get('dns_opt'),
             dns_search=options.get('dns_search'),
             restart_policy=options.get('restart'),
+            gpus=options.get('gpus'),
             runtime=options.get('runtime'),
             cap_add=options.get('cap_add'),
             cap_drop=options.get('cap_drop'),

I just added an internal issue to track that.
Remember that PRs are welcome 😃

Ok... I don't understand why this is still open. These 3 additional lines make it work with schema version 3.7. [...]

I tried your solution but I get a lot of errors about that flag:

ERROR: for <SERVICE_NAME>  __init__() got an unexpected keyword argument 'gpus'
Traceback (most recent call last):
  File "/usr/local/bin/docker-compose", line 11, in <module>
    load_entry_point('docker-compose==1.25.0.dev0', 'console_scripts', 'docker-compose')()
  File "/usr/local/lib/python3.6/dist-packages/docker_compose-1.25.0.dev0-py3.6.egg/compose/cli/main.py", line 71, in main
    command()
  File "/usr/local/lib/python3.6/dist-packages/docker_compose-1.25.0.dev0-py3.6.egg/compose/cli/main.py", line 127, in perform_command
    handler(command, command_options)
  File "/usr/local/lib/python3.6/dist-packages/docker_compose-1.25.0.dev0-py3.6.egg/compose/cli/main.py", line 1106, in up
    to_attach = up(False)
  File "/usr/local/lib/python3.6/dist-packages/docker_compose-1.25.0.dev0-py3.6.egg/compose/cli/main.py", line 1102, in up
    cli=native_builder,
  File "/usr/local/lib/python3.6/dist-packages/docker_compose-1.25.0.dev0-py3.6.egg/compose/project.py", line 569, in up
    get_deps,
  File "/usr/local/lib/python3.6/dist-packages/docker_compose-1.25.0.dev0-py3.6.egg/compose/parallel.py", line 112, in parallel_execute
    raise error_to_reraise
  File "/usr/local/lib/python3.6/dist-packages/docker_compose-1.25.0.dev0-py3.6.egg/compose/parallel.py", line 210, in producer
    result = func(obj)
  File "/usr/local/lib/python3.6/dist-packages/docker_compose-1.25.0.dev0-py3.6.egg/compose/project.py", line 555, in do
    renew_anonymous_volumes=renew_anonymous_volumes,
  File "/usr/local/lib/python3.6/dist-packages/docker_compose-1.25.0.dev0-py3.6.egg/compose/service.py", line 546, in execute_convergence_plan
    scale, detached, start
  File "/usr/local/lib/python3.6/dist-packages/docker_compose-1.25.0.dev0-py3.6.egg/compose/service.py", line 468, in _execute_convergence_create
    "Creating"
  File "/usr/local/lib/python3.6/dist-packages/docker_compose-1.25.0.dev0-py3.6.egg/compose/parallel.py", line 112, in parallel_execute
    raise error_to_reraise
  File "/usr/local/lib/python3.6/dist-packages/docker_compose-1.25.0.dev0-py3.6.egg/compose/parallel.py", line 210, in producer
    result = func(obj)
  File "/usr/local/lib/python3.6/dist-packages/docker_compose-1.25.0.dev0-py3.6.egg/compose/service.py", line 466, in <lambda>
    lambda service_name: create_and_start(self, service_name.number),
  File "/usr/local/lib/python3.6/dist-packages/docker_compose-1.25.0.dev0-py3.6.egg/compose/service.py", line 454, in create_and_start
    container = service.create_container(number=n, quiet=True)
  File "/usr/local/lib/python3.6/dist-packages/docker_compose-1.25.0.dev0-py3.6.egg/compose/service.py", line 337, in create_container
    previous_container=previous_container,
  File "/usr/local/lib/python3.6/dist-packages/docker_compose-1.25.0.dev0-py3.6.egg/compose/service.py", line 913, in _get_container_create_options
    one_off=one_off)
  File "/usr/local/lib/python3.6/dist-packages/docker_compose-1.25.0.dev0-py3.6.egg/compose/service.py", line 1045, in _get_container_host_config
    cpu_rt_runtime=options.get('cpu_rt_runtime'),
  File "/usr/local/lib/python3.6/dist-packages/docker-4.0.2-py3.6.egg/docker/api/container.py", line 590, in create_host_config
    return HostConfig(*args, **kwargs)
TypeError: __init__() got an unexpected keyword argument 'gpus'

Do I need a specific python docker package?

@DarioTurchi Yeah, I met the exact issue. Seems the type of HostConfig needs to be updated also.

I don't believe the change described by @ruckc is sufficient, because docker-py will also need a change. And it looks like the necessary docker-py change is still being worked on. See here:
docker/docker-py#2419

Here is the branch with the changes:
https://github.com/sigurdkb/docker-py/tree/gpus_parameter

So if you wish to patch this in now you'll have to build docker-compose against a modified docker-py from https://github.com/sigurdkb/docker-py/tree/gpus_parameter

I don't get what is going on here:

  1. I have in /etc/docker/daemon.json
{
    "runtimes": {
        "nvidia": {
            "path": "nvidia-container-runtime",
            "runtimeArgs": []
        }
    }
}

but the runtime key cannot be used anymore in v3.x, as per #6239

I have tried also:

{
	"default-runtime": "nvidia",
	"runtimes": {
		"nvidia": {
			"path": "/usr/bin/nvidia-container-runtime",
			"runtimeArgs": []
		}
	}
}

So I cannot start my containers with GPU support via docker-compose anymore:

bertserving_1    | I:VENTILATOR:[__i:_ge:222]:get devices
bertserving_1    | W:VENTILATOR:[__i:_ge:246]:no GPU available, fall back to CPU

Before those changes it worked, so what can I do now?

+1 it will be very useful to have such feature in docker-compose!


Any eta?

+1 would be useful feature for docker-compose

This feature would be an awesome addition to docker-compose

Right now my solution for this is using version 2.3 of the docker-compose file, which supports runtime, and manually installing nvidia-container-runtime (since it is no longer installed with nvidia-docker).
I'm also setting the runtime config in /etc/docker/daemon.json (not as default, just as an available runtime).
With this I can use a compose file like this:

version: '2.3'
services:
  test:
    image: nvidia/cuda:9.0-base
    runtime: nvidia


@arruda Would you mind sharing your daemon.json please?


Yeah, no problem, here it is:

{
    "runtimes": {
        "nvidia": {
            "path": "nvidia-container-runtime",
            "runtimeArgs": []
        }
    }
}

Hi

I have an application which requires NVIDIA drivers. I have built a docker image based on (FROM)
nvidia/cudagl:10.1-runtime-ubuntu18.04.

Using the approach recommended above, does it mean my image does not need to be derived from nvidia/cudagl:10.1-runtime-ubuntu18.04? I.e. I could simply derive from (FROM) python:3.7.3-stretch
and add runtime: nvidia to the service in docker-compose?

Thanks

@rfsch No, that's a different thing. runtime: nvidia in docker-compose refers to the Docker runtime. This makes the GPU available to the container. But you still need some way to use them once they're made available. runtime in nvidia/cudagl:10.1-runtime-ubuntu18.04 refers to the CUDA runtime components. This lets you use the GPUs (made available in a container by Docker) using CUDA.

In this image:

[Docker architecture diagram]

runtime: nvidia replaces the runc/containerd part. nvidia/cudagl:10.1-runtime-ubuntu18.04 is completely outside the picture.
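A rough way to see the distinction on the command line (a sketch; the image tags are just examples):

# the Docker runtime part (--gpus / runtime: nvidia) exposes the device and
# injects driver-level tools, so even a plain image can run nvidia-smi:
docker run --rm --gpus all ubuntu:18.04 nvidia-smi

# the CUDA runtime libraries come from the image itself, so an application
# linked against CUDA needs a CUDA base image:
docker run --rm --gpus all nvidia/cuda:10.1-runtime-ubuntu18.04 nvidia-smi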

we need this feature

@Daniel451 I've just been following along peripherally, but it looks like it will be under the generic_resources key [...]

Hey, @johncolby, I tried this, but failed:

ERROR: The Compose file './docker-compose.yml' is invalid because:
services.nvidia-smi-test.deploy.resources.reservations value Additional properties are not allowed ('generic_resources' was unexpected)

any suggestions?

Thanks
David

Installing nvidia-container-runtime 3.1.4.1 from https://github.com/NVIDIA/nvidia-container-runtime and putting

runtime: nvidia

in the compose file works fine here with docker-compose 1.23.1 and 1.24.1, as installed from https://docs.docker.com/compose/install/ using this dodgy-looking command:

sudo curl -L "https://github.com/docker/compose/releases/download/1.24.1/docker-compose-$(uname -s)-$(uname -m)" -o /usr/local/bin/docker-compose

and e.g. the nvidia/cudagl:10.1-base container from Docker Hub. I've tried CUDA and OpenGL rendering and it's all near-native performance.
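(If you install docker-compose with that curl command, note the install page also has you mark the binary executable, i.e.:

sudo chmod +x /usr/local/bin/docker-compose

otherwise the shell will refuse to execute it.)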

Internally tracked as COMPOSE-82
Please note that such a change also needs to be implemented in docker stack (https://github.com/docker/cli/blob/master/cli/compose/types/types.go#L156) for consistency

@jdr-face can you share your docker-compose.yml?

hey, @jdr-face,

here is my test following your suggestion, by install nvidia-container-runtime at host machine.

version: '3.0'

services:
  nvidia-smi-test:
    runtime: nvidia
    volumes:
      - /tmp/.X11-unix:/tmp/.X11-unix 
    environment:
     - NVIDIA_VISIBLE_DEVICES=0 
     - DISPLAY
    image: vkcube

it still gives the error:

       Unsupported config option for services.nvidia-smi-test: 'runtime'

@david-gwa as noted by andyneff earlier:

runtime has never been a 3.x flag. It's only present in the 2.x track, (2.3 and 2.4).

@david-gwa

can you share your docker-compose.yml ?

version: '2.3'

services:
    container:
        image: "nvidia/cudagl/10.1-base"

        runtime: "nvidia" 

        security_opt:
            - seccomp:unconfined
        privileged: true

        volumes:
            - $HOME/.Xauthority:/root/.Xauthority:rw
            - /tmp/.X11-unix:/tmp/.X11-unix:rw
        
        environment:
          - NVIDIA_VISIBLE_DEVICES=all

Depending on your needs some of those options may be unnecessary. As @muru predicted, the trick is to specify an old version. At least for my use case this isn't a problem, but I only offer this config as a workaround, really it should be made possible using the latest version.

thanks guys, @jdr-face, @muru, compose v2 does work.
I misunderstood; I thought your solution was for v3 compose.

For the record, traditionally speaking: compose v2 is not older than compose v3. They are different use cases. v3 is geared towards swarm while v2 is not. v1 is old.

Is there any discussion about the support of Docker-compose for Docker's native GPU support?

Supporting the runtime option is not the solution for GPU support in the future. NVIDIA describes the future of nvidia-docker2 in https://github.com/NVIDIA/nvidia-docker as follows.

Note that with the release of Docker 19.03, usage of nvidia-docker2 packages are deprecated since NVIDIA GPUs are now natively supported as devices in the Docker runtime.

Currently, GPU support can be realized by changing the runtime, but it is highly possible that this will not work in the future.


To be frank, this may not be the best practice, but somehow we made it work.

The tricky part is that we have to stick with docker-compose v3.x since we use Docker Swarm, while we also want to use the NVIDIA runtime to support GPU/CUDA in the containers.

To avoid explicitly specifying the NVIDIA runtime inside the docker-compose file, we set NVIDIA as the default runtime in /etc/docker/daemon.json, which looks like:

{
    "default-runtime":"nvidia",
    "runtimes": {
        "nvidia": {
            "path": "nvidia-container-runtime",
            "runtimeArgs": []
        }
    }
}

With that, all the containers running on the GPU machines enable the NVIDIA runtime by default.

Hope this can help someone facing a similar blocker
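With that in place, a restart plus a flag-free smoke test should confirm the default runtime works (a sketch):

sudo systemctl restart docker

# no --gpus or runtime: needed; nvidia is now the default runtime
docker run --rm nvidia/cuda:10.1-base nvidia-smi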


This is indeed what we do as well. It works for now, but it feels a little hacky to me. Hoping for full compose-v3 support soon. :)


Since --gpus is not available in compose, I cannot use PyCharm with Docker to run tensorflow-gpu.

Any updates on this issue? Is there a chance that the --gpus will be supported in docker-compose soon?

For those of you looking for a workaround, this is what we ended up doing:

Run COMPOSE_API_VERSION=auto docker-compose run gpu with the following file:

version: '3.7'

services:
    gpu:
        image: 'nvidia/cuda:9.0-base'
        command: 'nvidia-smi'
        device_requests:
            - capabilities:
               - "gpu"


I have solved this problem; you can try the following. My CSDN blog address: https://blog.csdn.net/u010420283/article/details/104055046

~$ sudo apt-get install nvidia-container-runtime
~$ sudo vim /etc/docker/daemon.json

then, in this daemon.json file, add this content:

{
    "default-runtime": "nvidia",
    "runtimes": {
        "nvidia": {
            "path": "/usr/bin/nvidia-container-runtime",
            "runtimeArgs": []
        }
    }
}

~$ sudo systemctl daemon-reload
~$ sudo systemctl restart docker
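To confirm the daemon actually picked up the default runtime, something like this should show it (a sketch):

$ docker info | grep -i runtime
 Runtimes: nvidia runc
 Default Runtime: nvidia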

For the ansible users who want to set up the workaround described before, there is a role to install nvidia-container-runtime and configure /etc/docker/daemon.json to use runtime: nvidia:

https://github.com/NVIDIA/ansible-role-nvidia-docker

(for some reason it runs only on Ubuntu and RHEL, but it's quite easy to modify. I run it on Debian)

Then in your docker-compose.yml:

version: "2.4"
services:
  test:
    image: "nvidia/cuda:10.2-runtime-ubuntu18.04"
    command: "nvidia-smi"

any update on an official 3.x version with gpu support? We need it on swarm :)

Is there any plan to add this feature?

This feature depends on docker-py implementing the device_requests parameter, which is what --gpus translates to. There have been multiple pull requests to add this feature (docker/docker-py#2419, docker/docker-py#2465, docker/docker-py#2471) but there are no reactions from any maintainer. #7124 uses docker/docker-py#2471 to provide it in Compose, but there is still no reply from anyone.

As I mentioned in #7124 I'm more than happy to make the PR more compliant, but since it's gotten very little attention I don't want to waste my time on something that's not going to be merged...

Please add this feature, will be awesome!

Please, add this feature! I was more than happy with the old nvidia-docker2, which allowed me to change the runtime in the daemon.json. Would be extremely nice to have this back.

Need it, please. Really need it :/

I'd like to pile on as well... we need this feature!


I need to run both CPU and GPU containers on the same machine, so the default runtime hack doesn't work for me. Do we have any idea when this will work in compose? Given that we don't have the runtime flag in compose, this represents a serious functionality regression, does it not? I'm having to write scripts in order to make this work - yuck!


You can do it via the docker CLI (docker run --gpus ...); I have this kind of trick (adding a proxy to be able to communicate with other containers running on other nodes in the swarm). We are all waiting for the ability to run it on swarm, because it doesn't work via the docker service command (as far as I know) nor via compose.


@dottgonzo. Well, yes ;-). I am aware of this and hence the reference to scripts. But this is a pretty awful and non-portable way of doing it, so I'd like to do it in a more dynamic way. As I said, I think that this represents a regression, not a feature ask.

COMPOSE_API_VERSION=auto docker-compose run gpu

@ggregoire where do we run: COMPOSE_API_VERSION=auto docker-compose run gpu ?

@joehoeller from your shell, just as you would do for any other command.

Right now we are deciding for every project whether we need 3.x features or whether we can use docker-compose 2.x, where the GPU option is still supported. Features like running multi-stage targets from a Dockerfile sadly cannot be used if a GPU is necessary. Please add this back in!

I'd like to recommend something like an "additional options" field for docker-compose where we can just add flags like --gpus=all to the docker start/run command that are not yet (or no longer) supported in docker-compose but are in the latest docker version. This way, compose users won't have to wait for docker-compose to catch up if they need a new, not-yet-supported docker feature. A sketch of what that might look like follows.
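To illustrate the suggestion (purely hypothetical syntax; extra_docker_args is not a real compose option):

services:
  train:
    image: tensorflow/tensorflow:latest-gpu
    # hypothetical pass-through of raw CLI flags
    extra_docker_args: ["--gpus", "all"]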

It is still necessary to run this on Docker Swarm for production environments. Will this be useful for Docker Swarm?

@sebastianfelipe It's very useful if you want to deploy to your swarm using compose.
Compare:
docker service create --generic-resource "gpu=1" --replicas 10 \
  --name sparkWorker <image_name> \
  "service ssh start && /opt/spark/bin/spark-class org.apache.spark.deploy.worker.Worker spark://<spark_master_ip>:7077"

to something like this

docker stack deploy --compose-file docker-compose.yml stackdemo
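For reference, a sketch of the compose file such a stack deploy could consume, reusing the swarm generic_resources syntax quoted earlier in this thread (an untested assumption):

version: '3.8'
services:
  sparkWorker:
    image: <image_name>
    command: "service ssh start && /opt/spark/bin/spark-class org.apache.spark.deploy.worker.Worker spark://<spark_master_ip>:7077"
    deploy:
      replicas: 10
      resources:
        reservations:
          generic_resources:
            - discrete_resource_spec:
                kind: 'gpu'
                value: 1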

Sorry, so is it already working with Docker Swarm using the docker-compose yaml file? Just to be sure :O. Thanks!

only for docker compose 2.x

The entire point of this issue is to request nvidia-docker gpu support for docker-compose 3+

It's been almost a year since the original request!! Why the delay?? Can we move this forward??

For those of you looking for a workaround, this is what we ended up doing: [...]

For those of you who are as impatient as I am, here's an easy pip install version of the above workaround:

pip install git+https://github.com/docker/docker-py.git@refs/pull/2471/merge
pip install git+https://github.com/docker/compose.git@refs/pull/7124/merge
pip install python-dotenv

Huge kudos to @yoanisgil!
Still anxiously waiting for an official patch. With all the PRs in place, it doesn't seem difficult by any standard.

ping @KlaasH @ulyssessouza @Goryudyuma @chris-crone . Any update on this?

No, I don't know why I was called.
I want you to tell me what to do?

I hope there is an update on this.

Yeah, it's been more than a year now... why are they not merging it in docker-py...

I'm not sure that the proposed implementations are the right ones for the Compose format. The good news is that we've opened up the Compose format specification with the intention of adding things like this. You can find the spec at https://github.com/compose-spec.

What I'd suggest we do is add an issue on the spec and then discuss it at one of the upcoming Compose community meetings (link to invite at the bottom of this page).

You need to have [the runtimes entry shown earlier] in your /etc/docker/daemon.json for --runtime=nvidia to continue working. [...]

Dockerd doesn't start with this daemon.json

Christ, this is going to take years :@


This works: docker run --gpus all nvidia/cudagl:9.2-runtime-centos7 nvidia-smi
@deniswal: Yes, we know this, but we are asking about compose functionality.

@chris-crone: I'm confused: this represents a regression from former behavior, so why does it need a new feature specification? Isn't it reasonable to run containers, some of which use the GPU and some of which use the CPU, on the same physical box?

Thanks for the consideration.

@vk1z AFAIK Docker Compose has never had GPU support, so this is not a regression. The part that needs design is how to declare a service's need for a GPU (or other device) in the Compose format, specifically changes like this. After that, it should just be plumbing to the backend.
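For a concrete picture of that direction, this is the shape the Compose specification eventually adopted for declaring device needs (deploy.resources.reservations.devices); shown here as a sketch of where the discussion was heading, not something the compose releases in this thread support:

services:
  test:
    image: nvidia/cuda:10.2-base
    command: nvidia-smi
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]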

Hi guys, I've tried some solutions proposed here and nothing worked for me; for example, @miriaford's did not work in my case. Also, is there some way to use the GPU to run my existing docker containers?
I have an i7 with 16GB of RAM, but the build for some projects takes too long to complete; my goal is to also use GPU power to speed up the process. Is that possible? Thanks!


@chris-crone: Again, I'm willing to be corrected, but wasn't that because the runtime: parameter disappeared from compose after the 2.4 config? That is why I felt that it was a regression. But no matter now, since we all should be on 3.x anyway.

I'd be glad to file an issue; we do that against the spec in the spec repo, correct?

but wasn't that because the runtime: parameter disappeared from compose after 2.4 config? That is why I felt that it was a regression.

Yes, exactly. I have a couple of projects where we rely on using runtime: nvidia in our docker-compose files, and this issue blocks us from upgrading to 3.x because we haven't found a way to use GPUs there.

Hi, please, please, please fix this.
This should be marked mission critical priority -20

Again, I will be willing to be corrected, but wasn't that because the runtime: parameter disappeared from compose after 2.4 config? That is why I felt that it was a regression. But no, matter now since we all should be on 3.x anyway.

I wasn't here when the change was made so I'm not 100% sure why it was dropped. I know that you do not need the NVIDIA runtime to use GPUs any more and that we are evolving the Compose v3 spec in the open here with the intention of making a single version of the spec. This may mean moving some v2 functionality into v3.

In terms of the runtime field, I don't think this is how it should be added to the Compose spec, as it is very specific to running on a single node. Ideally we'd want something that allows you to specify that your workload has a device need (e.g.: GPU, TPU, whatever comes next) and then let the orchestrator assign the workload to a node that provides that capability.

This discussion should be had on the specification though as it's not Python Docker Compose specific.


@chris-crone: I mostly concur with your statement. Adding short term hacks is probably the incorrect way to do this since we have a proliferation of edge devices each with their own runtimes. For example, as you point out, TPU (Google), VPU(Intel) and ARM GPU on the Pi. So we do need a more complete story.

I'll file an issue against the specification today and update this thread once I have done so. However, I do think that the orchestrator should be independent - such as if I want to use Kube, I should be able to do so. I'm assuming that will be in scope.

I do, however, disagree with the "using GPUs" statement, since that doesn't work with compose - which is what this is all about. But I think we all understand what problem we would like solved.


@chris-crone : Please see the docker-compose spec issue filed. I'll follow updates against that issue from now on.

Can we simply add an option (something like extra_docker_run_args) to pass arguments directly to the underlying docker run? This will not only solve the current problem, but also be future-proof: what if docker adds support for whatever "XPU", "YPU", or any other new features that might come in the future?

If we need a long back-and-forth discussion every time docker adds a new feature, it will be extremely inefficient and cause inevitable delay (and unnecessary confusion) between docker-compose and docker updates. Supporting argument delegation can provide temporary relief for this recurrent issue for all future features.


@miriaford I'm not sure that passing an uninterpreted blob supports the compose notion of being declarative. The old runtime tag at least indicated that it was something to do with the runtime. Given the direction in which docker is trending (docker-apps), it seems to me that doing this would make declarative deployment harder since an orchestrator would have to parse arbitrary blobs.

But I agree that compose and docker should be synchronized and zapping working features that people depend on (even though it was a major release) isn't quite kosher.

@vk1z I agree - there should be a much better sync mechanism between compose and docker. However, I don't expect such a mechanism to be designed any time soon. Meanwhile we also need a temporary way to do our own synchronization without hacking deep into the source code.

If the argument delegation proposal isn't an option, what do you suggest we do? I agree it isn't a pretty solution, but it's at least much better than this workaround, isn't it? #6691 (comment)

@miriaford docker-compose does not call the docker executable with arguments; it actually uses docker_py, which uses the HTTP API to talk to the docker daemon. So there is no "underlying docker run" command. The docker CLI is not an API; the socket connection is the API point of contact. This is why it is not always that easy.

To oversimplify things, in the process of running a docker container there are two main calls: one that creates the container, and one that starts it. Each ingests different pieces of information, and knowing which is which takes API knowledge, which most of us don't have the way we tend to know the docker CLI. I do not think being able to add extra args to docker_py calls is going to be as useful as you think, except in select use cases.

To make things even more difficult, sometimes the docker_py library is behind the API and doesn't have everything you need right away, and you have to wait for it to be updated. All that being said, extra_docker_run_args isn't a simple solution.
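To make the two-call flow concrete, here is roughly what it looks like against docker_py's low-level APIClient (a sketch; this is the layer where a gpus/device_requests option has to be plumbed through):

import docker

api = docker.APIClient(base_url="unix:///var/run/docker.sock")

# call 1: create the container; host-level options ride along in host_config
host_config = api.create_host_config(runtime="nvidia")  # the old workaround's knob
container = api.create_container(
    image="nvidia/cuda:9.0-base",
    command="nvidia-smi",
    host_config=host_config,
)

# call 2: start it
api.start(container["Id"])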

@andyneff Thanks for your explanation. Indeed, I'm not too familiar with the inner workings of Docker. If I understand correctly, there are 4 APIs that need to be manually synced for any new feature update:

  1. the Docker socket API
  2. docker_py, which provides a Python frontend to the socket API
  3. the Docker CLI (our familiar entry point to the docker toolchain)
  4. the docker-compose interface, which calls the Docker socket API (via docker_py)

This begs the question: why is there no automatic (or at least semi-automatic) syncing mechanism? Manually propagating new feature updates across 4 APIs seems doomed to be error-prone, delay-prone, and confusing ...

P.S. I'm not saying that it's a simple task to have automatic syncing, but I really think there should be one to make life easier in the future.

I'm kinda getting into pedantics now... but I would describe it as follows...

  • The docker socket is THE official API for docker. It is often a file socket, but can also be TCP (or any other, I imagine using socat)
  • The docker CLI uses that API to give us users an awesome tool
    • Docker writes the API and the CLI, so they are always synced at release time. (I think that's safe to say; the CLI is a first-class citizen of the docker ecosystem)
  • The docker_py library, takes that API and puts it in an awesome library that other python libraries can use. Without this you would be making all these HTTP calls yourself, and pulling your hair out.
    • However docker_py was started as a third party library, and thus it has traditionally trailed the docker API, and has things added later or as needed (limited resources).
  • compose uses a version of the docker_py and then add all these awesome features, again as needed (based on issues just like this)
    • However, compose can't do much until docker_py supports a feature (and I'm not saying docker_py is holding up this issue; I don't know, I'm just talking in general)

So yes, it goes:

  • "compose yaml+compose args" -> "docker_py" -> "docker_api"
  • And the CLI isn't any part of this, (and believe me, that's the right way to do things)

I can't speak for docker_py or compose, but I imagine they have limited man-hours contributed to them, so it's harder to keep up with ALL the crazy insane features that docker is CONSTANTLY adding. And docker itself is a Go library; my understanding is that Python support is not (currently) a first-class citizen. Although it is nice that both projects are under the docker umbrella, at least from a GitHub organization standpoint.


So that all being said... I too am waiting for an equivalent --gpus support, and have to use the old runtime: nvidia method instead, which will at least give me "a" path to move forward in docker-compose 2.x.