Killing a `docker exec` command will not terminate the spawned process
dqminh opened this issue · comments
Whenever a process is launched via `docker exec`, it seems that killing `docker exec` will not terminate the process. For example:
> docker run -d --name test-exec busybox top
> docker exec -it test-exec sh
/ # # we have an exec shell now. assume pid of docker exec is 1234
> kill 1234
# docker exec process is terminated atm, but `nsenter-exec` process is still running with sh as its child
I would expect that killing the `docker exec -it` process will also kill the spawned process, or that there should be a way to stop the spawned process similar to how `docker stop` works.
My version of docker:
❯ docker version
Client version: 1.3.1-dev
Client API version: 1.16
Go version (client): go1.3.3
Git commit (client): c049949
OS/Arch (client): linux/amd64
Server version: 1.3.1-dev
Server API version: 1.16
Go version (server): go1.3.3
Git commit (server): c049949
❯ docker info
Containers: 1
Images: 681
Storage Driver: aufs
Root Dir: /var/lib/docker/aufs
Dirs: 693
Execution Driver: native-0.2
Kernel Version: 3.13.0-33-generic
Operating System: Ubuntu 14.04.1 LTS
CPUs: 2
Total Memory: 1.955 GiB
Debug mode (server): true
Debug mode (client): false
Fds: 17
Goroutines: 16
EventsListeners: 0
Init Path: /home/action/bin/docker
Username: dqminh
Registry: [https://index.docker.io/v1/]
WARNING: No swap limit support
mmm, I've just followed your example, and have a mildly different result?
[sven@t440s docker]$ docker run -d -name test-exec busybox top
Warning: '-name' is deprecated, it will be replaced by '--name' soon. See usage.
0daecd23a78f05990466c9f7d1094c737771a0cc15142588bb57ebd6b7f99c5f
[sven@t440s docker]$ docker exec -it test-exec sh
/ # ps
PID USER COMMAND
1 root top
7 root sh
13 root ps
/ # kill 7
/ # ps
PID USER COMMAND
1 root top
7 root sh
14 root ps
/ # kill -9 7
[sven@t440s docker]$ docker exec -it test-exec ps aux
PID USER COMMAND
1 root top
15 root ps aux
[sven@t440s docker]$ docker version
Client version: 1.3.1
Client API version: 1.15
Go version (client): go1.3.3
Git commit (client): 4e9bbfa
OS/Arch (client): linux/amd64
Server version: 1.3.1
Server API version: 1.15
Go version (server): go1.3.3
Git commit (server): 4e9bbfa
[sven@t440s docker]$
and the non-containerised version works the same:
[sven@t440s docker]$ sh
sh-4.2$ ps
PID TTY TIME CMD
11920 pts/3 00:00:00 bash
12090 pts/3 00:00:00 sh
12091 pts/3 00:00:00 ps
sh-4.2$ kill 12090
sh-4.2$
sh-4.2$ ps
PID TTY TIME CMD
11920 pts/3 00:00:00 bash
12090 pts/3 00:00:00 sh
12092 pts/3 00:00:00 ps
sh-4.2$ kill -HUP 12090
Hangup
so for me, it works as intended
@SvenDowideit ah, my use case is that the `docker exec` process is killed from outside of the container, not the process started by `docker exec` inside the container. For example, after running `docker exec`, the tree will look like (pseudo PIDs here to illustrate the point):
1024 --- docker run -d -it --name test-exec busybox top
1025 --- docker exec -it test-exec sh
10 --- docker -d
\ 10000 --- top
\ 10001 --- nsenter-exec --nspid 23119 --console /dev/pts/19 -- sh
\---- sh
Now if I do `kill 1025`, which kills the `docker exec` process, the process tree becomes:
1024 --- docker run -d -it --name test-exec busybox top
10 --- docker -d
\ 10000 --- top
\ 10001 --- nsenter-exec --nspid 23119 --console /dev/pts/19 -- sh
\---- sh
I would expect `nsenter-exec` to be killed as well, and/or maybe docker should expose a way to programmatically stop the exec process from outside.
ah, good to know more info :)
Yes, I should have included the process tree from the start, as it makes it much easier to see what's going on. Shouldn't submit an issue at 5am, I guess :(
mmm, ok, so I agree - I too would expect that `docker exec` would trap the kill signal and pass it on to the Docker daemon, which should then pass the signal on to the exec'd child.
I don't see much in the way of support for this in the API (http://docs.docker.com/reference/api/docker_remote_api_v1.15/#exec-create), so bug?
Yes and I don't see where the pid for the child is (if at all) stored in the ExecConfig.
/cc @vishh
Terminating a running 'exec' session via an API has not been implemented yet.
@proppy: Yes, the child pid is not stored as part of ExecConfig.
@vishh do you think adding support for `POST /exec/:name/stop` (and maybe `POST /exec/:name/kill`) makes sense here (similar to `POST /containers/:name/stop` and `POST /containers/:name/kill`)? That would actually solve the majority of my use case, as I mainly consume the remote API (which makes the exec process's unique id available with `POST /exec/:name/create`).
It's probably much harder to do it from the docker cli though, as we don't really expose the exec's id anywhere.
Yes. A stop/kill daemon api makes sense to me. For the CLI case, I need to see if the daemon can automatically terminate an abandoned interactive 'exec' command.
@vishh I'm not sure how we can implement this auto-terminate. Maybe we can have some list api for exec? And make `exec` jobs dependent on their container, so that on container deletion all abandoned jobs are deleted too.
AFAIK exec jobs should get terminated on container deletion. Is that not the case?
> Maybe we can have some list api for exec?

Perhaps add a way to see all processes related to a container? E.g.
docker containers ps <containerid>
which will include the exec process.
Good point. We should expose exec jobs belonging to a container.
@vishh eh, I meant the internal `execStore`. Yeah, it is a little different, because I wanted to add a method for getting the `exitCode` of an `exec` job and be sure that the job will be deleted from the `execStore`. (All I can imagine is pretty ugly.)
+1
this also causes a goroutine leak: 3 goroutines are leaked whenever this happens.
I proposed additional extensions to the remote API to stop/kill exec commands here: #9167. That should fix my particular use case (programmatically managing exec commands).
The proposal doesn't include CLI changes, as I'm not sure what the appropriate interface for exposing exec sessions is yet.
An alternative to killing the spawned process would be to close stdin, stdout, and stderr when `docker exec` is killed. In most cases, such as when a shell is being exec'ed, the spawned process will quit when stdin is closed.
Currently, it seems that when `docker exec` is killed, the spawned process still has a stdin with nobody attached to it.
I don't know if closing stdin would be a better alternative to killing the spawned process.
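That stdin-close behaviour can be demonstrated without docker at all; a minimal sketch using `cat` as a stand-in for the exec'd shell (nothing here is docker-specific):

```python
import subprocess

# `cat` reads stdin until EOF, like a shell or any other stdin-driven
# process spawned by `docker exec`. Closing our end of the pipe is the
# equivalent of the daemon closing the exec'd process's stdin.
proc = subprocess.Popen(["cat"], stdin=subprocess.PIPE,
                        stdout=subprocess.DEVNULL)
proc.stdin.close()                # nobody attached to stdin anymore
exit_code = proc.wait(timeout=5)  # cat exits normally once stdin hits EOF
print(exit_code)
```

Whether every program behaves this nicely is exactly the open question in the comment above; `cat` and most shells do, but a program that ignores EOF on stdin would keep running.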
I get this on latest:
/ # kill 3899
sh: can't kill pid 3899: No such process
/ # kill 3900
sh: can't kill pid 3900: No such process
/ #
super weird
The container becomes unresponsive after creating some random number of exec instances :(. Could be related to this. +1 for the ability to destroy them via the remote API.
Happens to me as well; is there any ETA for this to be fixed?
+1
@dqminh can you share how you are working around this issue right now? We need some way for docker to properly kill exec sessions.
What if you start the command in a wrapper? Example for API usage (haven't checked, so correct me if I missed something):
"/bin/bash" "-c" "trap '[ -z \"$(jobs -p)\" ] || kill $(jobs -p)' EXIT; " + yourcommand
So when you kill bash, all child processes should be stopped.
@garagatyi That doesn't work, because when you kill your "docker exec" process (external to the container, possibly remote), the process in the container doesn't get any signal; the docker daemon should have closed its stdin pipe.
+1
Or at least what should I be doing instead?
The following bash snippet can be used as a workaround for this issue. I basically intercept the SIGTERM to docker exec and do a manual cleanup. It is based on this: http://veithen.github.io/2014/11/16/sigterm-propagation.html
function docker_cleanup {
docker exec $IMAGE bash -c "if [ -f $PIDFILE ]; then kill -TERM -\$(cat $PIDFILE); rm $PIDFILE; fi"
}
function docker_exec {
IMAGE=$1
PIDFILE=/tmp/docker-exec-$$
shift
trap 'kill $PID; docker_cleanup $IMAGE $PIDFILE' TERM INT
docker exec $IMAGE bash -c "echo \"\$\$\" > $PIDFILE; exec $*" &
PID=$!
wait $PID
trap - TERM INT
wait $PID
}
#use it like this:
docker_exec container command arg1 ...
I'd be content to have an API to send signals to the exec'd processes, i.e. HUP, wait a few seconds, then KILL, the way a shutdown would do it. À la:
POST /exec/(id or name)/kill
Signal: default hup
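To make the proposal concrete, here is a sketch of what a client might send; note this endpoint does not exist in the Remote API (the path and the `Signal` field are taken from the comment above; the helper name is made up):

```python
def exec_kill_request(exec_id, signal="HUP"):
    """Build method, path, and body for the *proposed* (hypothetical)
    exec kill endpoint, mirroring POST /containers/(id)/kill."""
    return ("POST", "/exec/{}/kill".format(exec_id), {"Signal": signal})

# A graceful shutdown could then be: send HUP, wait a few seconds, send KILL.
print(exec_kill_request("4f6d9c"))
print(exec_kill_request("4f6d9c", "KILL"))
```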
Any updates? We still want this.
Yo, 2017! Any love?
+1
/cc @mlaventure
I guess the daemon could be updated to kill the associated process if the exec wasn't started in detached mode. But this behavior has been there for quite some time now, and I wonder if some people incorrectly rely on this side effect.
/cc @tonistiigi @crosbymichael @cpuguy83 for their opinions.
`docker run` has a flag `--sig-proxy`, which is enabled by default when attached.
Not sure about changing defaults, but it would be nice to have `docker exec` be able to proxy signals to the process it's attached to.
+1
@cpuguy83 there seems to be a part of the code that actually disables forwarding signals though; see #28872 (comment), and `--sig-proxy` is also ignored then.
We're hitting this as well, both on docker 1.12.6 and on 17.06.0-ce. We have some automated processes that attach to running containers via `docker exec -it` from an SSH session, interacting with the running processes and gathering output to logs. If the SSH connection is disrupted, the `docker exec`'d process remains running. We end up accumulating these stale processes over time.
ping @mlaventure @crosbymichael any thoughts on #9098 (comment) and #28872 (comment) ?
This issue was the root cause of a number of inconveniences I've experienced over the past several months, and only today did I finally land on this bug.
The workaround isn't too fun or easy either. I'm using Python to call `docker exec` in a subprocess, and what I settled on amounts to grepping `docker exec ps ...` output to get the PID of the command I just ran, followed by `docker exec kill ...` to kill the process running inside the container. There were also some tricky aspects to what I had to do, but I won't describe them here.
I think this issue should be prioritized more highly because it's the kind of behavior one takes for granted, and in certain use cases (like mine) it's easy not to notice this bug was happening all along.
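A rough sketch of that workaround (the container name, the marker string, and the `ps` column layout are illustrative assumptions, not the commenter's actual code):

```python
import shlex

def find_pid_cmd(container, needle):
    # List processes inside the container and print the PIDs whose
    # command line contains our marker string (skipping the grep itself).
    inner = ("ps -eo pid,args | grep -F -- {} | grep -v grep | "
             "awk '{{print $1}}'").format(shlex.quote(needle))
    return ["docker", "exec", container, "sh", "-c", inner]

def kill_cmd(container, pid, sig="TERM"):
    # Kill the process *inside* the container; killing the local
    # `docker exec` client process would leave it running.
    return ["docker", "exec", container, "kill", "-{}".format(sig), str(pid)]

# These argument lists would be handed to subprocess.run(...):
print(kill_cmd("my-container", 4242))
```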
I had that issue with both `docker run` and `docker exec` not dispatching signals to the daemon/container. The root cause is that when using `--tty`, signal proxying is entirely disabled with no way to enable it (even with `--sig-proxy`). It affects at least `docker run` and `docker exec`, which share the same code path.
Previously, `--sig-proxy` was an option to force proxying signals when not using a tty, and when using `--tty` signals were proxied anyway. Passing both `--sig-proxy` and `--tty` led to an error. That looked like:
| Options | Signal proxy? |
|---|---|
| none | No |
| `--tty` | Yes |
| `--sig-proxy` | Yes |
| `--tty --sig-proxy` | error: TTY mode (-t) already imply signal proxying (-sig-proxy) |
The October 2013 patch e0b59ab was made to "_Enable sig-proxy by default in run and attach_". It changed `--sig-proxy` to default to true and made `--tty` always disable signal proxying.
| Options | Signal proxy? |
|---|---|
| none | Yes (was No) |
| `--tty` | No (was Yes) |
| `--sig-proxy` | Yes |
| `--tty --sig-proxy` | No (was an error) |
So what happened with e0b59ab is that signal proxying is now enabled by default for non-tty. BUT the patch has a fault: setting `--tty` always disables signal proxying.
I am pretty sure that signal proxying should be enabled by default whether in tty or non-tty mode; it would still be possible to disable it with `--sig-proxy=false`. A patch would thus have to implement the following changes:
| Options | Signal proxy (current) | Expected |
|---|---|---|
| none | Yes | Yes |
| `--tty` | No | Yes |
| `--sig-proxy` | Yes | Yes |
| `--sig-proxy=false` | No | No |
| `--tty --sig-proxy` | No | Yes |
| `--tty --sig-proxy=false` | No | No |
TLDR: `--tty` should not arbitrarily force `sigProxy = false`; that behavior was introduced by e0b59ab.
Reference: https://phabricator.wikimedia.org/T176747#3749436
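The current-vs-expected tables above reduce to a one-line predicate each; a small Python encoding (the function names are illustrative, not from the docker codebase):

```python
def sig_proxy_current(tty=False, sig_proxy=True):
    # Behaviour since e0b59ab: a TTY unconditionally disables proxying.
    return sig_proxy and not tty

def sig_proxy_expected(tty=False, sig_proxy=True):
    # Expected behaviour: only an explicit --sig-proxy=false disables it.
    return sig_proxy

# Walk the table rows: (tty, sig_proxy) -> current / expected
for tty, sp in [(False, True), (True, True), (False, False), (True, False)]:
    print(tty, sp, sig_proxy_current(tty, sp), sig_proxy_expected(tty, sp))
```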
Is there any update? I have the same issue. I run a container (as a sandbox for code), and then Jenkins runs some script inside the container many times; if the Jenkins build (which ran `docker exec blah bash -c '<dangerous code>'`) was aborted, the bash process still runs in the container.
@TH3MIS if you can use `docker run --init` instead, it shouldn't happen.
The init process in the container will ensure that signals are forwarded to the executed process in docker.
@fho
No, it also doesn't work.
- In the first terminal: `docker run --init --rm -it --name ubuntu ubuntu:16.04`
- In the second terminal: `docker exec ubuntu bash -c 'sleep 77'`

And when I press CTRL+C, I still see `sleep 77` in the container's processes.
No, it's not resolved. The docker CLI does not forward signals to the exec process. This is one of the areas where `exec`'s usage is at odds with the original intent, which is debugging.
The functionality just needs to be rounded out.
@TH3MIS ok, I meant running your command via
docker run --init --rm -it --name ubuntu ubuntu:16.04 bash -c 'sleep 77'
instead.
If you still want to use docker-exec, you have to run your command via the init process manually:
docker exec ubuntu bash -c '/dev/init -s -- sleep 77'
Otherwise the signals are still not forwarded to your bash process.
@TH3MIS no one is saying this is an issue with docker run. It's an issue with docker exec, and your example still doesn't work. Signals are not forwarded when you kill the `docker exec` process, no matter what combination of commands you pass to it.
$ docker run --init -di ubuntu:16.04
$ docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
21d2b053d10f ubuntu:16.04 "/bin/bash" 3 minutes ago Up 3 minutes reverent_perlman
$ ps -eaf --forest
root 1846 1 0 Mar06 ? 00:30:23 /usr/bin/dockerd -H fd://
root 8823 1846 0 Mar06 ? 00:16:40 \_ docker-containerd -l unix:///var/run/docker/libcontainerd/docker-containerd.sock --metrics-interval=0 --start-timeout 2m --state-dir
root 563 8823 0 10:38 ? 00:00:00 \_ docker-containerd-shim 21d2b053d10f9d26e294ff055494b4136624fae1faa0952ba63e6d36cf21df74 /var/run/docker/libcontainerd/21d2b053d10
root 583 563 0 10:38 ? 00:00:00 \_ /dev/init -- /bin/bash
root 623 583 0 10:38 ? 00:00:00 \_ /bin/bash
$ docker exec reverent_perlman bash -c '/dev/init -s -- sleep 7777'
<open a new terminal>
$ ps -eaf --forest
root 1846 1 0 Mar06 ? 00:30:24 /usr/bin/dockerd -H fd://
root 8823 1846 0 Mar06 ? 00:16:41 \_ docker-containerd -l unix:///var/run/docker/libcontainerd/docker-containerd.sock --metrics-interval=0 --start-timeout 2m --state-dir
root 563 8823 0 10:38 ? 00:00:00 \_ docker-containerd-shim 21d2b053d10f9d26e294ff055494b4136624fae1faa0952ba63e6d36cf21df74 /var/run/docker/libcontainerd/21d2b053d10
root 583 563 0 10:38 ? 00:00:00 | \_ /dev/init -- /bin/bash
root 623 583 0 10:38 ? 00:00:00 | \_ /bin/bash
root 1404 8823 0 10:43 ? 00:00:00 \_ docker-containerd-shim 21d2b053d10f9d26e294ff055494b4136624fae1faa0952ba63e6d36cf21df74 /var/run/docker/libcontainerd/21d2b053d10
root 1454 1404 0 10:43 ? 00:00:00 \_ /dev/init -s -- sleep 7777
root 1460 1454 0 10:43 ? 00:00:00 \_ sleep 7777
$ ps -eaf | grep 'docker exec'
root 1715 32646 0 10:45 pts/18 00:00:00 docker exec reverent_perlman bash -c /dev/init -s -- sleep 7777
$ kill 1715
<docker exec in original terminal has exited>
$ ps -eaf --forest
root 1846 1 0 Mar06 ? 00:30:25 /usr/bin/dockerd -H fd://
root 8823 1846 0 Mar06 ? 00:16:41 \_ docker-containerd -l unix:///var/run/docker/libcontainerd/docker-containerd.sock --metrics-interval=0 --start-timeout 2m --state-dir
root 563 8823 0 10:38 ? 00:00:00 \_ docker-containerd-shim 21d2b053d10f9d26e294ff055494b4136624fae1faa0952ba63e6d36cf21df74 /var/run/docker/libcontainerd/21d2b053d10
root 583 563 0 10:38 ? 00:00:00 | \_ /dev/init -- /bin/bash
root 623 583 0 10:38 ? 00:00:00 | \_ /bin/bash
root 1944 8823 0 10:47 ? 00:00:00 \_ docker-containerd-shim 21d2b053d10f9d26e294ff055494b4136624fae1faa0952ba63e6d36cf21df74 /var/run/docker/libcontainerd/21d2b053d10
root 1961 1944 0 10:47 ? 00:00:00 \_ /dev/init -s -- sleep 7777
root 1967 1961 0 10:47 ? 00:00:00 \_ sleep 7777
Any progress on this issue?
When the "docker exec" process is killed, the docker daemon gets no signal from the docker cli; the containerd-shim process and the exec'd command process will not exit.
This happens even if we did not add the --detach flag to the docker exec command.
@thaJeztah @cpuguy83 any plan about this?
I think "docker exec" and "docker run" are different issues.
When the "docker run" process is killed, the container started by "docker run" should exit.
When the "docker exec" process is killed, the container started earlier should remain running; only the process started by "docker exec" inside the container should exit.
Also, this is not only about signal proxying: what if we run "docker exec" or "docker run -H" on one host while dockerd runs on a remote host?
Signal proxying is not sufficient for that situation.
Is there any movement on this? We've just hit this issue as well - very unexpected behavior.
I am working on kubernetes/kubernetes#87281, where we need the ability to kill a `docker exec` command.
When will the fix get reviewed?
Thanks
I do this nasty piece of work...
docker container exec -d my_container sleep 999
Then later, when I want to kill the long-running command, i.e. the sleep:
docker container top my_container | grep "sleep 999" | awk '{print $2}' | xargs kill
In my case the container in question doesn't have a `ps` command, so it's cleaner to kill from the host.
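The pipeline above assumes the PID sits in the second column of `docker container top` output; the same parsing in Python, run here against a canned sample (the sample rows and PIDs are made up):

```python
SAMPLE_TOP = """\
UID    PID    PPID   C  STIME  TTY  TIME      CMD
root   1234   1200   0  10:00  ?    00:00:00  top
root   5678   1200   0  10:05  ?    00:00:00  sleep 999
"""

def pids_matching(top_output, needle):
    # Mirror `grep "sleep 999" | awk '{print $2}'`: keep rows containing
    # the needle and take the second whitespace-separated field (the PID,
    # which is a *host* PID, hence killable from the host).
    return [line.split()[1]
            for line in top_output.splitlines()[1:]  # skip the header row
            if needle in line]

print(pids_matching(SAMPLE_TOP, "sleep 999"))  # the host PID(s) to kill
```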
This issue is still very much existing and causing unexpected and unwanted behavior.
Spent several hours running around in circles yesterday, thanks to this "feature". (Well with kubectl, but I blame docker for starting this mess).
If it's not going to be fixed, can we at least have a clear warning in the documentation / --help output / man page?
Hello. I've run into this issue too. I offer here a workaround done to the best of my limited understanding of how docker works and its internals (most learned while creating this workaround).
First, let me explain the use case:
I run a docker image which runs a basic piece of software (a driver: a long-running process using a number of random ports).
Over time, I need to run further commands to interact with this driver; these commands are also long-running processes. They interact with the driver and in doing so they need to acquire some resources and free them properly afterwards, which implies they need to be killed cleanly, i.e. when getting SIGTERM or SIGINT they must do cleanup before exiting. All this software is automatically run by a tty-less environment (supervisord).
Summary: I do `docker run -t FLAGS --name DRIVER_CONTAINER MYIMAGE MYDRIVER` and later on, every X time, `docker exec -t DRIVER_CONTAINER MYDRIVERCLIENT`. I need the `exec` commands to exit cleanly and not remain as zombies.
Now, the fun part, my workaround: I made a wrapper, using the Docker API and some hacky bits, for `docker exec` to enable signal forwarding. It is contained in this gist, which has the class DockerExecWithSignalForwarding. It contains a command line interface that imitates `docker exec`, supporting (almost) the same flags and nature of use. I install it as `dockerexec` to make it look similar to `docker exec`. It supports a remote DOCKER_HOST via SSH.
I hope this may help by: giving an option to people falling into the same issue, and maybe providing an idea on how to do it officially (if there is no better way). And I'd also request feedback on how to improve it (in the comments of the gist probably).
I reproduce it here as it is right now for ease of reading:
#!/usr/bin/env python3
import sys
import argparse
import os
import subprocess
import signal
import time
import threading
import shlex
import docker
"""
This program is meant to substitute running 'docker exec <FLAGS> CONTAINER_NAME COMMAND'
to overcome the limitation of docker exec not forwarding signals to the
executed process.
This was reported here in Nov 2014: https://github.com/moby/moby/issues/9098#issuecomment-312152980
Furthermore, here: https://github.com/docker/cli/pull/1841 they say
moby/moby#9098 Kill docker exec command will not terminate the spawned process
This patch does not fix the docker exec case; it looks like there's no API to kill an exec'd process,
so there's no signal-proxy for this yet
Note: -i is not supported, couldn't make it work but got as close as I could (maybe it can't work?).
Author: Sammy Pfeiffer <sam.pfeiffer at hullbot.com>
"""
class DockerExecWithSignalForwarding(object):
def __init__(self, container_name, command,
# Offer the rest of the options from docker exec
detach=False,
# Note detach-keys is not implemented
environment=None,
# Not supported
interactive=False,
privileged=False,
tty=False,
# Leaving these as None makes the inner docker API deal with them correctly
user=None,
workdir=None,
# By default the timeout of Python's docker exec is 60s, change it to 1 year-ish
socket_timeout=60 * 60 * 24 * 365):
"""
Provided a set of flags (same ones of docker exec), a container name and a command,
do 'docker exec' but managing signals to be forwarded to the exec-ed process.
"""
if interactive:
raise RuntimeError("Interactive mode not supported, use docker exec.")
# We inherit the rest of the configuration (including DOCKER_HOST) from the environment
self.client = docker.from_env(timeout=socket_timeout)
# Sanity check on the command, should be a string which we split with shlex or already a list/tuple
if isinstance(command, str):
command = shlex.split(command)
if not (isinstance(command, list) or isinstance(command, tuple)):
raise TypeError("Command is of type {} and it must be str/list/tuple. (command: {})".format(
type(command),
command))
# Translate docker exec style arguments into exec_run arguments
try:
# Get a reference to the container
self.container = self.client.containers.get(container_name)
# Get the Id of the 'docker exec' instance (that is not yet being executed) so we can start it
exec_create_response = self.client.api.exec_create(self.container.id,
command,
stdout=True,
stderr=True,
stdin=interactive,
tty=tty,
privileged=privileged,
user=user,
environment=environment,
workdir=workdir)
self.exec_id = exec_create_response['Id']
# The following block of code is to manage the situation of an interactive session
# We would like to support it but the underlying API doesn't allow for it (writing into the socket
# simply does not work as far as I could test) it was a lot of work to figure out the bits
# to get this to this state, so I'm leaving it here
# if interactive:
# # Because we want to support stdin we need to access the lower-level socket
# # instead of being able to use exec_start with stream=True
# self.exec_socket = self.client.api.exec_start(self.exec_id,
# detach=detach,
# tty=tty,
# stream=False,
# socket=True,
# demux=True)
# # Recreate the function that offers the generator for output usually when using stream=True
# def _read_from_socket(socket, stream, tty=True, demux=False):
# """
# Adapted from docker/client.py in order to enable stdin... tricky.
# """
# gen = docker.api.client.frames_iter(socket, tty)
# if demux:
# # The generator will output tuples (stdout, stderr)
# gen = (docker.api.client.demux_adaptor(*frame) for frame in gen)
# else:
# # The generator will output strings
# gen = (data for (_, data) in gen)
# if stream:
# return gen
# else:
# # Wait for all the frames, concatenate them, and return the result
# return docker.api.client.consume_socket_output(gen, demux=demux)
# self.exec_output = _read_from_socket(self.exec_socket, True, tty, True)
# else:
self.exec_output = self.client.api.exec_start(self.exec_id,
detach=detach,
tty=tty,
stream=True,
socket=False,
demux=True)
self.setup_signal_forwarding()
self.program_running = True
# Imitate the behaviour of the original docker exec up to a point
except docker.errors.NotFound as e:
print("Error: No such container: {}".format(container_name))
os._exit(1)
# Start a thread that monitors if the program died so we can end this when this happens
self.monitor_thread = threading.Thread(target=self.monitor_exec)
self.monitor_thread.start()
self.output_manager_thread = None
if interactive:
# Deal with stdout and stderr in a thread and let the main thread deal with input
self.output_manager_thread = threading.Thread(target=self.manage_stdout_and_stderr)
self.output_manager_thread.start()
self.manage_stdin()
else:
self.manage_stdout_and_stderr()
def monitor_exec(self):
"""
We loop (very slowly) to check whether the underlying command died; this is useful for
commands executed in a remote docker daemon. It 'should' not happen locally, but it may.
"""
try:
# Check if the process is dead, the 'Running' key must become false
exec_inspect_dict = self.client.api.exec_inspect(self.exec_id)
while exec_inspect_dict.get('Running'):
# Generous sleep, as this is to catch the program dying from something other than this wrapper
time.sleep(10.0)
exec_inspect_dict = self.client.api.exec_inspect(self.exec_id)
# If it's dead, we should exit with its exit code
os._exit(exec_inspect_dict.get('ExitCode'))
except docker.errors.APIError as e:
# API error, we can't access anymore, exit
raise RuntimeError("Docker API error when monitoring exec process ({})".format(e))
def forward_signal(self, signal_number, frame):
"""
Forward the signal signal_number to the container,
we first need to find what's the in-container PID of the process we docker exec-ed
then we docker exec a kill signal with it.
"""
# print("Forwarding signal {}".format(signal_number))
# Using a lock to attempt to deal with Control+C spam
with self.signal_lock:
pid_in_container = self.get_container_pid()
kill_command = ["kill", "-{}".format(signal_number), str(pid_in_container)]
try:
exit_code, output = self.container.exec_run(kill_command,
# Do it always as root
user='root')
except docker.errors.NotFound as e:
raise RuntimeError("Container doesn't exist, can't forward signal {} (Exception: {})".format(
signal_number, e))
if exit_code != 0:
raise RuntimeError(
'When forwarding signal {}, kill command to PID in container {} failed with exit code {}, output was: {}'.format(
signal_number, pid_in_container, exit_code, output))
def get_container_pid(self):
"""
Return the in-container PID of the exec-ed process.
"""
try:
# I wish the stored PID of exec was the container PID (which is what I expected)
# but it's actually the host PID so in the following lines we deal with it
pid_in_host = self.client.api.exec_inspect(self.exec_id).get('Pid')
except docker.errors.NotFound as e:
raise RuntimeError("Container doesn't exist, can't get exec PID (Exception: {})".format(e))
# We need to translate the host PID into the container PID, there is no general mapping for it in Docker
# If we are running in the same host, this is easier, we can get the Docker PID by just doing:
# cat /proc/PID/status | grep NSpid | awk '{print $3}'
# If the docker container is running in a different machine we need to execute that command in that machine
# which implies using SSH to execute the command
# Here we can only support DOCKER_HOST=ssh://user@host to use ssh to execute this command
# as if we are using ssh:// to access the docker daemon it's fair to assume we have SSH keys setup
# if docker host is tcp:// on another host or a socket file with SSH tunneling there isn't much we can do
docker_host = os.environ.get('DOCKER_HOST', None)
# If using SSH execute the command remotely
if docker_host and 'ssh://' in docker_host:
ssh_user_at_host = docker_host.replace('ssh://', '')
get_pid_in_container_cmd = "ssh -q -o StrictHostKeyChecking=no {} ".format(ssh_user_at_host)
get_pid_in_container_cmd += "cat /proc/{}/status | grep NSpid | awk '{{print $3}}'".format(pid_in_host)
# Otherwise, execute the command locally
else:
get_pid_in_container_cmd = "cat /proc/{}/status | grep NSpid | awk '{{print $3}}'".format(pid_in_host)
# Execute the command that gets the in-Docker PID
try:
pid_in_container = subprocess.check_output(get_pid_in_container_cmd, shell=True)
except subprocess.CalledProcessError as e:
raise RuntimeError(
"CalledProcessError exception while trying to get the in-docker PID of the process ({})".format(e))
return int(pid_in_container)
def setup_signal_forwarding(self):
"""
Forward all signals to the docker exec-ed process.
If it dies, this process will die too as self.manage_stdout_and_stderr will finish
and forward the exit code.
"""
self.signal_lock = threading.Lock()
# Forward all signals, even though we are most interested just in SIGTERM and SIGINT
signal.signal(signal.SIGHUP, self.forward_signal)
signal.signal(signal.SIGINT, self.forward_signal)
signal.signal(signal.SIGQUIT, self.forward_signal)
signal.signal(signal.SIGILL, self.forward_signal)
signal.signal(signal.SIGTRAP, self.forward_signal)
signal.signal(signal.SIGABRT, self.forward_signal)
signal.signal(signal.SIGBUS, self.forward_signal)
signal.signal(signal.SIGFPE, self.forward_signal)
# Can't be captured, but for clarity leaving it here
# signal.signal(signal.SIGKILL, self.forward_signal)
signal.signal(signal.SIGUSR1, self.forward_signal)
signal.signal(signal.SIGUSR2, self.forward_signal)
signal.signal(signal.SIGSEGV, self.forward_signal)
signal.signal(signal.SIGPIPE, self.forward_signal)
signal.signal(signal.SIGALRM, self.forward_signal)
signal.signal(signal.SIGTERM, self.forward_signal)
def manage_stdout_and_stderr(self):
"""
Print stdout and stderr as the generator provides it.
When the generator finishes we exit the program forwarding the exit code.
"""
# Note that if the application prints a lot, this will use some CPU
# but there is no way around it as we are forced to read from the socket and decode to print
for stdout, stderr in self.exec_output:
# Note that if choosing tty=True output is always in stdout
if stdout:
print(stdout.decode("utf-8"), file=sys.stdout, end='')
if stderr:
print(stderr.decode("utf-8"), file=sys.stderr, end='')
# When we come out of this loop, the program we exec-ed has terminated
# so we can exit with its exit code just here
exec_inspect_dict = self.client.api.exec_inspect(self.exec_id)
exit_code = exec_inspect_dict.get('ExitCode')
os._exit(exit_code)
def manage_stdin(self):
"""
Forward the input of this program to the docker exec-ed program.
"""
raise NotImplementedError("Managing stdin is not implemented.")
# print(dir(self.exec_socket))
# print(self.exec_socket.readable())
# print(self.exec_socket.writable())
# print(dir(self.exec_socket._sock))
# self.exec_socket._writing = True
# print(self.exec_socket.writable())
# def write(sock, str):
# while len(str) > 0:
# written = sock.write(str)
# str = str[written:]
# while True:
# # self.exec_socket._sock.sendall(input().encode('utf-8'))
# # self.exec_socket.flush()
# #print("sent")
# # Doesn't work either
# write(self.exec_socket, input().encode('utf-8'))
# print("--written--")
# #os.write(self.exec_socket._sock.fileno(), input().encode('utf-8'))
# #print("sent")
# #print("Received: {}".format(self.exec_socket._sock.recv(1)))
# # try:
# # print(os.read(self.exec_socket._sock.fileno(), 4096))
# # except BlockingIOError as b:
# # print("BlockingIOError: {} ".format(b))
# # print(self.client.api.exec_inspect(self.exec_id))
def __del__(self):
"""
When the program ends this gets called so we can cleanup resources
and exit with the exit code from the exec-ed command.
Note it is unlikely this gets ever called.
"""
# print("Calling __del__")
# Wait for the output thread in case there are more prints to show
if self.output_manager_thread:
self.output_manager_thread.join()
# Try to wait for the process to be dead in case it isn't yet
try:
exec_inspect_dict = self.client.api.exec_inspect(self.exec_id)
while exec_inspect_dict.get('Running'):
time.sleep(0.1)
exec_inspect_dict = self.client.api.exec_inspect(self.exec_id)
except docker.errors.APIError:
# We may get an API error here; if so, exit with a non-zero code
os._exit(127)
# Forward the exit code of the exec-ed command if we got here
exit_code = exec_inspect_dict.get('ExitCode')
os._exit(exit_code)
if __name__ == '__main__':
# Original docker exec --help
"""
Usage: docker exec [OPTIONS] CONTAINER COMMAND [ARG...]
Run a command in a running container
Options:
-d, --detach Detached mode: run command in the background
--detach-keys string Override the key sequence for detaching a container
-e, --env list Set environment variables
-i, --interactive Keep STDIN open even if not attached
--privileged Give extended privileges to the command
-t, --tty Allocate a pseudo-TTY
-u, --user string Username or UID (format: <name|uid>[:<group|gid>])
-w, --workdir string Working directory inside the container
"""
parser = argparse.ArgumentParser(description="Run a command in a running container")
parser.add_argument("container", help="Container name")
parser.add_argument("command_and_args", help="Command and arguments", nargs=argparse.REMAINDER)
parser.add_argument("-d", "--detach", action='store_true',
help="Detached mode: run command in the background")
# We only support environment variables as a long string if there must be more than one
# I.e. -e USER=user for one or -e "USER=user SOMETHING_ELSE=1"
# Supporting multiple -e didn't work for me
parser.add_argument("-e", "--env",
type=str, help="Set environment variables (like 'VAR1=1 VAR2=2')")
# Interactive is not supported, but leaving it here just in case it is implemented in the future
parser.add_argument("-i", "--interactive", action='store_true',
help="Keep STDIN open even if not attached (Note: not implemented, use 'docker exec')")
parser.add_argument("--privileged", action='store_true',
help="Give extended privileges to the command")
parser.add_argument("-t", "--tty", action='store_true',
help="Allocate a pseudo-TTY")
parser.add_argument("-u", "--user",
type=str, help="Username or UID (format: <name|uid>[:<group|gid>])")
parser.add_argument("-w", "--workdir",
type=str, help="Working directory inside the container")
args = parser.parse_args()
if len(args.command_and_args) < 1:
print("dockerexec requires a container and a command to execute")
parser.print_help()
exit(1)
if args.interactive:
raise NotImplementedError("Interactive mode not implemented, you should just use docker exec")
dewsf = DockerExecWithSignalForwarding(args.container,
args.command_and_args,
detach=args.detach,
# Note detach-keys is not implemented
environment=args.env,
interactive=args.interactive,
privileged=args.privileged,
tty=args.tty,
user=args.user,
workdir=args.workdir)
# The following lines are tests done with a container running:
# docker run --rm -t --name exec_signal_problem python:3 sleep 999
# Proper testing should be implemented based on this
# # Forward error test
# de = DockerExec('exec_signal_problem',
# 'ls asdf',
# tty=True,
# interactive=False)
# # simple working test
# de = DockerExec('exec_signal_problem',
# 'ls',
# tty=True,
# interactive=False)
# Test signal forwarding SIGINT Control C
# de = DockerExec('exec_signal_problem',
# 'python -c "import sys;import signal;signal.signal(signal.SIGINT, print);print(\'hello\', file=sys.stderr);import time; time.sleep(600)"',
# tty=True,
# interactive=False)
# Test signal forwarding SIGTERM
# de = DockerExec('exec_signal_problem',
# 'python -c "import sys;import signal;signal.signal(signal.SIGTERM, print);print(\'hello\', file=sys.stderr);import time; time.sleep(600)"',
# tty=True,
# interactive=False)
# Test output in stderr
# de = DockerExec('exec_signal_problem',
# 'python -c "import sys; print(\'hello stderr\', file=sys.stderr);print(\'hello stdout\', file=sys.stdout)"',
# tty=False,
# interactive=False)
# test input, doesn't work, not supported (not needed anyways)
# de = DockerExec('exec_signal_problem',
# 'cat',
# tty=True,
# interactive=True)
Sent here from #2607 -- it's still very much an issue with the latest versions, and very confusing indeed.
Just a little update, I revived the PR that implements docker exec killing: #41548
The last reply included:
v20.10 (API v1.41) is feature-freezed, so probably this PR will be merged after the release of v20.10, before v21.XX (v21.03?).
Hi Sam, @awesomebytes, I am wondering whether #41548 handles the scenario where we kill the docker exec -it xx
process directly, and /bin/sh may be left behind without being killed. It seems users would need to call a new API?
Greetings from the year 2022, where I lost my Friday night to this issue. I was also in the situation where I was trying to run docker exec
commands from within a python subprocess.Popen
call. My most elegant workaround was to find the PID of the persistent process inside the docker container by using subprocess.run
to execute:
docker exec my_container bash -c 'pgrep -xf "my_exact_persistent_command"'
With that PID, I'm able to easily kill it with another subprocess.run
of:
docker exec my_container bash -c 'kill my_pid'
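That two-step workaround can be sketched in Python. This is only an illustration of the approach described above: the container name, the `pgrep` pattern, and the injected `run` parameter (which exists purely so the sketch can be exercised without a Docker daemon) are all assumptions, not part of any official API.

```python
import subprocess

def pgrep_cmd(container, pattern):
    # Command that finds the PID(s) of an exact command line inside the container
    return ["docker", "exec", container, "pgrep", "-xf", pattern]

def kill_cmd(container, pid, sig="TERM"):
    # Command that signals that PID via a second docker exec
    return ["docker", "exec", container, "kill", "-s", sig, str(pid)]

def find_and_kill(container, pattern, sig="TERM", run=subprocess.run):
    """Locate the persistent process inside the container and signal it."""
    out = run(pgrep_cmd(container, pattern), capture_output=True, text=True).stdout
    pids = [int(p) for p in out.split()]
    for pid in pids:
        run(kill_cmd(container, pid, sig))
    return pids
```

Note that this kills the process *inside* the container directly, sidestepping the fact that killing the `docker exec` client process does nothing to its spawned child.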
I ran into the same problem (with hundreds of stale shells not receiving SIGHUP when the docker exec
client received the SIGHUP). Also, it looks like half of the Internet thinks that --init
would solve it; it obviously does not, and it sends people in the wrong direction.
I wrote it down here:
https://gist.github.com/SkyperTHC/cb4ebb633890ac36ad86e80c6c7a9bb2
The workaround at the moment is a clean-up cron job - it's a mess.
If you want another workaround, a few comments up I made this one:
#9098 (comment)
Also, the PR to add the correct kill behaviour is still pending here: #41548
We use -it
and thus your solution did not work for us (but thanks for the great work).
I wrote my own solution:
https://github.com/hackerschoice/segfault/blob/main/host/docker-exec-sigproxy.c
The tool intercepts traffic on the /var/run/docker.sock and detects when a 'docker exec' happens. It registers all signals and then forwards (proxies) the signal to the process running inside the instance.
Old command:
docker exec -it alpine
new command:
docker-exec-sigproxy exec -it alpine
I wonder why docker won't add --sig-proxy=true
to 'docker exec'... Half the Internet is crying about stale processes and being told to use --init
, which sends them down the wrong path...
I would have thought that the 'proper' solution here is to NOT cascade signals (since that can never be 100% reliable) and instead the container-side pty should detect that the connection is lost, and carry out the normal SIGHUP behaviours that were designed and reliable in the 1970's based on RS-232 terminals.
Is this approach not possible?
What you are describing is how every user would expect docker to behave, including me.
The tool I provided adds exactly that behaviour: the docker container (you call it "app" above) will receive a SIGHUP when the 'docker exec' disconnects (e.g. terminates). (And yes, cascading signals is reliable in this instance. The kernel won't drop signals or forget about them; they will get delivered.)
Docker does not do this. In docker-exec-land the "app" is executed within its own PTY harness and will not receive a SIGHUP if the docker-exec client 'disconnects' (hangs up).
I've explained the details of this 'misbehaviour' above in my earliest post. The details are far more complicated and have to do with how docker-exec instructs the Linux kernel to start the 'app' from PID=1 etc., and it thus behaves very differently from how a 1970s RS-232 terminal would have. Anyway, my tool above makes docker-exec behave as it did in the '70s.
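The signal-cascading idea the thread keeps returning to can be sketched in a few lines of Python: catch signals locally in the wrapper process and relay each one into the container with a second exec. The container name and in-container PID here are placeholders, and the `run` parameter is injected only so the sketch can be exercised without a Docker daemon; this is an assumption-laden outline, not the behaviour of any docker CLI flag.

```python
import signal
import subprocess

def make_forwarder(container, pid, run=subprocess.run):
    """Return a signal handler that relays the received signal into the container."""
    def forward(signum, frame):
        # Relay signal number `signum` to `pid` inside the container
        run(["docker", "exec", container, "kill", f"-{signum}", str(pid)])
    return forward

def install_forwarding(container, pid,
                       signals=(signal.SIGHUP, signal.SIGINT, signal.SIGTERM)):
    # Register the same relaying handler for each signal we want to cascade
    handler = make_forwarder(container, pid)
    for s in signals:
        signal.signal(s, handler)
    return handler
```

As noted above, this is exactly the cascade that can never be 100% reliable (SIGKILL cannot be caught at all), which is why some commenters prefer PTY-disconnect-driven SIGHUP semantics instead.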
We have been using @SkyperTHC 's docker-exec-sigproxy fine for some time, when we hit a problem: for a long-running process, the socket dropped after exactly 5 minutes. After several attempts at changing this delay, we ended up dropping the signal proxy and launching a second exec to kill the process by pid.
- The initial exec command prints its pid to a unique file (we already had a unique output folder for every run). Example when launching Julia script:
command = listOf("/usr/local/bin/docker", "exec", "-i", container, "julia", "-e",
"""
open("${pidFile.absolutePath}", "w") do file write(file, string(getpid())) end;
ARGS=["${outputFolder.absolutePath}"];
include("${scriptFile.absolutePath}")
"""
)
- If the process is cancelled, we read the file and issue a SIGTERM
/usr/local/bin/docker exec -i <container> kill -s TERM <pid>
- The .pid file is removed upon completion or termination
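A minimal Python version of the pid-file workaround above might look like the following. The container name, file paths, and the assumption that the pid file lives in a folder shared between host and container are all illustrative; `$$` is the shell's own PID, and `exec` replaces the shell so that PID stays valid for the real process.

```python
import subprocess
from pathlib import Path

def launch_with_pidfile(container, pid_file, script_file, run=subprocess.run):
    # The exec-ed shell writes its own PID to the shared file, then execs the job
    shell = f"echo $$ > {pid_file}; exec python {script_file}"
    return run(["docker", "exec", "-i", container, "sh", "-c", shell])

def cancel(container, pid_file, run=subprocess.run):
    # Read the recorded PID on the host side and SIGTERM it via a second exec
    pid = Path(pid_file).read_text().strip()
    return run(["docker", "exec", "-i", container, "kill", "-s", "TERM", pid])
```

As in the Kotlin snippet above, the pid file should be removed once the job completes or is terminated.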
The new docker exec doc is here: https://docs.docker.com/engine/api/v1.43/#tag/Exec/operation/ExecStart
There is still no way to kill a started exec.
Just ran into this problem now