moby / moby

The Moby Project - a collaborative project for the container ecosystem to assemble container-based systems

Home Page: https://mobyproject.org/


docker kill leaves directories behind.

simonjohansson opened this issue · comments

Doing a docker kill UUID has left some directories behind in /var/lib/docker/containers.
Doing an ls reveals:

$ ls /var/lib/docker/containers/0a50ba2e6217fe8234fe6a29f84e97b541631697777515f92259f276d7f83d3e/
ls: cannot access /var/lib/docker/containers/0a50ba2e6217fe8234fe6a29f84e97b541631697777515f92259f276d7f83d3e/rootfs: Stale NFS file handle
rootfs

I am running docker inside a rather slow VirtualBox VM (Ubuntu 12.04, 3.5.0-23-generic). I currently have 7 of these directories; two of them come from containers where I made big changes (apt-get update), the other five were only "echo hello world" containers.

Relevant IRC chat:

23:11 < DinMamma> Ah, this is interesting, when looking into the containers in /var/lib/docker/containers I get "ls: cannot access 
                  rootfs: Stale NFS file handle"
23:11 < DinMamma> So I wonder if this is an issue with my system rather than docker.
23:11 <@shykes> DinMamma: no, this is a known issue with aufs, which we thought we had neutralized
23:12 <@shykes> basically aufs umount is asynchronous
23:12 <@shykes> it does background cleanup
23:12 <@shykes> if you remove the mountpoint too quickly before aufs is done with cleanup, it gets stuck
23:12 <@shykes> and you get that error message
23:13 < DinMamma> I should say that I am running my tests inside a rather slow virtualbox-vm.
23:13 <@shykes> I'm surprised that you hit this. We have a workaround which includes checking the stat() on the mountpoint in a loop, 
                until its inode changes
23:19 <@shykes> DinMamma: so am I :)
23:19 <@shykes> mmm that could be it
23:20 <@shykes> DinMamma: did one of these containers have a lot of filesystem changes on them?
23:20 <@shykes> like a big apt-get, or something like that?
23:20 < DinMamma> Yep
23:20 < DinMamma> Two of them.
23:20 <@shykes> maybe slow machine + lots of data on the aufs rw layer means -> our workaround timed out, and gave up waiting for aufs
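
To make the workaround shykes describes a bit more concrete, here is a minimal Python sketch of the idea (the function name wait_for_umount and the timeout values are illustrative assumptions, not Docker's actual code): after the umount, stat() the mountpoint in a loop until its inode changes. On a slow VM with a lot of data in the aufs rw layer, such a loop could plausibly hit its timeout, which would match the theory above.

# Minimal sketch (not Docker's actual implementation) of the workaround
# described above: after calling umount, poll the mountpoint's inode until
# it changes, which signals that aufs has finished its asynchronous cleanup.
import os
import time

def wait_for_umount(mountpoint, timeout=10.0, interval=0.1):
    original_inode = os.stat(mountpoint).st_ino
    deadline = time.time() + timeout
    while time.time() < deadline:
        try:
            if os.stat(mountpoint).st_ino != original_inode:
                return True   # inode changed: the unmount has completed
        except OSError:
            return True       # mountpoint is gone entirely; also safe to proceed
        time.sleep(interval)
    return False              # timed out, e.g. slow VM + large rw layer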

Just an extra comment: it is normal for 'docker kill' to leave the container directory behind. By default all containers are kept, so you can inspect their filesystem state, commit them into images, restart them, etc.

But of course it is not normal to see "stale NFS handle" errors :)

I can't reproduce this.

My host is Ubuntu 12.10 and I used the base image as the guest.
Can anybody else reproduce?

Is there a way to manually repair the directory so I can delete the directories without rebooting the host?

Not that I know of. Note that there is no known side-effect outside the
scope of that container.

On Monday, April 15, 2013, Thomas Hansen wrote:

Is there a way to manually repair the directory so I can delete the
directories without rebooting the host?



As discussed earlier, this is probably due to the asynchronous nature of aufs unmount.

I'm downgrading this to a minor bug, since:

a) it occurs very rarely (one known occurrence so far),
b) it has no impact on the behavior of docker or the system,
c) it's very hard to reproduce.

+1 on a fix for this since I just bumped into it:

~# docker rm 5cbb64c3279a
Error: Error destroying container 5cbb64c3279a: stat /var/lib/docker/containers/5cbb64c3279a76acaac4769e4a6c57c39a7fff6027b51d14ecff08040d252d13/rootfs: stale NFS file handle

@simonjohansson Have you hit the error again since #816?

Hi guys, sorry I didn't see this until now. I have some holiday coming up in the next couple of days; I'll make sure to check whether #816 fixed the issue!

Just encountered the same issue:

root@dscape:~# docker ps -a | grep 'Exit' | awk '{print $1}' | xargs docker rm
Error: Error destroying container 38b561af34e1: stat /var/lib/docker/containers/38b561af34e1bb0b3e92d7b1fe734aeabf223d6a5c36757be8925514e28e8b45/rootfs: stale NFS file handle
Error: Error destroying container 112a0c0b9c95: stat /var/lib/docker/containers/112a0c0b9c9546697f20dd7ed21899b789f981eb5195d189b1503ab1893184e4/rootfs: stale NFS file handle
Error: Error destroying container ef13c73b64a9: stat /var/lib/docker/containers/ef13c73b64a991e2b937fbcb1fae412d7b6404dcb67ae105c06ebd5b62926f35/rootfs: stale NFS file handle
Error: Error destroying container e0178615f6d8: stat /var/lib/docker/containers/e0178615f6d8be7ca343c89c398536713542413fa7ac04d172bb268f626a252a/rootfs: stale NFS file handle
Error: Error destroying container 3c8659a041c9: stat /var/lib/docker/containers/3c8659a041c9217e35c056e96da0fe5dc9d5eae43f37874ff372190ed8867277/rootfs: stale NFS file handle
Error: Error destroying container 99dee8e5a486: stat /var/lib/docker/containers/99dee8e5a486b8eeff3855e6750e1dee90ec4c8af022ed9a43304edda411b507/rootfs: stale NFS file handle
Error: Error destroying container b7ac0d3f3f79: stat /var/lib/docker/containers/b7ac0d3f3f79ae35883d09e796332726322e56bdd715e5484210bf84099cc513/rootfs: stale NFS file handle
Error: Error destroying container 7329c9be9795: stat /var/lib/docker/containers/7329c9be97957b187cdb6cbb825ab506e3a8610c01b4055ad5cc64fc58a6e985/rootfs: stale NFS file handle
root@dscape:~# docker version
Client version: 0.4.8
Server version: 0.4.8
Git commit: ??
Go version: go1.1.1

I cannot reproduce anymore.

Client version: 0.5.0
Server version: 0.5.0
Git commit: 51f6c4a
Go version: go1.1.1

GG :)

@dscape can you try again with docker 0.5.1?

I keep seeing this issue over and over using docker inside VirtualBox. I usually run docker rm $(docker ps -a | cut -d " " -f 1) to remove all containers, but many of them fail with a stale NFS file handle error.

Just to add: I tried brute-forcing it by removing the directories of such containers. After that, trying to remove them via docker rm still prints the same message.

Managed to remove them after restarting the docker host.

This seems fixed to me.
Using:

# docker version
Client version: 0.5.3
Server version: 0.5.3
Git commit: 5d25f32
Go version: go1.1.1

Also make sure you have no bash running inside the container path.
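
One quick way to check that (a small sketch of my own, not part of docker; the helper name procs_inside is made up) is to scan /proc for processes whose working directory sits under the container path:

# Sketch: list PIDs whose current working directory is inside a given path,
# e.g. a shell left open in /var/lib/docker/containers/<id>/rootfs would
# keep the mount busy and show up here.
import os

def procs_inside(path):
    pids = []
    for pid in os.listdir('/proc'):
        if not pid.isdigit():
            continue
        try:
            cwd = os.readlink('/proc/' + pid + '/cwd')
        except OSError:
            continue
        if cwd.startswith(path):
            pids.append(int(pid))
    return pids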

Was the asynchronous unmount theory ever proven? I wonder if this is the "deleted a container's image while the container is running" bug:

# Pane 1
$ docker run -i -t foo /bin/bash
root@d6d23b36b613:/#

# Pane 2
$ docker rmi foo
Untagged: 1cfaa4fe8724
Deleted: 1cfaa4fe8724
$

# Pane 1
root@d6d23b36b613:/# exit
$ docker rm `docker ps -l -q`
Error: Error destroying container d6d23b36b613: stat /var/lib/docker/containers/d6d23b36b613337b8e8bbc2ee90af11da3c5fab78a07a01a43ba7262359292ca/rootfs: stale NFS file handle

$

@dsissitka I think that is exactly what it is. It happened to me as well.

 $ docker version
Go version (client): go1.1.1
Go version (server): go1.1.1
Last stable version: 0.6.3

How can the container be removed now?

The original issue is resolved in 0.7 because kill does not do an umount anymore. Containers are unmounted when the daemon is stopped.

In case anyone has a /var/lib/docker/volumes directory full of orphaned volumes, feel free to use the following Python script (make sure to understand what it does before executing it):

#!/usr/bin/python
# Removes orphaned volume directories under /var/lib/docker/volumes,
# i.e. volumes whose owning container no longer exists.

import json
import os
import shutil
import subprocess
import re

dockerdir = '/var/lib/docker'
volumesdir = os.path.join(dockerdir, 'volumes')

# Collect the full IDs of all containers that still exist (running or stopped).
containers = dict((line, 1) for line in subprocess.check_output('docker ps -a -q -notrunc', shell=True).splitlines())

# Each subdirectory of /var/lib/docker/volumes is named after a volume ID.
volumes = os.walk(os.path.join(volumesdir, '.')).next()[1]
for volume in volumes:
    if not re.match('[0-9a-f]{64}', volume):
        print volume + ' is not a valid volume identifier, skipping...'
        continue
    # The 'json' file holds the volume metadata, including the owning container.
    volume_metadata = json.load(open(os.path.join(volumesdir, volume, 'json')))
    container_id = volume_metadata['container']
    if container_id in containers:
        print 'Container ' + container_id[:12] + ' does still exist, not clearing up volume ' + volume
        continue
    print 'Deleting volume ' + volume + ' (container: ' + container_id[:12] + ')'
    volumepath = os.path.join(volumesdir, volume)
    print 'Volumepath: ' + volumepath
    shutil.rmtree(volumepath)

Thanks for the script! I fixed the indentation and a small bug:

container_id = volume_metadata['id'] # (not container anymore)

https://gist.github.com/mindreframer/7787702

Thanks! No idea why the indentation was messed up in my post, edited + fixed it.

I used volume_metadata['container'] because I was still on 0.6.6 when I wrote the script, but anyone using 0.7.0 (or later) should use your changes.
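
If you want the script to cope with both layouts (just a convenience tweak, assuming only the key name changed between those versions), you could fall back to whichever key is present:

container_id = volume_metadata.get('id') or volume_metadata.get('container')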