rootless-containers / usernetes

Kubernetes without the root privileges

Home Page: https://github.com/kubernetes/enhancements/tree/master/keps/sig-node/2033-kubelet-in-userns-aka-rootless

Bind mounts in rootlesskit namespace behave oddly

maybe-sybr opened this issue

I've been trying to do some hostPath volume mounting and noticed that it only works if I mount (rootfully) from outside the rootlesskit namespace. This is a bit surprising: from what I can tell after some digging through the namespaces, it should work fine. Along the way, I noticed that performing bind mounts in the rootlesskit namespace doesn't seem to work reliably. If I do the following, ./bar still shows its original contents after the bind mount:

(host) $ find
.
./bar
./bar/corge
./foo
./foo/quux
(host) $ /path/to/usernetes/boot/nsenter.sh
(rkns) $ umount ./bar; mount --bind ./foo ./bar
(rkns) $ find 
.
./bar
./bar/corge
./foo
./foo/quux

The weird thing is that sometimes it works. Every so often I get into the RK namespace, do the mount and see it actually work as it should.

(rkns) $ find 
.
./bar
./bar/corge
./foo
./foo/quux
(rkns) $ while [ ! -f ./bar/quux ]; do
  umount ./bar
  mount --bind ./foo ./bar
  find; sleep 1
done
umount: ./bar: not mounted.
.
./bar
./bar/quux
./foo
./foo/quux

And if I do this in my own namespace made using unshare -Urm, it works every time.
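For anyone wanting to repeat that comparison, here is a rough sketch of it as shell helpers. The function names (make_tree, bind_visible, try_in_userns) are made up for illustration and are not part of usernetes or rootlesskit; the layout matches the foo/quux and bar/corge tree above, and it assumes unprivileged user namespaces are enabled on the host.

```shell
#!/bin/sh
# Hypothetical helpers for illustration only.

# Build the same tree as in the report: foo/quux and bar/corge under $1.
make_tree() {
    mkdir -p "$1/foo" "$1/bar" && : > "$1/foo/quux" && : > "$1/bar/corge"
}

# Succeed only if bar shows foo's contents, i.e. the bind mount is visible.
bind_visible() {
    [ -f "$1/bar/quux" ] && [ ! -e "$1/bar/corge" ]
}

# Bind mount foo over bar inside a fresh user+mount namespace and check
# the result from within it. The mount disappears when the namespace exits.
try_in_userns() {
    unshare -Urm sh -c '
        cd "$1" && mount --bind ./foo ./bar &&
        [ -f ./bar/quux ] && [ ! -e ./bar/corge ]
    ' sh "$1"
}
```

Something like `make_tree /tmp/demo && try_in_userns /tmp/demo && echo visible` should then print "visible" each time if the plain unshare case really is reliable; bind_visible can be run against the same directory from inside the rootlesskit namespace after doing the mount there by hand.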

Is there some weird magic that rootlesskit is doing which might cause this? The bind mount is actually happening and can be observed in findmnt. If it's relevant, this is on an ext4 FS.

This all came up when I was attempting to do bind mounts in the rootlesskit ns and expecting them to propagate into a container which has a hostPath volume mounted into it with HostToContainer propagation. That doesn't work as expected either. I've seen it work after restarting the pod (rollout restart deployment ...), but then I started digging and hit this behaviour, which seems like a simpler case to look into and might be the same issue.

Could be related:

--propagation=rslave \

Maybe. That argument seems to make sense, though, since we don't want all mounts done by rootlesskit to propagate upward/sideways to its peer group.
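One way to look at both the bind mount and its propagation from inside a given namespace is to read /proc/self/mountinfo directly. The sketch below is only an illustration: bind_info is a made-up helper name, and it treats an entry with no optional fields as private. For a bind mount, the "root" column is the source directory within the filesystem, and a slave mount (as produced by rslave) shows up as master:N.

```shell
#!/bin/sh
# Hypothetical helper for illustration only.
# Print "<root> <propagation>" for each mountinfo entry whose mount point is $1.
# $2 is the mountinfo file to read (defaults to this process's own view).
bind_info() {
    awk -v mp="$1" '
        $5 == mp {
            prop = ""
            # Optional fields (shared:N, master:N, ...) run from field 7
            # up to the "-" separator; none at all means a private mount.
            for (i = 7; i <= NF && $i != "-"; i++)
                prop = prop (prop ? "," : "") $i
            print $4, (prop ? prop : "private")
        }' "${2:-/proc/self/mountinfo}"
}
```

Running `bind_info "$(realpath ./bar)"` inside the rootlesskit namespace and again inside an unshare -Urm namespace would show whether the two views of the mount differ. Note that mountinfo escapes spaces in paths as \040, so the path passed in has to match that form.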

I've just tried to replicate the behaviour on a non-LVM volume (a random ISO on a USB I had handy) and bind mounting in the rootlesskit namespace seems to work fine with that. This might be an issue with device mapper or something weird like that.

Edit: I also just tried to make the USB a bit more representative of my /home mount (LUKS, LVM, ext4, extended ACLs, bind mounting two dirs on the same device) and it seems to work fine again. I'm beginning to think that this is something weird related to my home directory rather than something which could be blamed on rootlesskit's use of namespaces.

Edit 2: Aha! My USB now has the same behaviour as my home directory. Not sure what changed but it does seem to be misbehaving in the same way - mostly not showing the bind mount but sometimes it works. I'll continue trying to narrow down what might be causing the issue.

Running a separate rootlesskit appears to work fine in both my home mount and the USB:

$ .../usernetes/bin/rootlesskit --propagation rslave bash -c 'mount --bind foo bar; find; umount bar'
.
./foo
./foo/quux
./bar
./bar/quux

Turns out this might be some underlying bug with bind mounts. I've got a minimal repro [0] which doesn't involve rootlesskit at all, and I've observed the misbehaviour on multiple machines other than my own, so I'm going to close this and see if I can push the bug at some kernel people.

[0] https://gist.github.com/maybe-sybr/0636c7e10c8eb193d0e2880d7b5c0d6b