mattshma / bigdata

hadoop,hbase,storm,spark,etc..

Orphaned pod found, but volume paths are still present on disk

mattshma opened this issue

After the machine rebooted on its own for some reason, kubelet and the other components came back up normally, but the GPU information on that node could no longer be obtained. The kubelet log showed the following errors:

E0606 14:57:30.245413    3284 kubelet.go:1275] Image garbage collection failed once. Stats initialization may not have completed yet: failed to get imageFs info: unable to find data for container /
E0606 14:57:30.257693    3284 kubelet.go:1333] Failed to start gpuManager stat /dev/nvidiactl: no such file or directory
E0606 14:57:30.257973    3284 container_manager_linux.go:583] [ContainerManager]: Fail to get rootfs information unable to find data for container /
E0606 14:57:31.258134    3284 container_manager_linux.go:583] [ContainerManager]: Fail to get rootfs information unable to find data for container /
E0606 14:57:32.258269    3284 container_manager_linux.go:583] [ContainerManager]: Fail to get rootfs information unable to find data for container /
E0606 14:57:33.258393    3284 container_manager_linux.go:583] [ContainerManager]: Fail to get rootfs information unable to find data for container /
E0606 14:57:34.258637    3284 container_manager_linux.go:583] [ContainerManager]: Fail to get rootfs information unable to find data for container /
E0606 14:57:35.459204    3284 reconciler.go:376] Could not construct volume information: Volume: "kubernetes.io/rbd/[]:" is not mounted
E0606 14:57:35.459280    3284 reconciler.go:376] Could not construct volume information: Volume: "kubernetes.io/rbd/[]:" is not mounted
E0606 14:57:36.283582    3284 kubelet_volumes.go:128] Orphaned pod "2ec25470-6929-11e8-a2cf-005056b75104" found, but volume paths are still present on disk : There were a total of 1 errors similar to this. Turn up verbosity to see them.
E0606 14:57:38.262971    3284 kubelet_volumes.go:128] Orphaned pod "2ec25470-6929-11e8-a2cf-005056b75104" found, but volume paths are still present on disk : There were a total of 2 errors similar to this. Turn up verbosity to see them..
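
The trailing hint "Turn up verbosity to see them" refers to kubelet's klog log level, controlled by the --v flag. A minimal sketch of listing each orphaned path individually, assuming a systemd-managed kubelet whose extra flags come from a kubeadm-style /etc/sysconfig/kubelet file (adjust to however kubelet is launched on your nodes):

$ # Assumption: the kubelet unit reads KUBELET_EXTRA_ARGS from /etc/sysconfig/kubelet (kubeadm convention).
$ echo 'KUBELET_EXTRA_ARGS=--v=4' | sudo tee -a /etc/sysconfig/kubelet
$ sudo systemctl restart kubelet
$ # At --v=4 each orphaned pod/volume error should be logged on its own line:
$ journalctl -u kubelet --since "10 min ago" | grep -i "orphaned pod"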

Since none of the pods on this machine were still in use, the related volume information could simply be removed:

$ sudo systemctl stop kubelet kube-proxy
$ sudo rm -rf /var/lib/k8s/kubelet/pods/2ec25470-6929-11e8-a2cf-005056b75104
$ sudo systemctl start kubelet kube-proxy

After that the node returned to normal.
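
For anyone hitting the same error, a more defensive version of the cleanup above (a sketch only; the kubelet root /var/lib/k8s/kubelet is taken from the commands above and may differ on your nodes) first checks whether anything under the orphaned pod's directory is still mounted and unmounts it before deleting, which avoids the data-loss risk discussed further down in this thread:

$ POD_UID=2ec25470-6929-11e8-a2cf-005056b75104
$ POD_DIR=/var/lib/k8s/kubelet/pods/$POD_UID
$ sudo systemctl stop kubelet kube-proxy
$ # List any mounts still alive under the pod directory (rbd images, secrets, configmaps, ...).
$ findmnt -rno TARGET | grep "$POD_DIR" || echo "nothing mounted"
$ # Unmount them, deepest paths first, before touching the directory tree.
$ findmnt -rno TARGET | grep "$POD_DIR" | sort -r | xargs -r -n1 sudo umount
$ sudo rm -rf "$POD_DIR"
$ sudo systemctl start kubelet kube-proxy

If an umount fails with "device is busy", something still has the volume open and the directory should not be force-removed.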

Tracing back through the whole incident: pod 2ec25470-6929-11e8-a2cf-005056b75104 failed to start on this machine for some reason, the node's Kubernetes components failed to reclaim it, and that caused Kubernetes on the node to break down, which in turn affected the other instances on the machine. The root cause of the pod failing to start was never found.

@mattshma Better not to do it this way: if the pod has a PVC, rm -rf on the pod directory will also wipe the contents of the PVC. This is not how a pod is supposed to be reclaimed.

@NightmareZero Agreed, this is not how a pod should normally be deleted. But what I removed here was an orphaned pod. If not this way, is there a better way to clean it up? If so, please let me know, thanks~

Just a reminder: umount first. I got badly burned by this; the data in my database was wiped clean.

@NightmareZero Hmm, my storage is rbd. When I deleted directly, the data was still there on the rbd image; I didn't run into the situation you describe.

I'm also using rbd. With rm -rf, the device mounted from rbd cannot be removed, but the contents inside it get deleted.
You can try an experiment: mount a device at /mnt/test/t1, then run rm -rf /mnt/test; the data inside t1 will be wiped (see the sketch below).
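
The experiment described above can be reproduced without a real rbd device by loop-mounting an image file (a sketch; /mnt/test and /tmp/t1.img are just illustrative paths):

$ sudo mkdir -p /mnt/test/t1
$ # Create a small ext4 image and mount it at /mnt/test/t1 as a stand-in for the rbd device.
$ dd if=/dev/zero of=/tmp/t1.img bs=1M count=64
$ sudo mkfs.ext4 -q /tmp/t1.img
$ sudo mount -o loop /tmp/t1.img /mnt/test/t1
$ echo "important data" | sudo tee /mnt/test/t1/data.txt
$ # rm -rf recurses into the mount point: data.txt on the mounted filesystem is deleted,
$ # but the mount point itself cannot be removed ("Device or resource busy").
$ sudo rm -rf /mnt/test
$ ls /mnt/test/t1    # empty: the data is gone
$ sudo umount /mnt/test/t1

This is exactly why the orphaned pod directory should only be removed after confirming that nothing under it is still mounted.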

@NightmareZero Sure, if the device is still mounted then rm -rf will of course delete the data; I'm aware of that. I see what you mean now: my rbd had probably already been unmounted at some earlier step. Thanks for the pointer!

May I ask, have you ever mounted storage across multiple nodes?

Yes, but not with rbd; I used cephfs.

Oh, I see.

@mattshma Better not to do it this way: if the pod has a PVC, rm -rf on the pod directory will also wipe the contents of the PVC. This is not how a pod is supposed to be reclaimed.

Agreed with your point :)