hyperhq / hypernetes

The multi-tenant Kubernetes distro

Home Page: http://hypernetes.com

UnmountVolume.TearDown failed for volume no such file or directory

resouer opened this issue

This happens randomly: an extra unmount task is created after the volume dir has already been cleaned up by the normal unmount task.

This is caused by a race:

The kubelet uses a goroutinemap to make sure that only one unmount task is running at any time for a given volume. The reconcile loop keeps creating unmount tasks, and each new one is rejected as invalid if a task is already running for that volume.

However, while unmount task B is being created, the running task A may finish and exit. The goroutinemap then wrongly accepts the newly created task B as valid and runs it. That is why B tries to do exactly A's job again and fails.
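To make the interleaving concrete, here is a minimal sketch in Go that replays it step by step. It is not the kubelet code: `opMap`, `volumeState`, `replayRace` and the volume name `vol-1` are all hypothetical stand-ins, and the `recheck` flag only illustrates one possible mitigation (re-reading the state after the slot has been claimed), not a fix taken from this repo.

```go
package main

import (
	"errors"
	"sync"
)

// Hypothetical errors used only by this sketch.
var (
	errAlreadyPending = errors.New("operation already pending")
	errNoSuchDir      = errors.New("no such file or directory")
)

// opMap is a simplified stand-in for a goroutinemap: it allows at most
// one pending operation per volume name.
type opMap struct {
	mu      sync.Mutex
	pending map[string]bool
}

func newOpMap() *opMap { return &opMap{pending: map[string]bool{}} }

// start claims the slot for name, or fails if an operation is pending.
func (m *opMap) start(name string) error {
	m.mu.Lock()
	defer m.mu.Unlock()
	if m.pending[name] {
		return errAlreadyPending
	}
	m.pending[name] = true
	return nil
}

// finish releases the slot for name.
func (m *opMap) finish(name string) {
	m.mu.Lock()
	defer m.mu.Unlock()
	delete(m.pending, name)
}

// volumeState models whether a volume dir still exists on disk.
type volumeState struct{ mounted map[string]bool }

func (s *volumeState) isMounted(name string) bool { return s.mounted[name] }

// tearDown removes the volume dir; it fails if the dir is already gone.
func (s *volumeState) tearDown(name string) error {
	if !s.mounted[name] {
		return errNoSuchDir
	}
	delete(s.mounted, name)
	return nil
}

// replayRace replays the interleaving sequentially so it is deterministic.
// With recheck set, task B re-reads the state after claiming its slot.
func replayRace(recheck bool) error {
	const vol = "vol-1" // hypothetical volume name
	ops := newOpMap()
	state := &volumeState{mounted: map[string]bool{vol: true}}

	// 1. Task A claims the slot and is running.
	_ = ops.start(vol)
	// 2. The reconcile loop still sees the volume and decides to create B.
	wantUnmount := state.isMounted(vol)
	// 3. Task A finishes: the dir is removed and the slot is released.
	_ = state.tearDown(vol)
	ops.finish(vol)
	// 4. B is submitted on the stale decision; the slot is free, so it is accepted.
	if !wantUnmount {
		return nil
	}
	if err := ops.start(vol); err != nil {
		return nil // B was rejected, which is the intended behaviour
	}
	defer ops.finish(vol)
	if recheck && !state.isMounted(vol) {
		return nil // nothing left to unmount
	}
	return state.tearDown(vol)
}
```

In this sketch `replayRace(false)` returns the "no such file or directory" error, mirroring the failure reported here, while `replayRace(true)` returns nil because B notices that the work is already done.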

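We can see an unneeded UnmountVolume operation being started here (the second "UnmountVolume operation started" entry, at 08:53:12.438076, comes after TearDown has already succeeded):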

I1202 08:53:10.498664   11001 reconciler.go:138] UnmountVolume operation started for volume "kubernetes.io/cinder/f8d5222a-ccb1-44ed-9bf9-2449803889f4" (spec.Name: "nginx-persistent-storage") from pod "9c625766-b896-11e6-8160-080027e4bc93" (UID: "9c625766-b896-11e6-8160-080027e4bc93").
I1202 08:53:11.011697   11001 thin_pool_watcher.go:77] thin_ls(1480686788) took 2.810181785s
I1202 08:53:11.200383   11001 generic.go:141] GenericPLEG: 9c625766-b896-11e6-8160-080027e4bc93/3e43a9ce2221883069cee51bd6c49a27c0eff3289b3b6808b022d131fb27f22d: exited -> non-existent
I1202 08:53:11.200404   11001 generic.go:320] PLEG: Delete status for pod "9c625766-b896-11e6-8160-080027e4bc93"
I1202 08:53:11.393958   11001 kubelet.go:2620] SyncLoop (housekeeping)
I1202 08:53:11.395606   11001 kubelet.go:2049] Orphaned pod "9c625766-b896-11e6-8160-080027e4bc93" found, but volumes are not cleaned up; err: <nil>
I1202 08:53:11.776053   11001 cinder_baremetal.go:144] Volume volume is not mounted since rbd is natively supported
I1202 08:53:12.386034   11001 operation_executor.go:792] UnmountVolume.TearDown succeeded for volume "kubernetes.io/cinder/f8d5222a-ccb1-44ed-9bf9-2449803889f4" (volume.spec.Name: "nginx-persistent-storage") pod "9c625766-b896-11e6-8160-080027e4bc93" (UID: "9c625766-b896-11e6-8160-080027e4bc93").
I1202 08:53:12.438076   11001 reconciler.go:138] UnmountVolume operation started for volume "kubernetes.io/cinder/f8d5222a-ccb1-44ed-9bf9-2449803889f4" (spec.Name: "nginx-persistent-storage") from pod "9c625766-b896-11e6-8160-080027e4bc93" (UID: "9c625766-b896-11e6-8160-080027e4bc93").
I1202 08:53:12.438167   11001 cinder.go:424] IsLikelyNotMountPoint check failed: stat /var/lib/kubelet/pods/9c625766-b896-11e6-8160-080027e4bc93/volumes/kubernetes.io~cinder/nginx-persistent-storage: no such file or directory
E1202 08:53:12.438191   11001 goroutinemap.go:155] Operation for "kubernetes.io/cinder/f8d5222a-ccb1-44ed-9bf9-2449803889f4" failed. No retries permitted until 2016-12-02 08:53:12.938184214 -0500 EST (durationBeforeRetry 500ms). error: UnmountVolume.TearDown failed for

Dup of #154; closed #154 in favor of this one.

This issue was closed by mistake. #156 does not actually solve it.