Multi-node: GuaranteedUpdate of /registry/minions/<NODE> failed because of a conflict
ashwinp opened this issue
Issue Details:
- Worker nodes fail to update the node status. `kubectl get nodes` on the master does not list one or more worker nodes.
- The issue can be reproduced intermittently.
- Restarting kubelet and/or kube-apiserver does not help.
- This isn't a transient failure. The worker nodes are never able to update their status, and they never show up in `kubectl get nodes`.
Setup details:
- 3 worker nodes, 1 master node, 1 etcd node
- All nodes run CoreOS-stable-1409.6.0-hvm (ami-00110279)
- Issue can be reproduced with Kubernetes 1.6.4 as well as 1.7.0.
- Issue can be reproduced with etcd 3.5.4 as well as 2.7.* (older version).
kubelet on the worker nodes fails to update the worker node status after claiming to have registered successfully:
kubelet-wrapper[1657]: I0804 16:42:15.216223 1657 kubelet_node_status.go:77] Attempting to register node 172.0.60.57
kubelet-wrapper[1657]: I0804 16:42:15.218882 1657 kubelet_node_status.go:80] Successfully registered node 172.0.60.57
kubelet-wrapper[1657]: E0804 16:42:25.230766 1657 kubelet_node_status.go:326] Error updating node status, will retry: error getting node "172.0.60.57": nodes "172.0.60.57" not found
kubelet-wrapper[1657]: E0804 16:42:25.232449 1657 kubelet_node_status.go:326] Error updating node status, will retry: error getting node "172.0.60.57": nodes "172.0.60.57" not found
The Kubernetes API server logs show conflicts while updating the node in etcd, after which the node is deleted (the DELETE request comes from the node-controller):
I0804 16:42:15.220414 1 wrap.go:75] GET /api/v1/nodes/172.0.60.57: (736.057µs) 200 [[hyperkube/v1.6.4+coreos.0 (linux/amd64) kubernetes/8996efd/node-controller] 127.0.0.1:47534]
I0804 16:42:15.227137 1 store.go:329] GuaranteedUpdate of /registry/minions/172.0.60.57 failed because of a conflict, going to retry
I0804 16:42:15.227245 1 store.go:329] GuaranteedUpdate of /registry/minions/172.0.60.57 failed because of a conflict, going to retry
I0804 16:42:15.227280 1 wrap.go:75] GET /api/v1/pods?fieldSelector=spec.nodeName%3D172.0.60.57: (7.793419ms) 200 [[hyperkube/v1.6.4+coreos.0 (linux/amd64) kubernetes/8996efd/node-controller] 127.0.0.1:46858]
I0804 16:42:15.227314 1 wrap.go:75] PUT /api/v1/nodes/172.0.60.57: (6.490089ms) 409 [[hyperkube/v1.6.4+coreos.0 (linux/amd64) kubernetes/8996efd/node-controller] 127.0.0.1:47534]
I0804 16:42:15.227250 1 wrap.go:75] PATCH /api/v1/nodes/172.0.60.57: (6.805385ms) 200 [[hyperkube/v1.6.4+coreos.0 (linux/amd64) kubernetes/8996efd/ttl-controller] 127.0.0.1:47536]
I0804 16:42:15.228557 1 wrap.go:75] GET /api/v1/nodes/172.0.60.57: (708.958µs) 200 [[hyperkube/v1.6.4+coreos.0 (linux/amd64) kubernetes/8996efd/node-controller] 127.0.0.1:46858]
I0804 16:42:15.228820 1 wrap.go:75] PATCH /api/v1/namespaces/default/events/172.0.60.57.14d7b23385905550: (11.479188ms) 200 [[kubelet/v1.6.4+coreos.0 (linux/amd64) kubernetes/8996efd] 172.0.60.57:59454]
I0804 16:42:15.228837 1 wrap.go:75] PATCH /api/v1/nodes/172.0.60.57/status: (6.707276ms) 200 [[kubelet/v1.6.4+coreos.0 (linux/amd64) kubernetes/8996efd] 172.0.60.57:59454]
I0804 16:42:15.229323 1 wrap.go:75] PUT /api/v1/nodes/172.0.60.57: (406.754µs) 409 [[hyperkube/v1.6.4+coreos.0 (linux/amd64) kubernetes/8996efd/node-controller] 127.0.0.1:47536]
I0804 16:42:15.230566 1 wrap.go:75] GET /api/v1/nodes/172.0.60.57: (719.769µs) 200 [[hyperkube/v1.6.4+coreos.0 (linux/amd64) kubernetes/8996efd/node-controller] 127.0.0.1:47536]
I0804 16:42:15.232358 1 wrap.go:75] PUT /api/v1/nodes/172.0.60.57: (1.469816ms) 200 [[hyperkube/v1.6.4+coreos.0 (linux/amd64) kubernetes/8996efd/node-controller] 127.0.0.1:47536]
I0804 16:42:15.232840 1 wrap.go:75] PATCH /api/v1/namespaces/default/events/172.0.60.57.14d7b2338590686a: (3.188002ms) 200 [[kubelet/v1.6.4+coreos.0 (linux/amd64) kubernetes/8996efd] 172.0.60.57:59454]
I0804 16:42:15.235985 1 wrap.go:75] PATCH /api/v1/namespaces/default/events/172.0.60.57.14d7b23385907c23: (2.451278ms) 200 [[kubelet/v1.6.4+coreos.0 (linux/amd64) kubernetes/8996efd] 172.0.60.57:59454]
I0804 16:42:17.732567 1 wrap.go:75] DELETE /api/v1/nodes/172.0.60.57: (2.582459ms) 200 [[hyperkube/v1.6.4+coreos.0 (linux/amd64) kubernetes/8996efd/node-controller] 127.0.0.1:47534]