[BUG] node lifecycle controller in yurt-manager cannot update status of node
crazytaxii opened this issue · comments
What happened:
A node always stays in Ready status after its kubelet is stopped, even after the node itself is shut down. This bug prevents Pods from being migrated to other nodes.
What you expected to happen:
The abnormal node should be updated to NotReady status.
How to reproduce it (as minimally and precisely as possible):
Stop the kubelet on a node.
Anything else we need to know?:
Error log in yurt-manager's node lifecycle controller:
E0126 07:43:15.444074 1 node_lifecycle_controller.go:975] "Error updating node" err="nodes \"edge\" is forbidden: User \"system:serviceaccount:kube-system:yurt-manager\" cannot update resource \"nodes/status\" in API group \"\" at the cluster scope" node="edge"
E0126 07:43:15.452574 1 node_lifecycle_controller.go:715] "Update health of Node from Controller error, Skipping - no pods will be evicted" err="timed out waiting for the condition" node="edge"
nodes/status is a subresource; it should also be added to the ClusterRole of yurt-manager.
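A minimal sketch of the missing rule, assuming the verbs mirror the upstream system:controller:node-controller role shown later in this thread:

```yaml
# Additional rule for the yurt-manager ClusterRole: grant write access to
# the nodes/status subresource so the node lifecycle controller can mark
# nodes NotReady (sketch; verb list copied from the upstream role).
- apiGroups:
  - ""
  resources:
  - nodes/status
  verbs:
  - patch
  - update
```

After applying it, the permission can be checked with something like kubectl auth can-i update nodes/status --as=system:serviceaccount:kube-system:yurt-manager.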
Environment:
- OpenYurt version: v1.4
- Kubernetes version (use kubectl version): v1.27.2
/kind bug
@crazytaxii Thanks for raising this issue.
It seems that the RBAC settings of the nodelifecycle controller have been missed. Would you like to make a pull request to fix it?
/assign @crazytaxii
The entire system:controller:node-controller ClusterRole for kube-controller-manager in a Kubernetes v1.27.2 cluster is:
# ...
rules:
- apiGroups:
  - ""
  resources:
  - nodes
  verbs:
  - delete
  - get
  - list
  - patch
  - update
- apiGroups:
  - ""
  resources:
  - nodes/status
  verbs:
  - patch
  - update
- apiGroups:
  - ""
  resources:
  - pods/status
  verbs:
  - patch
  - update
- apiGroups:
  - ""
  resources:
  - pods
  verbs:
  - delete
  - list
- apiGroups:
  - networking.k8s.io
  resources:
  - clustercidrs
  verbs:
  - create
  - get
  - list
  - update
- apiGroups:
  - ""
  - events.k8s.io
  resources:
  - events
  verbs:
  - create
  - patch
  - update
- apiGroups:
  - ""
  resources:
  - pods
  verbs:
  - get
Compare with the ClusterRole of yurt-manager (v1.4):
# ...
- apiGroups:
  - ""
  resources:
  - nodes
  verbs:
  # - delete # missing one
  - get
  - list
  - patch
  - update
  - watch # extra one
# - apiGroups: # missing one
#   - ""
#   resources:
#   - nodes/status
#   verbs:
#   - patch
#   - update
- apiGroups:
  - ""
  resources:
  - pods
  verbs:
  - create # extra one
  - delete
  - get
  - list
  - patch # extra one
  - update # extra one
  - watch # extra one
- apiGroups:
  - ""
  resources:
  - pods/status
  verbs:
  # - patch # missing one
  - update
# - apiGroups: # missing one
#   - networking.k8s.io
#   resources:
#   - clustercidrs
#   verbs:
#   - create
#   - get
#   - list
#   - update
# - apiGroups: # missing one
#   - ""
#   - events.k8s.io
#   resources:
#   - events
#   verbs:
#   - create
#   - patch
#   - update
# ...
That said, the node lifecycle controller in yurt-manager certainly differs quite a bit from the one in kube-controller-manager v1.27.2.
@crazytaxii Except for the networking.k8s.io/clustercidrs resource, the other missed RBAC settings should be added to yurt-manager, because networking.k8s.io/clustercidrs is used by the node ipam controller in kube-controller-manager and is not needed by the nodelifecycle controller.
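Putting the thread's conclusion together, a sketch of the new rules to add to the yurt-manager ClusterRole, assuming the verb lists should mirror the upstream system:controller:node-controller role minus the clustercidrs rule:

```yaml
# Rules missing entirely from the yurt-manager ClusterRole (sketch).
# The networking.k8s.io/clustercidrs rule is deliberately omitted: it
# serves the node ipam controller, not the nodelifecycle controller.
- apiGroups:
  - ""
  resources:
  - nodes/status
  verbs:
  - patch
  - update
- apiGroups:
  - ""
  - events.k8s.io
  resources:
  - events
  verbs:
  - create
  - patch
  - update
```

On top of these new rules, the existing rules would need the missing verbs flagged above: delete on nodes and patch on pods/status.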