Getting Failed to watch of *v1alpha1.PolicyEndpoint ended with: an error on the server after upgrading VPC CNI to v1.17.1+ version with aws-network-policy-agent v1.1.0
ArtemProskochylo opened this issue · comments
What happened:
After upgrading the vpc-cni plugin to v1.17.1 and v1.18.0, I see a lot of errors from the aws-network-policy-agent container (v1.1.0). The issue occurs even on fresh EKS installations where we are not using Network Policies.
Attach logs
W0424 08:27:34.397257 1 reflector.go:462] pkg/mod/k8s.io/client-go@v0.29.1/tools/cache/reflector.go:229: watch of *v1alpha1.PolicyEndpoint ended with: an error on the server ("unable to decode an event from the watch stream: context canceled") has prevented the request from succeeding
What you expected to happen:
No error messages.
How to reproduce it (as minimally and precisely as possible):
- Deploy a v1.29 EKS cluster.
- Deploy the VPC CNI Add-on, version v1.17.1-eksbuild.1 or v1.18.0-eksbuild.1.
- Run kubectl -n kube-system logs aws-node-*
Anything else we need to know?:
Environment:
- Kubernetes version (use kubectl version):
Client Version: v1.29.1
Server Version: v1.29.1-eks-b9c9ed7
- CNI Version: v1.17.1 and v1.18.0
- Network Policy Agent Version: v1.1.0
- OS (e.g: cat /etc/os-release): Bottlerocket OS 1.19.2 (aws-k8s-1.29)
- Kernel (e.g. uname -a): 6.1.77
@ArtemProskochylo How did you upgrade the VPC CNI version? It appears that you're missing the required permissions for the aws-node pod. Did you apply the corresponding version-specific manifest?
Facing the same issue after upgrading to EKS 1.29 with CNI 1.18.0.
@achevuru I upgraded the addon directly from AWS using Terraform. I checked the ClusterRole configuration and it has the permissions you referred to:
- apiGroups:
  - networking.k8s.aws
  resources:
  - policyendpoints
  verbs:
  - get
  - list
  - watch
Seems like a bug.
@danielap-ma If you're seeing the same error as above - then either the permissions are missing (please check if CNI pods have correct SA in place) or there are connectivity issues with your API Server. I quickly tried it and I don't see any such issue(s) on my end.
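For anyone who wants to double-check the permissions point, here is a minimal sketch that inspects a ClusterRole's rules for the verbs the agent's watch needs. It assumes the role has been exported as JSON (for example with kubectl get clusterrole <name> -o json, where the role name depends on your install). The function names and the SAMPLE_ROLE data are illustrative only; the sample mirrors the rule quoted in this thread. It checks the role's rules only, not whether the role is actually bound to the service account the CNI pods run as, and it handles only exact names and the "*" wildcard.

```python
import json


def rule_allows(rule, group, resource, verb):
    """Return True if one RBAC rule grants `verb` on `resource` in `group`."""
    def matches(values, wanted):
        # RBAC treats "*" as "everything" for groups, resources and verbs.
        return "*" in values or wanted in values

    return (
        matches(rule.get("apiGroups", []), group)
        and matches(rule.get("resources", []), resource)
        and matches(rule.get("verbs", []), verb)
    )


def missing_verbs(cluster_role, group, resource, verbs):
    """List the verbs from `verbs` that no rule in the role grants."""
    rules = cluster_role.get("rules") or []
    return [
        verb for verb in verbs
        if not any(rule_allows(rule, group, resource, verb) for rule in rules)
    ]


# Hypothetical sample data, shaped like `kubectl get clusterrole -o json` output.
SAMPLE_ROLE = json.loads("""
{
  "rules": [
    {
      "apiGroups": ["networking.k8s.aws"],
      "resources": ["policyendpoints"],
      "verbs": ["get", "list", "watch"]
    }
  ]
}
""")

if __name__ == "__main__":
    missing = missing_verbs(
        SAMPLE_ROLE, "networking.k8s.aws", "policyendpoints",
        ["get", "list", "watch"],
    )
    print("missing verbs:", missing)
```

An empty list means the rules themselves grant get, list and watch on policyendpoints, in which case the binding or API server connectivity would be the next thing to look at.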
Hi @achevuru
Sorry for the late response. It was also updated through Terraform. In my case, however, only the add-on version was set through Terraform; the ConfigMaps, DaemonSet, and other resources are managed by AWS. I have checked the RBAC rules for vpc-cni v1.17.1, and the required permissions are present there:
- apiGroups:
  - networking.k8s.aws
  resources:
  - policyendpoints
  verbs:
  - get
  - list
  - watch
- apiGroups:
  - networking.k8s.aws
  resources:
  - policyendpoints/status
  verbs:
  - get
But I still see the following error in logs for v1.17.1:
W0509 03:34:41.481449 1 reflector.go:462] pkg/mod/k8s.io/client-go@v0.29.1/tools/cache/reflector.go:229: watch of *v1alpha1.PolicyEndpoint ended with: an error on the server ("unable to decode an event from the watch stream: http2: client connection lost") has prevented the request from succeeding
In another cluster running the updated version v1.18.1, I do not see those errors. I suppose it is a version-specific issue.
I hope the provided info is useful to you.
Thanks
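The two warnings quoted in this thread share the same reflector message and differ only in the inner reason ("context canceled" versus "http2: client connection lost"). A small sketch like the one below can tally those reasons across a log dump, which may help tell a permissions problem from a connectivity one. The regular expression is written against the two lines quoted here and is an assumption about the general log shape; the function and variable names are illustrative only.

```python
import re
from collections import Counter

# Pattern derived from the two warnings quoted in this thread (an assumption,
# not an official log format): captures the watched kind and the inner reason.
WATCH_ERROR = re.compile(
    r'watch of (?P<kind>\S+) ended with: an error on the server '
    r'\("unable to decode an event from the watch stream: (?P<reason>.*?)"\)'
)


def summarize(lines):
    """Count watch failures per (kind, reason) pair; other lines are ignored."""
    counts = Counter()
    for line in lines:
        match = WATCH_ERROR.search(line)
        if match:
            counts[(match.group("kind"), match.group("reason"))] += 1
    return counts


# The two log lines reported in this thread.
SAMPLE_LINES = [
    'W0424 08:27:34.397257 1 reflector.go:462] pkg/mod/k8s.io/client-go@v0.29.1/tools/cache/reflector.go:229: watch of *v1alpha1.PolicyEndpoint ended with: an error on the server ("unable to decode an event from the watch stream: context canceled") has prevented the request from succeeding',
    'W0509 03:34:41.481449 1 reflector.go:462] pkg/mod/k8s.io/client-go@v0.29.1/tools/cache/reflector.go:229: watch of *v1alpha1.PolicyEndpoint ended with: an error on the server ("unable to decode an event from the watch stream: http2: client connection lost") has prevented the request from succeeding',
]

if __name__ == "__main__":
    for (kind, reason), count in summarize(SAMPLE_LINES).items():
        print(f"{count}x {kind}: {reason}")
```

Any iterable of log lines can be passed to summarize, for example an open file containing saved output from the agent container.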