aws-eks-nodeagent container logs errors on startup and shutdown
rtomadpg opened this issue
What happened:
After upgrading VPC-CNI from `v1.14.1-eksbuild.1` to `v1.15.4-eksbuild.1`, all the `aws-eks-nodeagent` containers logged:

```
aws-node-np4cq aws-eks-nodeagent 2023-12-06 16:14:59.823264484 +0000 UTC Logger.check error: failed to get caller
```
And when I delete a random `aws-node` pod, I see this:
```
aws-node-sdp94 aws-eks-nodeagent 2023-12-06 16:25:56.131300614 +0000 UTC Logger.check error: failed to get caller
aws-node-sdp94 aws-eks-nodeagent 2023-12-06 16:25:56.131410269 +0000 UTC Logger.check error: failed to get caller
aws-node-sdp94 aws-eks-nodeagent 2023-12-06 16:25:56.131480895 +0000 UTC Logger.check error: failed to get caller
aws-node-sdp94 aws-eks-nodeagent 2023-12-06 16:25:56.131594396 +0000 UTC Logger.check error: failed to get caller
aws-node-sdp94 aws-eks-nodeagent 2023-12-06 16:25:56.131647113 +0000 UTC Logger.check error: failed to get caller
aws-node-sdp94 aws-eks-nodeagent 2023-12-06 16:25:56.131669285 +0000 UTC Logger.check error: failed to get caller
aws-node-sdp94 aws-eks-nodeagent 2023-12-06 16:25:56.131694685 +0000 UTC Logger.check error: failed to get caller
aws-node-sdp94 aws-eks-nodeagent 2023-12-06 16:25:56.13179858 +0000 UTC Logger.check error: failed to get caller
```
I believe these errors come from the uber-go/zap dependency; see https://github.com/uber-go/zap/blob/5acd569b6a5264d4c7433cbb278a8336d491715c/logger.go#L398
Since I am not sure whether this error signals that something is actually wrong, and it was not logged by earlier versions of this project, I am filing this bug.
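Here is a minimal sketch of what I believe triggers the message (my own reproduction with zap, not the agent's actual code; the oversized `AddCallerSkip` value is just an assumption about the failure mode): when caller annotation is enabled but the skipped stack frame does not exist, zap writes `Logger.check error: failed to get caller` to its error output.

```go
package main

import (
	"os"

	"go.uber.org/zap"
	"go.uber.org/zap/zapcore"
)

func main() {
	core := zapcore.NewCore(
		zapcore.NewJSONEncoder(zap.NewProductionEncoderConfig()),
		zapcore.AddSync(os.Stdout),
		zap.InfoLevel,
	)
	logger := zap.New(core,
		zap.AddCaller(),        // annotate each entry with file:line
		zap.AddCallerSkip(100), // hypothetical misconfiguration: skips past the real stack
		zap.ErrorOutput(zapcore.AddSync(os.Stderr)),
	)
	// Every log call now prints "<time> Logger.check error: failed to get caller"
	// to stderr, matching the lines above, while the entry itself still logs.
	logger.Info("hello")
}
```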
Attach logs
Let me know if needed.
What you expected to happen:
No errors getting logged.
How to reproduce it (as minimally and precisely as possible):
- Upgrade to the mentioned version
- Check the aws-node pod logs
- Or delete an aws-node pod; the replacement pod will log the errors on startup (example commands below)
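Commands along these lines should surface the messages (assuming the default `kube-system` namespace and the standard `k8s-app=aws-node` label on the daemonset):

```sh
# Tail the nodeagent container across all aws-node pods
kubectl logs -n kube-system -l k8s-app=aws-node -c aws-eks-nodeagent --tail=50

# Or delete one of the pods and watch its replacement log the errors on startup
kubectl delete pod -n kube-system aws-node-np4cq
```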
Anything else we need to know?:
Environment:
- Kubernetes version (use `kubectl version`): v1.27.7-eks-4f4795d
- CNI Version: v1.15.4-eksbuild.1
- OS (e.g. `cat /etc/os-release`): Amazon Linux 2
- Kernel (e.g. `uname -a`): `Linux <hostname redacted> 5.10.192-183.736.amzn2.x86_64 #1 SMP Wed Sep 6 21:15:41 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux`
@rtomadpg just curious, did you notice the comment with:
> For Network Policy issues, please file at https://github.com/aws/aws-network-policy-agent/issues
when you opened this issue? We are trying to improve the experience here with triaging Network Policy agent issues, so I am wondering if you think there is a better way this could have been noticed.
As for this issue, this is the same as #103. This error log is harmless, and a fix is in progress.
Ouch, so sorry! I checked the new bug flow and indeed that comment is there. Very clearly.
I guess I was too eager to file the bug (end of work day here) and I overlooked that part.
@jdn5126 maybe a suggestion: when errors are logged by a container named `aws-eks-nodeagent`, it's not immediately clear that they are related to "Network Policy issues" or "aws-network-policy-agent". Maybe mentioning `aws-eks-nodeagent` in that comment would reduce wrongly filed issues?
> Ouch, so sorry! I checked the new bug flow and indeed that comment is there. Very clearly. I guess I was too eager to file the bug (end of work day here) and I overlooked that part.
Oh, no worries. I was just curious whether there was a better setup through GitHub. Good call, I can expand the comment.
Hi everyone, sorry for jumping in on a closed thread.
I'm facing the same issue, but without the network policy error mentioned here.
I'm trying to upgrade a managed worker group to 1.25, but the `aws-node` daemonset keeps failing in the `aws-eks-nodeagent` container, causing the pod to restart.
Any ideas?
The VPC CNI plugin version is `v1.15.1-eksbuild.1`.
@lsabreu96 the error log from this issue is harmless. If you are seeing the `aws-eks-nodeagent` container crashing, please file a new issue with the logs from the crash, which you can find in `/var/log/aws-routed-eni/network-policy-agent.log` on the affected node.
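For example, assuming you have SSH (or SSM session) access to the node, something like:

```sh
# Read the network policy agent log on the affected node
ssh <node> sudo tail -n 200 /var/log/aws-routed-eni/network-policy-agent.log
```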
For anyone reaching this thread because the `aws-eks-nodeagent` container is crashing with `UTC Logger.check error: failed to get caller`: for me the issue was mixing EKS k8s version 1.24 with `aws-network-policy-agent:v1.0.4-eksbuild.1` and `amazon-k8s-cni:v1.15.1-eksbuild.1` (these versions were automatically provisioned by EKS). Upgrading to k8s version 1.25 fixes the container crash loop, as mentioned in the README of this repo ("You'll need a Kubernetes cluster version 1.25+ to run against.").
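To check which combination you are running, something along these lines should work (assuming the default `aws-node` daemonset in `kube-system`):

```sh
# List the images used by the aws-node daemonset (CNI plugin and nodeagent)
kubectl get daemonset aws-node -n kube-system \
  -o jsonpath='{.spec.template.spec.containers[*].image}'

# And the cluster's Kubernetes server version
kubectl version
```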
I'm not commenting to reopen this issue, just providing information in case anyone still running 1.24 lands here!