aws / aws-network-policy-agent

Flaky network policy enforcement, especially around the kubeapi

corang opened this issue

What happened:
In an EKS cluster with many default-deny network policies and specific allowed IPs/namespaces, vpc-cni/aws-network-policy-agent seems to "forget" about some network policies for some pods. This mostly seems to affect traffic to the kubeapi.

What you expected to happen:
Network Policy enforcement is consistent and reliable.

How to reproduce it (as minimally and precisely as possible):
Deploy resources that require access to the kubeapi into many namespaces of the cluster. In each namespace, create a default-deny network policy and a policy that allows traffic to the kubeapi. Eventually, pods will no longer be able to talk to the kubeapi (SSL connect timeout). This can take anywhere from hours to weeks to show up.
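For anyone trying to reproduce this, the per-namespace policy pair described above could be generated roughly as follows. This is only a sketch: the issue does not include the actual manifests, so the policy names, the namespace names, and the API server CIDR (`10.0.0.10/32`) are placeholders, and the real policies in the affected cluster may be shaped differently.

```python
import json


def default_deny(namespace):
    """Deny all ingress and egress for every pod in the namespace."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        # "default-deny" is a placeholder name, not taken from the issue.
        "metadata": {"name": "default-deny", "namespace": namespace},
        "spec": {
            "podSelector": {},  # empty selector matches all pods
            "policyTypes": ["Ingress", "Egress"],
        },
    }


def allow_kubeapi(namespace, api_cidrs, port=443):
    """Allow egress from every pod in the namespace to the API server IPs."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        # "allow-kubeapi" is a placeholder name, not taken from the issue.
        "metadata": {"name": "allow-kubeapi", "namespace": namespace},
        "spec": {
            "podSelector": {},
            "policyTypes": ["Egress"],
            "egress": [
                {
                    "to": [{"ipBlock": {"cidr": cidr}} for cidr in api_cidrs],
                    "ports": [{"protocol": "TCP", "port": port}],
                }
            ],
        },
    }


def manifests(namespaces, api_cidrs):
    """Bundle both policies for each namespace into a single List object."""
    items = []
    for ns in namespaces:
        items.append(default_deny(ns))
        items.append(allow_kubeapi(ns, api_cidrs))
    return {"apiVersion": "v1", "kind": "List", "items": items}


if __name__ == "__main__":
    # Placeholder namespaces and API server address; substitute real values.
    print(json.dumps(manifests(["team-a", "team-b"], ["10.0.0.10/32"]), indent=2))
```

Since kubectl accepts JSON, the output can be piped to `kubectl apply -f -`. The namespaces must already exist, and the CIDR list has to match the addresses the pods actually use to reach the API server.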

Anything else we need to know?:

Environment:

  • Kubernetes version (use kubectl version): v1.29.1-eks-b9c9ed7
  • CNI Version: v1.18.0-eksbuild.1
  • Network Policy Agent Version: v1.1.0-eksbuild.1
  • OS (e.g: cat /etc/os-release):
    OS Info
    NAME=Bottlerocket
    ID=bottlerocket
    VERSION="1.19.4 (aws-k8s-1.29)"
    PRETTY_NAME="Bottlerocket OS 1.19.4 (aws-k8s-1.29)"
    VARIANT_ID=aws-k8s-1.29
    VERSION_ID=1.19.4
    BUILD_ID=4f0a078e
    HOME_URL="https://github.com/bottlerocket-os/bottlerocket"
    SUPPORT_URL="https://github.com/bottlerocket-os/bottlerocket/discussions"
    BUG_REPORT_URL="https://github.com/bottlerocket-os/bottlerocket/issues"
    DOCUMENTATION_URL="https://bottlerocket.dev"
  • Kernel (e.g. uname -a): Linux ip-10-200-20-133.us-gov-west-1.compute.internal 6.1.82 #1 SMP PREEMPT_DYNAMIC Fri Apr 5 22:26:33 UTC 2024 x86_64 GNU/Linux
commented

Are there any updates on this ticket? I am also encountering this issue: the network agent log shows sporadic denied traffic for the Kubernetes API and CoreDNS, although initial connections appear to work when tested manually. I'm unable to implement network policies on our production cluster while this issue remains unresolved. Thank you!

This looks similar to #204. We have merged the fix to the release branch, and it is currently going through the release pipeline. We should have the release out this week.

The fix is released with network policy agent v1.1.2: https://github.com/aws/amazon-vpc-cni-k8s/releases/tag/v1.18.2. Please test and let us know if there are any issues.
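Before retesting, it may help to confirm that the agent image on a node is at or above the fixed version. The helper below is a minimal sketch that compares a version tag against v1.1.2. It assumes tags look like the one reported above (`v1.1.0-eksbuild.1`); the function names are illustrative, and how the tag is obtained from the cluster is left out.

```python
import re

# First network policy agent version that contains the fix, per this thread.
FIXED_VERSION = (1, 1, 2)


def parse_version(tag):
    """Extract (major, minor, patch) from a tag such as 'v1.1.0-eksbuild.1'."""
    match = re.match(r"v?(\d+)\.(\d+)\.(\d+)", tag)
    if match is None:
        raise ValueError(f"unrecognized version tag: {tag!r}")
    return tuple(int(part) for part in match.groups())


def has_fix(tag):
    """Return True if the tagged version is at or above the fixed version."""
    return parse_version(tag) >= FIXED_VERSION


if __name__ == "__main__":
    for tag in ("v1.1.0-eksbuild.1", "v1.1.2"):
        print(tag, "has fix:", has_fix(tag))
```

Any build suffix after the patch number is ignored, so only the numeric version is compared.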