Is the network policy for VPC CNI designed to be stateful or stateless?
khayong opened this issue
What happened:
I have created an egress network policy allowing the web pod to establish connections with the backend server pod at port 4000.
podSelector:
  matchLabels:
    app.kubernetes.io/component: web
egress:
  - ports:
      - protocol: TCP
        port: 4000
    to:
      - namespaceSelector:
          matchLabels:
            kubernetes.io/metadata.name: backend
        podSelector:
          matchLabels:
            app.kubernetes.io/name: backend
While initially operating as intended, after some time, the packet log occasionally registers a DENY entry for certain return traffic.
Node: ip-10-0-64-172.ap-southeast-1.compute.internal;SIP: 10.0.68.172;SPORT: 4000;DIP: 10.0.74.123;DPORT: 39816;PROTOCOL: TCP;PolicyVerdict: DENY
where 10.0.68.172 is the backend server and 10.0.74.123 is the web server.
To mitigate this issue, I have to define an ephemeral port range for the ingress of the return traffic, similar to a VPC network ACL configuration.
podSelector:
  matchLabels:
    app.kubernetes.io/component: web
ingress:
  - ports:
      - protocol: TCP
        port: 1024
        endPort: 65535
    from:
      - namespaceSelector:
          matchLabels:
            kubernetes.io/metadata.name: backend
        podSelector:
          matchLabels:
            app.kubernetes.io/name: backend
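The workaround behaves like a stateless ACL: return packets from the backend are matched against an explicit ingress rule rather than relying on connection tracking. A minimal sketch of that check (illustrative only; the IPs and ports are taken from the policy and logs in this thread, not from the agent's actual code):

```python
def ingress_allowed(src_ip, dst_port):
    """Stateless check mirroring the workaround policy above:
    allow traffic from the backend pod to ephemeral ports 1024-65535."""
    backend_ips = {"10.0.68.172"}  # backend server from the DENY log
    return src_ip in backend_ips and 1024 <= dst_port <= 65535

# The return packet that was denied in the log (backend:4000 -> web:39816)
# now matches the explicit ingress rule, with or without a conntrack entry.
assert ingress_allowed("10.0.68.172", 39816)
# Traffic from the backend to a privileged port is still denied.
assert not ingress_allowed("10.0.68.172", 443)
```

The trade-off is the same as with network ACLs: the wide ephemeral-port range admits any backend-initiated traffic to those ports, not just return traffic.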
Attach logs
What you expected to happen:
I expected Kubernetes Network Policy enforcement to be stateful, meaning there should be no need to explicitly define rules for the return traffic of established connections.
How to reproduce it (as minimally and precisely as possible):
Anything else we need to know?:
Environment:
- Kubernetes version (use kubectl version): Server Version: v1.28.4-eks-8cb36c9
- CNI Version: v1.16.0-eksbuild.1
- OS (e.g. cat /etc/os-release):
NAME="Amazon Linux"
VERSION="2"
ID="amzn"
ID_LIKE="centos rhel fedora"
VERSION_ID="2"
PRETTY_NAME="Amazon Linux 2"
ANSI_COLOR="0;33"
CPE_NAME="cpe:2.3:o:amazon:amazon_linux:2"
HOME_URL="https://amazonlinux.com/"
SUPPORT_END="2025-06-30"
- Kernel (e.g. uname -a): Linux ip-10-0-64-172.ap-southeast-1.compute.internal 5.10.199-190.747.amzn2.x86_64 #1 SMP Sat Nov 4 16:55:14 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
Moving to Network Policy agent repo
@khayong the network policy implementation is stateless. What does the PolicyEndpoint object show for this policy? You can see the output with kubectl get policyendpoint <policy_name>
Yes, you are right, there is no need to explicitly define rules for return traffic. Can you check the number of entries in the network policy agent's conntrack table when the issue starts to happen? When the issue happens, is there any pod churn, or do just the established connections fail after a while?
Steps to check -
- SSH to the node where you are seeing deny logs, then
cd /opt/cni/bin/
- Dump the maps -
./aws-eks-na-cli ebpf maps
- Pick the ID which has
Keysize 20 Valuesize 1 MaxEntries 65536
For example here ID is 5 ->
./aws-eks-na-cli ebpf maps
Maps currently loaded :
Type : 2 ID : 3
Keysize 4 Valuesize 98 MaxEntries 1
========================================================================================
Type : 9 ID : 5
Keysize 20 Valuesize 1 MaxEntries 65536
========================================================================================
Type : 27 ID : 6
Keysize 0 Valuesize 0 MaxEntries 262144
========================================================================================
Type : 11 ID : 16
Keysize 8 Valuesize 288 MaxEntries 65536
========================================================================================
- Then using the ID, we should be able to get the number of entries using this CLI ->
./aws-eks-na-cli ebpf dump-maps 5
(Note: replace 5 with the ID you found in the previous step.)
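Picking the right ID out of the maps listing can be scripted. The helper below is an assumption on my part, not part of the official tooling; it simply parses the `./aws-eks-na-cli ebpf maps` output format shown above to find the map whose size line reads `Keysize 20 Valuesize 1`:

```python
import re

def find_conntrack_map_id(maps_output: str) -> int:
    """Return the ID of the map whose following line reports
    'Keysize 20 Valuesize 1 ...' (the conntrack map per the steps above)."""
    lines = maps_output.splitlines()
    for i, line in enumerate(lines):
        m = re.search(r"ID : (\d+)", line)
        if m and i + 1 < len(lines) and "Keysize 20 Valuesize 1" in lines[i + 1]:
            return int(m.group(1))
    raise ValueError("conntrack map not found")

# On the node you would feed it live output, e.g.:
#   out = subprocess.run(["/opt/cni/bin/aws-eks-na-cli", "ebpf", "maps"],
#                        capture_output=True, text=True).stdout
# Here we use the sample listing from the steps above:
sample = """\
Maps currently loaded :
Type : 2 ID : 3
Keysize 4 Valuesize 98 MaxEntries 1
Type : 9 ID : 5
Keysize 20 Valuesize 1 MaxEntries 65536
Type : 27 ID : 6
Keysize 0 Valuesize 0 MaxEntries 262144
"""
assert find_conntrack_map_id(sample) == 5
```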
I have also encountered this issue, and it seems to relate to long-lived connections being removed from the conntrack table prematurely. There are other issues in this repository relating to this, and the latest version (CNI v1.16.0-eksbuild.1 / policy agent 1.0.7) does not fix the issue.
If you enable policy logging using the below configuration on the VPC CNI (if deployed through the console UI; otherwise use the appropriate args in Helm/CLI), you'll see that there's an ACCEPT for the connection, then some time later it's removed from the conntrack table, followed by a DENY in your logs.
{
  "enableNetworkPolicy": "true",
  "nodeAgent": {
    "enableCloudWatchLogs": "true",
    "enablePolicyEventLogs": "true"
  }
}
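The ACCEPT, then conntrack-entry removal, then DENY sequence described above can be modeled with a toy stateless policy engine plus conntrack table. This is a simplified illustration only, not the agent's actual eBPF implementation; the IPs and ports are the ones from the DENY log earlier in the thread:

```python
# Toy model: stateless rule checks, with return traffic admitted only
# via a conntrack table populated by allowed egress packets.
conntrack = set()  # entries keyed by (src_ip, src_port, dst_ip, dst_port)

def egress_allowed(dst_ip, dst_port):
    # Stands in for the compiled egress rules ("TCP 4000 to backend").
    return dst_ip == "10.0.68.172" and dst_port == 4000

def handle_egress(src_ip, src_port, dst_ip, dst_port):
    if egress_allowed(dst_ip, dst_port):
        conntrack.add((src_ip, src_port, dst_ip, dst_port))
        return "ACCEPT"
    return "DENY"

def handle_ingress(src_ip, src_port, dst_ip, dst_port):
    # Return traffic: the reverse tuple must exist in the conntrack table.
    if (dst_ip, dst_port, src_ip, src_port) in conntrack:
        return "ACCEPT"
    return "DENY"

# Web pod opens a connection to the backend on port 4000 ...
assert handle_egress("10.0.74.123", 39816, "10.0.68.172", 4000) == "ACCEPT"
# ... so the backend's reply is accepted as return traffic.
assert handle_ingress("10.0.68.172", 4000, "10.0.74.123", 39816) == "ACCEPT"
# If the conntrack entry is evicted prematurely, the same return
# packet is denied -- the behaviour reported in this issue.
conntrack.clear()
assert handle_ingress("10.0.68.172", 4000, "10.0.74.123", 39816) == "DENY"
```

This is why premature eviction of long-lived connections shows up as DENY logs on return traffic even though the forward direction was explicitly allowed.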
Do you have multiple replicas of these pods scheduled on the same node?
I think this could very well be the case, as we bin-pack onto a small number of nodes to keep costs low. This would explain why we did not witness this issue in our development environment, which does not have more than 1 replica per deployment.
We will have a release candidate image soon if you are willing to try it out to see if it resolves the issue. The official release image containing #179 is targeting mid-January.
Thanks jayanthvn, it works. With v1.0.8-rc1, there's no need for me to explicitly define rules for return traffic.
I observed some denied connections in the log today. It appears that there might be a delay in creating entries in the conntrack table. The first two logs show a DENY, presumably because the conntrack table had not yet been updated. However, after a delay of 3 seconds, the third log shows an ALLOW, which suggests the conntrack entry had been created by then.
On the conntrack table, I can see the presence of the corresponding entry.
Is it considered normal for there to be a delay in the creation of conntrack entries?
I have observed the same behaviour. This is with a single pod in a replicaset so unrelated to the race condition I think.
@khayong - There will be a few seconds' (1-2s) delay for the controller to reconcile and attach probes to new pods. Traffic will be allowed until the probes are attached, and then policy enforcement will take effect based on the config. In this case the probe was probably missing when the ingress traffic came in, so no conntrack entry was created.
Regarding the 2nd issue, do you have an active policy on the .54 pod? If yes, can you share the PolicyEndpoint?
Yes, here it is:
apiVersion: networking.k8s.aws/v1alpha1
kind: PolicyEndpoint
metadata:
  creationTimestamp: "2024-01-11T16:17:37Z"
  generateName: live2-gateway-
  generation: 1
  name: live2-gateway-855lp
  namespace: live2
  ownerReferences:
    - apiVersion: networking.k8s.io/v1
      blockOwnerDeletion: true
      controller: true
      kind: NetworkPolicy
      name: live2-gateway
      uid: e2fec936-f1d0-4f9a-bd8c-07d5967ba9e8
  resourceVersion: "24471548"
  uid: 2734c483-dc7b-412f-983b-6f2d2b2ca463
spec:
  egress:
    - cidr: 0.0.0.0/0
      ports:
        - port: 53
          protocol: UDP
    - cidr: ::/0
      ports:
        - port: 53
          protocol: UDP
  ingress:
    - cidr: 10.0.64.172
      ports:
        - port: 8080
          protocol: TCP
        - port: 8080
          protocol: TCP
    - cidr: 10.0.78.236
      ports:
        - port: 8080
          protocol: TCP
        - port: 8080
          protocol: TCP
  podIsolation:
    - Ingress
    - Egress
  podSelector:
    matchLabels:
      app.kubernetes.io/instance: live2
      app.kubernetes.io/name: gateway
  podSelectorEndpoints:
    - hostIP: 10.0.54.248
      name: live2-gateway-b575dcf44-w6sfc
      namespace: live2
      podIP: 10.0.59.54
    - hostIP: 10.0.54.248
      name: live2-gateway-b575dcf44-ktrzz
      namespace: live2
      podIP: 10.0.60.190
  policyRef:
    name: live2-gateway
    namespace: live2
@khayong we are unable to repro this. Can we get on a call? Are you on the Kubernetes Slack? If so, we can connect in #aws-vpc-cni.
Can you please try with the latest v1.0.8 release? - https://github.com/aws/amazon-vpc-cni-k8s/releases/tag/v1.16.3
Closing as v1.0.8 has been released. Please reopen if your issue is not resolved.