aws / aws-network-policy-agent


Network policy blocks established connections to RDS

Mohilpalav opened this issue

What happened:

We have a workload running in an EKS cluster that makes a request to an RDS cluster on startup. This request is blocked by the network policy despite an egress rule from that workload to the RDS cluster's subnet. We suspect that the outbound connection goes out before the network policy node agent starts tracking connections, so when the response arrives the node agent has no known allowed connection to match it against and the traffic is denied.
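For context, here is a minimal sketch of the kind of policy described above. The name, namespace, labels, subnet CIDR, and port are hypothetical (the actual policy is not shown in this issue); the CIDR is assumed from the RDS IP in the flow logs below:

# Hypothetical sketch only; names, labels, CIDR, and port are illustrative.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-egress-to-rds
  namespace: my-app
spec:
  podSelector:
    matchLabels:
      app: my-app
  policyTypes:
    - Egress
  egress:
    - to:
        - ipBlock:
            cidr: 10.47.53.0/24   # assumed RDS subnet
      ports:
        - protocol: TCP
          port: 5432              # PostgreSQL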

This is what we can see in the network policy flow logs:

Node: ip-10-51-21-121.us-east-1.compute.internal;SIP: 10.47.53.151;SPORT: 5432;DIP: 10.27.36.181;DPORT: 45182;PROTOCOL: TCP;PolicyVerdict: DENY
Node: ip-10-51-21-121.us-east-1.compute.internal;SIP: 10.47.53.151;SPORT: 5432;DIP: 10.27.36.181;DPORT: 45174;PROTOCOL: TCP;PolicyVerdict: DENY

10.47.53.151:5432 -> RDS
10.27.36.181 -> EKS workload

Unfortunately, the node agent logs only show the following at the moment (see #103):

2024-03-19 21:31:19.049604118 +0000 UTC Logger.check error: failed to get caller
2024-03-19 21:31:19.858783024 +0000 UTC Logger.check error: failed to get caller
2024-03-19 21:31:19.923276681 +0000 UTC Logger.check error: failed to get caller

What you expected to happen:
The connection to RDS should be allowed.

How to reproduce it (as minimally and precisely as possible):

  • create a network policy that allows all egress but no ingress traffic for a simple application (a sketch of such a policy follows this list)
  • on startup, the application makes several outbound connections to an external service (e.g. example.com)
  • deploy the application as a multi-replica Deployment to make the behavior more reproducible
  • check whether any return traffic / responses are denied by the network policy agent when they should not be
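As a concrete starting point, a sketch of the first step (all names and labels hypothetical): a policy that selects the application's pods, allows all egress, and, by listing Ingress in policyTypes with no ingress rules, denies all ingress.

# Hypothetical reproduction policy: all egress allowed, all ingress denied.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: egress-only
  namespace: repro
spec:
  podSelector:
    matchLabels:
      app: repro-app
  policyTypes:
    - Ingress
    - Egress
  egress:
    - {}   # empty rule = allow all egress
  # no ingress rules listed, so all ingress is denied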

Anything else we need to know?:
Similar issues:
#73
#186

Environment:

  • Kubernetes version (use kubectl version): v1.28
  • CNI Version: v1.16.4
  • Network Policy Agent Version: v1.0.8
  • OS (e.g: cat /etc/os-release): Amazon Linux 2
  • Kernel (e.g. uname -a): 5.10.210-201.852.amzn2.x86_64

Here the pod attempted to start a connection before network policy enforcement began, and hence the response packet is dropped. Please refer to #189 (comment) for a detailed explanation.

Our recommended solution for this is Strict mode, which gates pod launch until policies are configured against the newly launched pod: https://github.com/aws/amazon-vpc-cni-k8s?tab=readme-ov-file#network_policy_enforcing_mode-v1171
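Per the linked README, Strict mode is enabled by setting the NETWORK_POLICY_ENFORCING_MODE environment variable (default "standard") on the aws-node DaemonSet with VPC CNI v1.17.1+; a sketch of the relevant excerpt, with exact mechanics depending on how the CNI is installed (Helm, EKS add-on, or raw manifest):

# Excerpt of the aws-node DaemonSet in kube-system; only the env var shown
# here is relevant, the rest of the DaemonSet is omitted.
spec:
  template:
    spec:
      containers:
        - name: aws-node
          env:
            - name: NETWORK_POLICY_ENFORCING_MODE
              value: "strict"   # default is "standard"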

Another option, if you don't want to enable this mode, is to allow the Service CIDRs, given that your pods communicate via Service VIPs; this will allow return traffic.
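For illustration, such a rule might look like the sketch below; 10.100.0.0/16 is a common EKS Service CIDR default, so substitute your cluster's actual value (names and labels are hypothetical):

# Hypothetical egress rule allowing the cluster's Service CIDR so that
# traffic to Service VIPs (and its return path) is permitted.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-egress-service-cidr
  namespace: my-app
spec:
  podSelector:
    matchLabels:
      app: my-app
  policyTypes:
    - Egress
  egress:
    - to:
        - ipBlock:
            cidr: 10.100.0.0/16   # assumed Service CIDR; check your cluster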

@Mohilpalav Did Strict mode help with your use case/issue?

@Mohilpalav Is there any solution for this issue?

Hello there,

we have the same problem when connecting to the RDS service from a pod, and also when contacting the S3 service.
We have tried to reproduce the error, but it is not predictable: we sometimes see failures when we deploy many pods at the same time that all try to connect to RDS or S3, but not every time.

Did you find any solution to this problem?