aws / aws-network-policy-agent


Response traffic from allowed egress denied on short lived pods

luk2038649 opened this issue · comments

What happened:
We recently switched from using the Calico Tigera operator for network policies to the network policy handling built into the AWS VPC CNI.

We are seeing intermittent failures where connections hang for short-lived applications that reach out to external services (such as databases) immediately on startup. This occurs primarily in CronJobs and Airflow pods.

We experienced this same issue when reaching out to external Google services, and also to AWS Aurora instances in a peered VPC.

Our NetworkPolicy is set up with an explicit allow-all egress rule and a more restrictive ingress policy.

spec:
  egress:
    - to:
        - ipBlock:
            cidr: 0.0.0.0/0
  ingress:
    - from:
        - podSelector: {}
    - from:
        - namespaceSelector:
            matchLabels:
              toolkit.fluxcd.io/owner: redacted
    - from:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: redacted

Picture of policy: (screenshot attached to the original issue)

It is my understanding that we should be allowed to receive response traffic from anywhere, based on the documentation here.

Example Pod Logs:
Requests that normally complete quickly hang and never finish.

$ kubectl -n redactedNS logs redactedTask
INFO       2024-01-23 16:44:26 redacted.db read_from_views                      298 : Querying view redacted_data for 2024-01-23 16:57:04

Note that if you exec into this pod and run the same command some minutes after startup, it will complete. It only fails to complete right after startup.

Attach logs
VPC-CNI logs.
An instance of return traffic from a Google service being denied:

{"level":"info","ts":"2024-01-23T15:49:26.013Z","logger":"ebpf-client","msg":"Flow Info:  ","Src IP":"172.253.63.95","Src Port":443,"Dest IP":"x.x.x.x(peered VPC IP)","Dest Port":59486,"Proto":"TCP","Verdict":"DENY"}

An instance of return traffic from an Aurora instance in a peered VPC being denied:

{"level":"info","ts":"2024-01-23T17:50:34.327Z","logger":"ebpf-client","msg":"Flow Info:  ","Src IP":"x.x.x.x(peered VPC IP)","Src Port":5432,"Dest IP":"x.x.x.x(Pod IP)","Dest Port":36806,"Proto":"TCP","Verdict":"DENY"}
{"level":"info","ts":"2024-01-23T17:52:37.207Z","logger":"ebpf-client","msg":"Flow Info:  ","Src IP":"x.x.x.x(peered VPC IP)","Src Port":5432,"Dest IP":"x.x.x.x(Pod IP)","Dest Port":36806,"Proto":"TCP","Verdict":"DENY"}
{"level":"info","ts":"2024-01-23T17:54:40.087Z","logger":"ebpf-client","msg":"Flow Info:  ","Src IP":"x.x.x.x(peered VPC IP)","Src Port":5432,"Dest IP":"x.x.x.x(Pod IP)","Dest Port":36806,"Proto":"TCP","Verdict":"DENY"}

What you expected to happen:
Response traffic should not be denied when egress is allowed.

How to reproduce it (as minimally and precisely as possible):

  • Allow all egress in the NetworkPolicy.
  • Restrict ingress in the NetworkPolicy.
  • Create a CronJob that reaches out to an external service immediately on startup.
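A minimal reproduction along those lines might look like this (the name, schedule, image, and target URL are illustrative assumptions, not from the original report; the namespace would need the allow-all-egress/restricted-ingress policy applied):

```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: netpol-race-repro       # hypothetical name
spec:
  schedule: "*/5 * * * *"
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: Never
          containers:
            - name: probe
              image: curlimages/curl:8.5.0   # any image with curl
              # Connects to an external service immediately on startup;
              # hangs when the return traffic is denied.
              args: ["-sS", "--max-time", "30", "https://example.com/"]
```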

Anything else we need to know?:
We did not experience this issue when using the Calico Tigera operator to handle the same NetworkPolicy.

To be clear, Calico has been completely removed and all nodes have been restarted.

This seems possibly the same as #83.

We have found two main workarounds:

  1. Explicitly allowing ingress from the CIDR block of the peered VPC where the DB lives.
  2. Sleeping jobs/pods for 5s before making connections.
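Sketches of both workarounds, with placeholder values (the CIDR, image, and command below are illustrative, not from the original report):

```yaml
# Workaround 1: explicitly allow ingress from the peered VPC CIDR
# where the DB lives (policy ingress fragment; CIDR is a placeholder).
ingress:
  - from:
      - ipBlock:
          cidr: 10.100.0.0/16
---
# Workaround 2: sleep ~5s before the first connection so policy
# reconciliation can complete first (container spec fragment).
containers:
  - name: task
    image: redacted-task-image
    command: ["sh", "-c", "sleep 5 && exec /app/run-task"]
```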

Environment:

  • Kubernetes version (use kubectl version): Server Version: version.Info{Major:"1", Minor:"26+", GitVersion:"v1.26.12-eks-5e0fdde", GitCommit:"95c835ee1111774fe5e8b327187034d8136720a0", GitTreeState:"clean", BuildDate:"2024-01-02T20:34:50Z", GoVersion:"go1.20.12", Compiler:"gc", Platform:"linux/amd64"}
  • CNI Version: v1.16.0-eksbuild.1
  • Network Policy Agent Version: aws-network-policy-agent:v1.0.7-eksbuild.1
  • OS (e.g: cat /etc/os-release): NAME="Amazon Linux"
    VERSION="2"
    ID="amzn"
    ID_LIKE="centos rhel fedora"
    VERSION_ID="2"
    PRETTY_NAME="Amazon Linux 2"
    ANSI_COLOR="0;33"
    CPE_NAME="cpe:2.3:o:amazon:amazon_linux:2"
    HOME_URL="https://amazonlinux.com/"
    SUPPORT_END="2025-06-30"
  • Kernel (e.g. uname -a): Linux ip-x-x-x-x.ec2.internal 5.10.201-191.748.amzn2.x86_64 #1 SMP Mon Nov 27 18:28:14 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux

@luk2038649 Based on the issue description, this is expected behavior in a way. Right now, all traffic to/from a pod is allowed until the newly launched pod is reconciled against the network policies configured on the cluster. It can take up to a few seconds for the reconciliation to complete and for the policies to be enforced against a new pod.

We track reverse flows (response traffic) via our own internal conntrack. In the above example, when the pod initiates a connection to AWS Aurora, the traffic should be allowed (as egress is configured as allow-all), and once the egress probe allows the traffic, it creates a conntrack entry that the ingress probe then relies on to allow the return traffic. However, in this case the egress connection happens right after pod startup and before the policies are enforced, i.e., before the relevant eBPF probes are attached, so the required conntrack entry is never created. The probes are then attached with the relevant rules before the return traffic arrives at the pod: there is no match in the conntrack table, and since the ingress rules in the configured policy do not allow traffic from this endpoint, the packets are dropped. This explains why introducing a few seconds of delay resolved the issue (explicitly adding the endpoints under the ingress section will also work around this race condition). To get around this, the pod needs a few seconds of delay at startup before it initiates a connection; a retry for a failed connection should help as well.
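The retry suggestion can be sketched as a small shell wrapper around the startup connection (the function, delay, and example command are illustrative, not part of the agent):

```shell
#!/bin/sh
# Illustrative retry wrapper (not part of the agent): re-run a command
# until it succeeds or the attempt budget is exhausted. A connection
# attempt that races the policy reconciliation fails at first, then
# succeeds on a later attempt once the eBPF probes are attached.
retry() {
  max=$1; shift
  n=1
  until "$@"; do
    if [ "$n" -ge "$max" ]; then
      return 1
    fi
    n=$((n + 1))
    sleep 1   # give the agent time to attach the probes
  done
}

# Hypothetical usage at pod startup, e.g.:
#   retry 5 psql "$DATABASE_URL" -c 'SELECT 1'
retry 3 true && echo "connected"   # prints "connected"
```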

We plan to introduce a Strict mode option in the near future, which will either gate pod launch until the relevant policies are configured against the new pod replica, or block all ingress/egress connections until the policies are reconciled against the new pod.

@achevuru Thanks for the quick response!

Is there any approximate timeline for the general release of the "strict mode" option? Is there an issue or PR we can track?

explicitly adding the rules under ingress section will also work in the above race condition

This is what we have done for known hosts like databases, but we have many applications, and opening up ingress for all possible responses is not ideal.

Hi all!
We've noticed the same behaviour for pod-to-pod traffic: a connection allowed by the NetworkPolicy is denied at startup but then allowed, once the newly launched pods are reconciled against the configured network policies on the cluster.

Right now, all traffic to/from a pod is allowed until the newly launched pod is reconciled against the network policies configured on the cluster. It can take up to a few seconds for the reconciliation to complete and for the policies to be enforced against a new pod.

@achevuru I think it is the opposite: all traffic is denied from a newly created pod until the newly launched pod is reconciled against the network policies configured on the cluster.

What I was describing is in fact:

Allow rules will be applied eventually after the isolation rules (or may be applied at the same time). In the worst case, a newly created pod may have no network connectivity at all when it is first started, if isolation rules were already applied, but no allow rules were applied yet.

cf. the NetworkPolicy documentation on pod lifecycle.

@luk2038649 We're targeting a late Q1 / early Q2 release time frame. We will update once we're closer to the release.

Would it be possible to delay the readiness of the pod until all the network policies have been correctly applied? Something like the pod readiness gate used with the load balancer integration?

@achevuru The Strict mode option has not solved the issue: what this issue describes seems to apply specifically to the Standard option (as opposed to Strict mode), which is (still) blocking some traffic.

Right now, all traffic to/from a pod is allowed until the newly launched pod is reconciled against the network policies configured on the cluster. It can take up to a few seconds for the reconciliation to complete and for the policies to be enforced against a new pod.

Therefore, this statement is not true.

@ariary Can you expand on what was not solved with Strict mode? What exactly did you try with Strict mode?

Regarding Standard mode, the above statement is true: pods will not have any firewall rules enforced until the new pod is reconciled against the active policies, so all traffic is allowed. However, once the firewall rules take effect, they will block any return traffic that isn't tracked by the probes. Please refer here. Strict mode should address this.

@achevuru My issue is more closely related to other issues; you can ignore my comment.