aws / aws-network-policy-agent

Pod communication issues when applying network policies on AWS VPC-CNI

AlexanderPavlovHC opened this issue

What happened:
After switching from Calico to the native AWS VPC-CNI and adding a network policy to restrict access, some newly created pods that run jobs go into CrashLoopBackOff and cannot complete their tasks. No such errors were observed with Calico.

Details:

  • The policy allows access for haproxy, prometheus, elasticsearch, and intra-namespace pod communication.
  • Pod communication issues arose after switching from Calico to AWS VPC-CNI.
  • Network policy manifest:
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-policies
  namespace: default
spec:
  podSelector: {}
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: elastic-system
          podSelector:
            matchLabels:
              app.kubernetes.io/instance: eck-operator
      ports:
        - port: 9200
        - port: 9300
    - from:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: haproxy-ingress
          podSelector:
            matchLabels:
              app.kubernetes.io/name: haproxy-ingress
    - from:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: prometheus
          podSelector:
            matchLabels:
              app.kubernetes.io/name: prometheus
      ports:
        - port: 8080
        - port: 9090
        - port: 9091
        - port: 9253
        - port: 15692
    - from:
        - podSelector: {} 
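
For readers who want to check what the manifest above permits, here is a minimal Python sketch, not part of the original report. It hand-copies the spec into a dict (to avoid needing a YAML parser) and prints one line per ingress rule. The helper names `effective_policy_types`, `describe_peer`, and `describe_rule` are hypothetical, and the sketch only handles `namespaceSelector`/`podSelector` peers with `matchLabels`, which is all this policy uses. It relies on the Kubernetes defaulting rule that a policy without `policyTypes` and without egress rules applies to ingress only.

```python
# Spec hand-copied from the manifest above (assumption: no fields were omitted).
policy_spec = {
    "podSelector": {},
    "ingress": [
        {"from": [{"namespaceSelector": {"matchLabels": {"kubernetes.io/metadata.name": "elastic-system"}},
                   "podSelector": {"matchLabels": {"app.kubernetes.io/instance": "eck-operator"}}}],
         "ports": [{"port": 9200}, {"port": 9300}]},
        {"from": [{"namespaceSelector": {"matchLabels": {"kubernetes.io/metadata.name": "haproxy-ingress"}},
                   "podSelector": {"matchLabels": {"app.kubernetes.io/name": "haproxy-ingress"}}}]},
        {"from": [{"namespaceSelector": {"matchLabels": {"kubernetes.io/metadata.name": "prometheus"}},
                   "podSelector": {"matchLabels": {"app.kubernetes.io/name": "prometheus"}}}],
         "ports": [{"port": 8080}, {"port": 9090}, {"port": 9091}, {"port": 9253}, {"port": 15692}]},
        {"from": [{"podSelector": {}}]},
    ],
}


def effective_policy_types(spec):
    """Mirror the defaulting rule: Ingress always, Egress only if egress rules exist."""
    if "policyTypes" in spec:
        return list(spec["policyTypes"])
    types = ["Ingress"]
    if spec.get("egress"):
        types.append("Egress")
    return types


def describe_peer(peer):
    """Describe one 'from' entry; ipBlock peers are not handled in this sketch."""
    if "namespaceSelector" in peer:
        labels = peer["namespaceSelector"].get("matchLabels")
        ns_text = f"namespaces labelled {labels}" if labels else "all namespaces"
    else:
        ns_text = "the policy's own namespace"
    labels = peer.get("podSelector", {}).get("matchLabels")
    pod_text = f"pods labelled {labels}" if labels else "all pods"
    return f"{pod_text} in {ns_text}"


def describe_rule(rule):
    """Summarize one ingress rule as 'peers on ports'."""
    peers = "; ".join(describe_peer(p) for p in rule.get("from", [])) or "any source"
    ports = [p["port"] for p in rule.get("ports", [])]
    port_text = f"ports {ports}" if ports else "all ports"
    return f"{peers} on {port_text}"


print("policy types:", effective_policy_types(policy_spec))
for rule in policy_spec["ingress"]:
    print("allow from", describe_rule(rule))
```

Since the spec has no egress section, the sketch reports only `Ingress`, so outbound connections from the selected pods are not restricted by this policy.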

Network policy diagram:

[Screenshot 2023-11-03 at 15:46:47, image not included]

Logs from network-policy-agent.log:

{"level":"info","ts":"2023-11-02T13:10:35.224Z","logger":"controllers.policyEndpoints","caller":"controllers/policyendpoints_controller.go:149","msg":"Target Pod doesn't belong to the current pod Identifier: ","Name: ":"ledger-cron-15-28315500-fl5np","Pod ID: ":"ledger-cron-15-28315510-default"}
{"level":"info","ts":"2023-11-02T13:10:35.224Z","logger":"controllers.policyEndpoints","caller":"controllers/policyendpoints_controller.go:149","msg":"Processing Pod: ","name:":"docx-api-5c84697988-l8f49","namespace:":"default","podIdentifier: ":"ledger-cron-15-28315510-default"}
{"level":"info","ts":"2023-11-02T13:10:35.224Z","logger":"controllers.policyEndpoints","caller":"controllers/policyendpoints_controller.go:149","msg":"Target Pod doesn't belong to the current pod Identifier: ","Name: ":"docx-api-5c84697988-l8f49","Pod ID: ":"ledger-cron-15-28315510-default"}
{"level":"info","ts":"2023-11-02T13:10:35.224Z","logger":"controllers.policyEndpoints","caller":"controllers/policyendpoints_controller.go:149","msg":"Processing Pod: ","name:":"qmt-poland-motor-generali-worker-77d67fb98-zwp2w","namespace:":"default","podIdentifier: ":"ledger-cron-15-28315510-default"}
{"level":"info","ts":"2023-11-02T13:10:35.224Z","logger":"controllers.policyEndpoints","caller":"controllers/policyendpoints_controller.go:149","msg":"Target Pod doesn't belong to the current pod Identifier: ","Name: ":"qmt-poland-motor-generali-worker-77d67fb98-zwp2w","Pod ID: ":"ledger-cron-15-28315510-default"}
{"level":"info","ts":"2023-11-02T13:10:35.224Z","logger":"controllers.policyEndpoints","caller":"controllers/policyendpoints_controller.go:149","msg":"Processing Pod: ","name:":"ledger-cron-15-28315490-9nbrf","namespace:":"default","podIdentifier: ":"ledger-cron-15-28315510-default"}

Environment:

  • Kubernetes: 1.25
  • Kubelet: v1.25.12-eks-8ccc7ba
  • VPC-CNI add-on: v1.15.1-eksbuild.1
  • AMI release: 1.25.12-20230825
  • Kernel: 5.10.186-179.751.amzn2.x86_64

@AlexanderPavlovHC - Did you capture access logs to confirm that the traffic is being denied? To capture them, you will have to enable the enable-policy-events-logs flag. Since you mentioned that only a few pods are impacted, can you share the pod scale in the default namespace? If the scale is high, you are most probably running into issue #106. This is fixed in network policy agent v1.0.5, which is part of the v1.15.3 release - https://github.com/aws/amazon-vpc-cni-k8s/releases/tag/v1.15.3

Thank you for the guidance. After updating the addon version and enabling policy-events-logs, I see the following on the nodes where the CrashLoopBackOff error with the pod occurs:

Pod address 172.20.121.22

          [root@ip-172-20-100-147 /]# cat /var/log/aws-routed-eni/network-policy-agent.log | grep 172.20.121.22
             { "level": "info","ts": "2023-11-07T15:40:38.404Z","logger": "ebpf-client","caller": "ebpf/bpf_client.go:659","msg": "Updating Map with ","IP Key:": "172.20.121.22/32" }
             { "level": "info","ts": "2023-11-07T15:40:39.805Z","logger": "ebpf-client","caller": "ebpf/bpf_client.go:659","msg": "No L4 specified. Add Catch all entry: ","CIDR: ": "172.20.121.22/32" }
             { "level": "info","ts": "2023-11-07T15:40:39.805Z","logger": "ebpf-client","caller": "ebpf/bpf_client.go:659","msg": "Updating Map with ","IP Key:": "172.20.121.22/32" }
             { "level": "info","ts": "2023-11-07T15:40:41.003Z","logger": "ebpf-client","caller": "ebpf/bpf_client.go:659","msg": "No L4 specified. Add Catch all entry: ","CIDR: ": "172.20.121.22/32" }
             { "level": "info","ts": "2023-11-07T15:40:41.003Z","logger": "ebpf-client","caller": "ebpf/bpf_client.go:659","msg": "Updating Map with ","IP Key:": "172.20.121.22/32" }
             { "level": "info","ts": "2023-11-07T15:45:09.105Z","logger": "ebpf-client","caller": "ebpf/bpf_client.go:659","msg": "No L4 specified. Add Catch all entry: ","CIDR: ": "172.20.121.22/32" }
             { "level": "info","ts": "2023-11-07T15:45:09.105Z","logger": "ebpf-client","caller": "ebpf/bpf_client.go:659","msg": "Updating Map with ","IP Key:": "172.20.121.22/32" }
             { "level": "info","ts": "2023-11-07T15:45:10.304Z","logger": "ebpf-client","caller": "ebpf/bpf_client.go:659","msg": "No L4 specified. Add Catch all entry: ","CIDR: ": "172.20.121.22/32" }
             { "level": "info","ts": "2023-11-07T15:45:10.304Z","logger": "ebpf-client","caller": "ebpf/bpf_client.go:659","msg": "Updating Map with ","IP Key:": "172.20.121.22/32" }
             { "level": "info","ts": "2023-11-07T15:45:10.303Z","logger": "ebpf-client","msg": "Flow Info:  ","Src IP": "172.20.100.147","Src Port": 42190,"Dest IP": "172.20.121.22","Dest Port": 5673,"Proto": "TCP","Verdict": "ACCEPT" }
             { "level": "info","ts": "2023-11-07T15:45:10.403Z","logger": "ebpf-client","msg": "Flow Info:  ","Src IP": "10.100.0.1","Src Port": 443,"Dest IP": "172.20.121.22","Dest Port": 55464,"Proto": "TCP","Verdict": "DENY" }
             { "level": "info","ts": "2023-11-07T15:45:10.403Z","logger": "ebpf-client","msg": "Flow Info:  ","Src IP": "10.100.0.1","Src Port": 443,"Dest IP": "172.20.121.22","Dest Port": 55464,"Proto": "TCP","Verdict": "DENY" }
             { "level": "info","ts": "2023-11-07T15:45:10.403Z","logger": "ebpf-client","msg": "Flow Info:  ","Src IP": "10.100.0.1","Src Port": 443,"Dest IP": "172.20.121.22","Dest Port": 55464,"Proto": "TCP","Verdict": "DENY" }
             { "level": "info","ts": "2023-11-07T15:45:10.403Z","logger": "ebpf-client","msg": "Flow Info:  ","Src IP": "10.100.0.1","Src Port": 443,"Dest IP": "172.20.121.22","Dest Port": 55464,"Proto": "TCP","Verdict": "DENY" }
             { "level": "info","ts": "2023-11-07T15:45:10.703Z","logger": "ebpf-client","msg": "Flow Info:  ","Src IP": "10.100.0.1","Src Port": 443,"Dest IP": "172.20.121.22","Dest Port": 55464,"Proto": "TCP","Verdict": "DENY" }
             { "level": "info","ts": "2023-11-07T15:45:11.603Z","logger": "ebpf-client","msg": "Flow Info:  ","Src IP": "10.100.0.1","Src Port": 443,"Dest IP": "172.20.121.22","Dest Port": 55464,"Proto": "TCP","Verdict": "DENY" }
             { "level": "info","ts": "2023-11-07T15:45:11.604Z","logger": "ebpf-client","caller": "ebpf/bpf_client.go:659","msg": "No L4 specified. Add Catch all entry: ","CIDR: ": "172.20.121.22/32" }
             { "level": "info","ts": "2023-11-07T15:45:11.604Z","logger": "ebpf-client","caller": "ebpf/bpf_client.go:659","msg": "Updating Map with ","IP Key:": "172.20.121.22/32" }
             { "level": "info","ts": "2023-11-07T15:45:13.203Z","logger": "ebpf-client","msg": "Flow Info:  ","Src IP": "172.20.100.147","Src Port": 42206,"Dest IP": "172.20.121.22","Dest Port": 5673,"Proto": "TCP","Verdict": "ACCEPT" }
             { "level": "info","ts": "2023-11-07T15:45:13.403Z","logger": "ebpf-client","msg": "Flow Info:  ","Src IP": "10.100.0.1","Src Port": 443,"Dest IP": "172.20.121.22","Dest Port": 55464,"Proto": "TCP","Verdict": "DENY" }
             { "level": "info","ts": "2023-11-07T15:45:13.604Z","logger": "ebpf-client","caller": "ebpf/bpf_client.go:659","msg": "No L4 specified. Add Catch all entry: ","CIDR: ": "172.20.121.22/32" }
             { "level": "info","ts": "2023-11-07T15:45:13.604Z","logger": "ebpf-client","caller": "ebpf/bpf_client.go:659","msg": "Updating Map with ","IP Key:": "172.20.121.22/32" }
             { "level": "info","ts": "2023-11-07T15:45:14.405Z","logger": "ebpf-client","caller": "ebpf/bpf_client.go:659","msg": "No L4 specified. Add Catch all entry: ","CIDR: ": "172.20.121.22/32" }
             { "level": "info","ts": "2023-11-07T15:45:14.405Z","logger": "ebpf-client","caller": "ebpf/bpf_client.go:659","msg": "Updating Map with ","IP Key:": "172.20.121.22/32" }
             { "level": "info","ts": "2023-11-07T15:45:15.603Z","logger": "ebpf-client","caller": "ebpf/bpf_client.go:659","msg": "No L4 specified. Add Catch all entry: ","CIDR: ": "172.20.121.22/32" }
             { "level": "info","ts": "2023-11-07T15:45:15.603Z","logger": "ebpf-client","caller": "ebpf/bpf_client.go:659","msg": "Updating Map with ","IP Key:": "172.20.121.22/32" }
             { "level": "info","ts": "2023-11-07T15:45:16.003Z","logger": "ebpf-client","msg": "Flow Info:  ","Src IP": "172.20.100.147","Src Port": 42216,"Dest IP": "172.20.121.22","Dest Port": 5673,"Proto": "TCP","Verdict": "ACCEPT" }
             { "level": "info","ts": "2023-11-07T15:45:16.503Z","logger": "ebpf-client","msg": "Flow Info:  ","Src IP": "10.100.0.1","Src Port": 443,"Dest IP": "172.20.121.22","Dest Port": 55464,"Proto": "TCP","Verdict": "DENY" }
             { "level": "info","ts": "2023-11-07T15:45:16.803Z","logger": "ebpf-client","caller": "ebpf/bpf_client.go:659","msg": "No L4 specified. Add Catch all entry: ","CIDR: ": "172.20.121.22/32" }
             { "level": "info","ts": "2023-11-07T15:45:16.803Z","logger": "ebpf-client","caller": "ebpf/bpf_client.go:659","msg": "Updating Map with ","IP Key:": "172.20.121.22/32" }
             { "level": "info","ts": "2023-11-07T15:45:17.905Z","logger": "ebpf-client","caller": "ebpf/bpf_client.go:659","msg": "No L4 specified. Add Catch all entry: ","CIDR: ": "172.20.121.22/32" }
             { "level": "info","ts": "2023-11-07T15:45:17.905Z","logger": "ebpf-client","caller": "ebpf/bpf_client.go:659","msg": "Updating Map with ","IP Key:": "172.20.121.22/32" }
             { "level": "info","ts": "2023-11-07T15:45:19.003Z","logger": "ebpf-client","msg": "Flow Info:  ","Src IP": "172.20.100.147","Src Port": 52162,"Dest IP": "172.20.121.22","Dest Port": 5673,"Proto": "TCP","Verdict": "ACCEPT" }
             { "level": "info","ts": "2023-11-07T15:45:19.304Z","logger": "ebpf-client","caller": "ebpf/bpf_client.go:659","msg": "No L4 specified. Add Catch all entry: ","CIDR: ": "172.20.121.22/32" }
             { "level": "info","ts": "2023-11-07T15:45:19.304Z","logger": "ebpf-client","caller": "ebpf/bpf_client.go:659","msg": "Updating Map with ","IP Key:": "172.20.121.22/32" }
             { "level": "info","ts": "2023-11-07T15:45:22.003Z","logger": "ebpf-client","msg": "Flow Info:  ","Src IP": "172.20.100.147","Src Port": 52170,"Dest IP": "172.20.121.22","Dest Port": 5673,"Proto": "TCP","Verdict": "ACCEPT" }
             { "level": "info","ts": "2023-11-07T15:45:23.146Z","logger": "ebpf-client","msg": "Flow Info:  ","Src IP": "10.100.0.1","Src Port": 443,"Dest IP": "172.20.121.22","Dest Port": 55464,"Proto": "TCP","Verdict": "DENY" }
             { "level": "info","ts": "2023-11-07T15:45:24.981Z","logger": "ebpf-client","msg": "Flow Info:  ","Src IP": "172.20.100.147","Src Port": 52184,"Dest IP": "172.20.121.22","Dest Port": 5673,"Proto": "TCP","Verdict": "ACCEPT" }
             { "level": "info","ts": "2023-11-07T15:45:27.981Z","logger": "ebpf-client","msg": "Flow Info:  ","Src IP": "172.20.100.147","Src Port": 46384,"Dest IP": "172.20.121.22","Dest Port": 5673,"Proto": "TCP","Verdict": "ACCEPT" }
             { "level": "info","ts": "2023-11-07T15:45:29.406Z","logger": "ebpf-client","caller": "ebpf/bpf_client.go:659","msg": "No L4 specified. Add Catch all entry: ","CIDR: ": "172.20.121.22/32" }
             { "level": "info","ts": "2023-11-07T15:45:29.406Z","logger": "ebpf-client","caller": "ebpf/bpf_client.go:659","msg": "Updating Map with ","IP Key:": "172.20.121.22/32" }
             { "level": "info","ts": "2023-11-07T15:45:30.705Z","logger": "ebpf-client","caller": "ebpf/bpf_client.go:659","msg": "No L4 specified. Add Catch all entry: ","CIDR: ": "172.20.121.22/32" }
             { "level": "info","ts": "2023-11-07T15:45:30.705Z","logger": "ebpf-client","caller": "ebpf/bpf_client.go:659","msg": "Updating Map with ","IP Key:": "172.20.121.22/32" }
             { "level": "info","ts": "2023-11-07T15:45:30.981Z","logger": "ebpf-client","msg": "Flow Info:  ","Src IP": "172.20.100.147","Src Port": 46400,"Dest IP": "172.20.121.22","Dest Port": 5673,"Proto": "TCP","Verdict": "ACCEPT" }
             { "level": "info","ts": "2023-11-07T15:45:32.005Z","logger": "ebpf-client","caller": "ebpf/bpf_client.go:659","msg": "No L4 specified. Add Catch all entry: ","CIDR: ": "172.20.121.22/32" }
             { "level": "info","ts": "2023-11-07T15:45:32.005Z","logger": "ebpf-client","caller": "ebpf/bpf_client.go:659","msg": "Updating Map with ","IP Key:": "172.20.121.22/32" }
             { "level": "info","ts": "2023-11-07T15:45:33.106Z","logger": "ebpf-client","caller": "ebpf/bpf_client.go:659","msg": "No L4 specified. Add Catch all entry: ","CIDR: ": "172.20.121.22/32" }
             { "level": "info","ts": "2023-11-07T15:45:33.106Z","logger": "ebpf-client","caller": "ebpf/bpf_client.go:659","msg": "Updating Map with ","IP Key:": "172.20.121.22/32" }
             { "level": "info","ts": "2023-11-07T15:45:33.403Z","logger": "ebpf-client","msg": "Flow Info:  ","Src IP": "172.20.121.22","Src Port": 55464,"Dest IP": "10.100.0.1","Dest Port": 443,"Proto": "TCP","Verdict": "ACCEPT" }
             { "level": "info","ts": "2023-11-07T15:45:34.003Z","logger": "ebpf-client","msg": "Flow Info:  ","Src IP": "172.20.100.147","Src Port": 46414,"Dest IP": "172.20.121.22","Dest Port": 5673,"Proto": "TCP","Verdict": "ACCEPT" }
             { "level": "info","ts": "2023-11-07T15:45:34.194Z","logger": "ebpf-client","msg": "Flow Info:  ","Src IP": "172.20.121.22","Src Port": 60082,"Dest IP": "10.100.0.1","Dest Port": 443,"Proto": "TCP","Verdict": "ACCEPT" }
             { "level": "info","ts": "2023-11-07T15:45:34.406Z","logger": "ebpf-client","caller": "ebpf/bpf_client.go:659","msg": "No L4 specified. Add Catch all entry: ","CIDR: ": "172.20.121.22/32" }
             { "level": "info","ts": "2023-11-07T15:45:34.406Z","logger": "ebpf-client","caller": "ebpf/bpf_client.go:659","msg": "Updating Map with ","IP Key:": "172.20.121.22/32" }
             { "level": "info","ts": "2023-11-07T15:45:35.306Z","logger": "ebpf-client","caller": "ebpf/bpf_client.go:659","msg": "No L4 specified. Add Catch all entry: ","CIDR: ": "172.20.121.22/32" }
             { "level": "info","ts": "2023-11-07T15:45:35.306Z","logger": "ebpf-client","caller": "ebpf/bpf_client.go:659","msg": "Updating Map with ","IP Key:": "172.20.121.22/32" }
             { "level": "info","ts": "2023-11-07T15:45:37.204Z","logger": "ebpf-client","caller": "ebpf/bpf_client.go:659","msg": "No L4 specified. Add Catch all entry: ","CIDR: ": "172.20.121.22/32" }
             { "level": "info","ts": "2023-11-07T15:45:37.204Z","logger": "ebpf-client","caller": "ebpf/bpf_client.go:659","msg": "Updating Map with ","IP Key:": "172.20.121.22/32" }
             { "level": "info","ts": "2023-11-07T15:45:37.905Z","logger": "ebpf-client","caller": "ebpf/bpf_client.go:659","msg": "No L4 specified. Add Catch all entry: ","CIDR: ": "172.20.121.22/32" }
             { "level": "info","ts": "2023-11-07T15:45:37.905Z","logger": "ebpf-client","caller": "ebpf/bpf_client.go:659","msg": "Updating Map with ","IP Key:": "172.20.121.22/32" }
             { "level": "info","ts": "2023-11-07T15:45:39.105Z","logger": "ebpf-client","caller": "ebpf/bpf_client.go:659","msg": "No L4 specified. Add Catch all entry: ","CIDR: ": "172.20.121.22/32" }
             { "level": "info","ts": "2023-11-07T15:45:39.105Z","logger": "ebpf-client","caller": "ebpf/bpf_client.go:659","msg": "Updating Map with ","IP Key:": "172.20.121.22/32" }
             { "level": "info","ts": "2023-11-07T15:46:16.501Z","logger": "ebpf-client","msg": "Flow Info:  ","Src IP": "172.20.121.22","Src Port": 55992,"Dest IP": "10.100.0.1","Dest Port": 443,"Proto": "TCP","Verdict": "ACCEPT" }
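
A long dump like the one above is easier to read once the "Flow Info" entries are separated from the map-update entries and counted. The following is a rough Python sketch, not something from the agent itself: `summarize_flows` is a hypothetical helper, and it assumes each log entry is a single JSON object per line with the field names seen above ("Src IP", "Dest IP", "Dest Port", "Verdict").

```python
import json
from collections import Counter


def summarize_flows(lines):
    """Count (src, dst, dest port, verdict) tuples from 'Flow Info' log entries."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line.startswith("{"):
            continue  # skip shell prompts and other non-JSON lines
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip truncated or malformed entries
        if not entry.get("msg", "").startswith("Flow Info"):
            continue  # skip map updates and other agent messages
        key = (entry["Src IP"], entry["Dest IP"], entry["Dest Port"], entry["Verdict"])
        counts[key] += 1
    return counts


# A few lines copied from the log above, plus the shell prompt line.
sample = [
    '[root@ip-172-20-100-147 /]# cat /var/log/aws-routed-eni/network-policy-agent.log | grep 172.20.121.22',
    '{ "level": "info","ts": "2023-11-07T15:40:38.404Z","logger": "ebpf-client","caller": "ebpf/bpf_client.go:659","msg": "Updating Map with ","IP Key:": "172.20.121.22/32" }',
    '{ "level": "info","ts": "2023-11-07T15:45:10.303Z","logger": "ebpf-client","msg": "Flow Info:  ","Src IP": "172.20.100.147","Src Port": 42190,"Dest IP": "172.20.121.22","Dest Port": 5673,"Proto": "TCP","Verdict": "ACCEPT" }',
    '{ "level": "info","ts": "2023-11-07T15:45:10.403Z","logger": "ebpf-client","msg": "Flow Info:  ","Src IP": "10.100.0.1","Src Port": 443,"Dest IP": "172.20.121.22","Dest Port": 55464,"Proto": "TCP","Verdict": "DENY" }',
    '{ "level": "info","ts": "2023-11-07T15:45:10.403Z","logger": "ebpf-client","msg": "Flow Info:  ","Src IP": "10.100.0.1","Src Port": 443,"Dest IP": "172.20.121.22","Dest Port": 55464,"Proto": "TCP","Verdict": "DENY" }',
]

flow_counts = summarize_flows(sample)
for (src, dst, dport, verdict), n in flow_counts.most_common():
    print(f"{verdict:6} {src} -> {dst}:{dport}  x{n}")
```

To run it against a node's log, pass an open file instead of `sample`, for example `summarize_flows(open("/var/log/aws-routed-eni/network-policy-agent.log"))`.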

@AlexanderPavlovHC So, you're running into the same issue with 1.15.3? If yes, do the logs above show any DENY flows that you expect to be allowed? Since we don't know the pod IPs, it will be tough for us to map them.

Alternatively, can you collect node logs via /opt/cni/bin/aws-cni-support.sh and mail them to k8s-awscni-triage@amazon.com, along with the describe output of the policyEndpoint resources and the configured network policies?
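
One way to shortlist DENY flows worth a closer look is to search for denied packets whose reversed address and port tuple was accepted elsewhere in the same log. In the dump above, for instance, 10.100.0.1:443 -> 172.20.121.22:55464 is logged as DENY while 172.20.121.22:55464 -> 10.100.0.1:443 is logged as ACCEPT. The Python sketch below is only an illustration under assumptions: `denied_replies` and `flow_entries` are hypothetical helpers, entries are assumed to be one JSON object per line, and ordering by timestamp is ignored. A match suggests possible return traffic being dropped, but it does not by itself establish the cause.

```python
import json


def flow_entries(lines):
    """Yield parsed 'Flow Info' entries, skipping anything that is not one."""
    for line in lines:
        line = line.strip()
        if not line.startswith("{"):
            continue
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue
        if entry.get("msg", "").startswith("Flow Info"):
            yield entry


def denied_replies(lines):
    """Return DENY 4-tuples whose reversed 4-tuple also appears with verdict ACCEPT."""
    accepted, denied = set(), set()
    for e in flow_entries(lines):
        tup = (e["Src IP"], e["Src Port"], e["Dest IP"], e["Dest Port"])
        if e["Verdict"] == "ACCEPT":
            accepted.add(tup)
        elif e["Verdict"] == "DENY":
            denied.add(tup)
    # Reverse of (src, sport, dst, dport) is (dst, dport, src, sport).
    return sorted(d for d in denied if (d[2], d[3], d[0], d[1]) in accepted)


# Three entries copied from the log earlier in this thread.
sample = [
    '{ "level": "info","ts": "2023-11-07T15:45:10.303Z","logger": "ebpf-client","msg": "Flow Info:  ","Src IP": "172.20.100.147","Src Port": 42190,"Dest IP": "172.20.121.22","Dest Port": 5673,"Proto": "TCP","Verdict": "ACCEPT" }',
    '{ "level": "info","ts": "2023-11-07T15:45:10.403Z","logger": "ebpf-client","msg": "Flow Info:  ","Src IP": "10.100.0.1","Src Port": 443,"Dest IP": "172.20.121.22","Dest Port": 55464,"Proto": "TCP","Verdict": "DENY" }',
    '{ "level": "info","ts": "2023-11-07T15:45:33.403Z","logger": "ebpf-client","msg": "Flow Info:  ","Src IP": "172.20.121.22","Src Port": 55464,"Dest IP": "10.100.0.1","Dest Port": 443,"Proto": "TCP","Verdict": "ACCEPT" }',
]

suspects = denied_replies(sample)
for src, sport, dst, dport in suspects:
    print(f"DENY {src}:{sport} -> {dst}:{dport} (reverse direction was ACCEPTed)")
```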

The v1.0.8 release is available - https://github.com/aws/amazon-vpc-cni-k8s/releases/tag/v1.16.3. Please try it out and let us know if you see any issues.

Quick update to the above: if you come across this issue, as we did, try upgrading to v1.16.4 instead of v1.16.3. v1.16.3 has a bug that consumes 100% of the CPU.