Pod communication issues when applying network policies on AWS VPC-CNI
AlexanderPavlovHC opened this issue
What happened:
After switching from Calico to the native AWS VPC-CNI network policy support and adding a network policy to restrict access, some newly created pods that run jobs go into CrashLoopBackOff and cannot complete their tasks. No such errors occurred with Calico.
Details:
- The policy allows access for haproxy, prometheus, elasticsearch, and intra-namespace pod communication.
- Pod communication issues arose after switching from Calico to AWS VPC-CNI.
- Network policy manifest:
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-policies
  namespace: default
spec:
  podSelector: {}
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: elastic-system
          podSelector:
            matchLabels:
              app.kubernetes.io/instance: eck-operator
      ports:
        - port: 9200
        - port: 9300
    - from:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: haproxy-ingress
          podSelector:
            matchLabels:
              app.kubernetes.io/name: haproxy-ingress
    - from:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: prometheus
          podSelector:
            matchLabels:
              app.kubernetes.io/name: prometheus
      ports:
        - port: 8080
        - port: 9090
        - port: 9091
        - port: 9253
        - port: 15692
    - from:
        - podSelector: {}
Network policy diagram:
![Screenshot 2023-11-03 at 15 46 47](https://private-user-images.githubusercontent.com/97234680/280310201-a6e2a1ed-326f-4b4e-9d91-214148b31976.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MjE0MDk2NDUsIm5iZiI6MTcyMTQwOTM0NSwicGF0aCI6Ii85NzIzNDY4MC8yODAzMTAyMDEtYTZlMmExZWQtMzI2Zi00YjRlLTlkOTEtMjE0MTQ4YjMxOTc2LnBuZz9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFWQ09EWUxTQTUzUFFLNFpBJTJGMjAyNDA3MTklMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjQwNzE5VDE3MTU0NVomWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPTJlMGMyYzc5NzYxZjNkNDcyNDUxMDhmM2IyYzZiODZhMWNkZGEwMjY2ZGM5ZDNmNjIzMzA2ZjBkNGM4YTI5NTcmWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0JmFjdG9yX2lkPTAma2V5X2lkPTAmcmVwb19pZD0wIn0.8yOSax4cZmAEKllMZ4HPd9ifkZNzh_JLto62ceENH20)
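To reason about which sources the allow-policies manifest admits, here is a minimal Python sketch (not how the VPC CNI network policy agent implements it) that encodes its ingress rules as plain data and applies the usual NetworkPolicy matching: rules are OR'ed, the namespaceSelector and podSelector inside one from entry must both match, and a rule without ports allows every port. Assumptions: namespaces are identified by name (the selectors here use the kubernetes.io/metadata.name label, which carries the namespace name), pod labels are supplied by the caller, and a non-pod source is passed as namespace None. Because the manifest declares no policyTypes and no egress rules, only ingress is restricted; reply traffic for permitted connections is expected to be allowed implicitly, which this stateless check does not model.

```python
from typing import Optional

POLICY_NAMESPACE = "default"

# Each rule: (peers, allowed ports or None for "all ports").
# A peer is (namespace name or None for the policy's own namespace, required pod labels).
INGRESS_RULES = [
    ([("elastic-system", {"app.kubernetes.io/instance": "eck-operator"})], [9200, 9300]),
    ([("haproxy-ingress", {"app.kubernetes.io/name": "haproxy-ingress"})], None),
    ([("prometheus", {"app.kubernetes.io/name": "prometheus"})],
     [8080, 9090, 9091, 9253, 15692]),
    ([(None, {})], None),  # "- podSelector: {}" = every pod in the policy's namespace
]


def peer_matches(peer, src_namespace: Optional[str], src_labels: dict) -> bool:
    """True if the source pod satisfies both selectors of one peer (AND)."""
    namespace, required = peer
    if src_namespace is None:
        # Non-pod sources (nodes, Service VIPs) never match pod/namespace selectors.
        return False
    wanted_ns = POLICY_NAMESPACE if namespace is None else namespace
    if src_namespace != wanted_ns:
        return False
    return all(src_labels.get(key) == value for key, value in required.items())


def ingress_allowed(src_namespace: Optional[str], src_labels: dict, dest_port: int) -> bool:
    """True if any ingress rule (OR) admits this source on dest_port."""
    for peers, ports in INGRESS_RULES:
        if ports is not None and dest_port not in ports:
            continue
        if any(peer_matches(p, src_namespace, src_labels) for p in peers):
            return True
    return False


# Usage ("app": "ledger-cron" is a hypothetical label):
print(ingress_allowed("prometheus", {"app.kubernetes.io/name": "prometheus"}, 9090))  # True
print(ingress_allowed("prometheus", {"app.kubernetes.io/name": "prometheus"}, 5432))  # False
print(ingress_allowed("default", {"app": "ledger-cron"}, 5673))  # True
print(ingress_allowed(None, {}, 55464))  # False
```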
Logs from network-policy-agent.log:
{"level":"info","ts":"2023-11-02T13:10:35.224Z","logger":"controllers.policyEndpoints","caller":"controllers/policyendpoints_controller.go:149","msg":"Target Pod doesn't belong to the current pod Identifier: ","Name: ":"ledger-cron-15-28315500-fl5np","Pod ID: ":"ledger-cron-15-28315510-default"}
{"level":"info","ts":"2023-11-02T13:10:35.224Z","logger":"controllers.policyEndpoints","caller":"controllers/policyendpoints_controller.go:149","msg":"Processing Pod: ","name:":"docx-api-5c84697988-l8f49","namespace:":"default","podIdentifier: ":"ledger-cron-15-28315510-default"}
{"level":"info","ts":"2023-11-02T13:10:35.224Z","logger":"controllers.policyEndpoints","caller":"controllers/policyendpoints_controller.go:149","msg":"Target Pod doesn't belong to the current pod Identifier: ","Name: ":"docx-api-5c84697988-l8f49","Pod ID: ":"ledger-cron-15-28315510-default"}
{"level":"info","ts":"2023-11-02T13:10:35.224Z","logger":"controllers.policyEndpoints","caller":"controllers/policyendpoints_controller.go:149","msg":"Processing Pod: ","name:":"qmt-poland-motor-generali-worker-77d67fb98-zwp2w","namespace:":"default","podIdentifier: ":"ledger-cron-15-28315510-default"}
{"level":"info","ts":"2023-11-02T13:10:35.224Z","logger":"controllers.policyEndpoints","caller":"controllers/policyendpoints_controller.go:149","msg":"Target Pod doesn't belong to the current pod Identifier: ","Name: ":"qmt-poland-motor-generali-worker-77d67fb98-zwp2w","Pod ID: ":"ledger-cron-15-28315510-default"}
{"level":"info","ts":"2023-11-02T13:10:35.224Z","logger":"controllers.policyEndpoints","caller":"controllers/policyendpoints_controller.go:149","msg":"Processing Pod: ","name:":"ledger-cron-15-28315490-9nbrf","namespace:":"default","podIdentifier: ":"ledger-cron-15-28315510-default"}
Environment:
- Kubernetes: 1.25
- Kubelet: v1.25.12-eks-8ccc7ba
- VPC-CNI add-on: v1.15.1-eksbuild.1
- AMI release: 1.25.12-20230825
- Kernel: 5.10.186-179.751.amzn2.x86_64
@AlexanderPavlovHC, did you capture access logs to confirm whether the traffic is being denied? To capture them, you have to enable the enable-policy-event-logs flag. Since you mentioned that only a few pods are affected, can you share the pod scale in the default namespace? If the scale is high, you are most likely running into issue #106. This is fixed in network policy agent v1.0.5, which is part of the v1.15.3 release: https://github.com/aws/amazon-vpc-cni-k8s/releases/tag/v1.15.3
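For the EKS managed add-on, the policy decision logs are usually switched on through the add-on's configuration values rather than by editing the agent's flags directly. A minimal Python sketch that only builds and prints an aws eks update-addon command line (it does not call AWS). It assumes the vpc-cni add-on schema exposes enableNetworkPolicy and nodeAgent.enablePolicyEventLogs as string values; check with aws eks describe-addon-configuration for your add-on version. my-cluster is a placeholder.

```python
import json
import shlex


def vpc_cni_configuration_values(enable_policy_event_logs: bool = True) -> str:
    """JSON for --configuration-values on the vpc-cni EKS add-on.

    Assumed schema keys: enableNetworkPolicy and nodeAgent.enablePolicyEventLogs,
    both string-valued ("true"/"false"). Verify against your add-on version.
    """
    config = {
        # Keep network policy enforcement on while changing other values.
        "enableNetworkPolicy": "true",
        "nodeAgent": {
            "enablePolicyEventLogs": "true" if enable_policy_event_logs else "false",
        },
    }
    return json.dumps(config, separators=(",", ":"))


def update_addon_command(cluster_name: str) -> list:
    """Build (but do not run) an aws eks update-addon argv."""
    return [
        "aws", "eks", "update-addon",
        "--cluster-name", cluster_name,
        "--addon-name", "vpc-cni",
        "--configuration-values", vpc_cni_configuration_values(),
    ]


if __name__ == "__main__":
    # "my-cluster" is a placeholder; paste the printed command into a shell.
    print(shlex.join(update_addon_command("my-cluster")))
```

Once the agent restarts with the new setting, Flow Info entries with an ACCEPT or DENY verdict should start appearing in /var/log/aws-routed-eni/network-policy-agent.log on each node.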
Thank you for the guidance. After updating the add-on version and enabling policy event logs, I see the following on the node where the pod in CrashLoopBackOff is running:
Pod IP: 172.20.121.22
[root@ip-172-20-100-147 /]# cat /var/log/aws-routed-eni/network-policy-agent.log | grep 172.20.121.22
{ "level": "info","ts": "2023-11-07T15:40:38.404Z","logger": "ebpf-client","caller": "ebpf/bpf_client.go:659","msg": "Updating Map with ","IP Key:": "172.20.121.22/32" }
{ "level": "info","ts": "2023-11-07T15:40:39.805Z","logger": "ebpf-client","caller": "ebpf/bpf_client.go:659","msg": "No L4 specified. Add Catch all entry: ","CIDR: ": "172.20.121.22/32" }
{ "level": "info","ts": "2023-11-07T15:40:39.805Z","logger": "ebpf-client","caller": "ebpf/bpf_client.go:659","msg": "Updating Map with ","IP Key:": "172.20.121.22/32" }
{ "level": "info","ts": "2023-11-07T15:40:41.003Z","logger": "ebpf-client","caller": "ebpf/bpf_client.go:659","msg": "No L4 specified. Add Catch all entry: ","CIDR: ": "172.20.121.22/32" }
{ "level": "info","ts": "2023-11-07T15:40:41.003Z","logger": "ebpf-client","caller": "ebpf/bpf_client.go:659","msg": "Updating Map with ","IP Key:": "172.20.121.22/32" }
{ "level": "info","ts": "2023-11-07T15:45:09.105Z","logger": "ebpf-client","caller": "ebpf/bpf_client.go:659","msg": "No L4 specified. Add Catch all entry: ","CIDR: ": "172.20.121.22/32" }
{ "level": "info","ts": "2023-11-07T15:45:09.105Z","logger": "ebpf-client","caller": "ebpf/bpf_client.go:659","msg": "Updating Map with ","IP Key:": "172.20.121.22/32" }
{ "level": "info","ts": "2023-11-07T15:45:10.304Z","logger": "ebpf-client","caller": "ebpf/bpf_client.go:659","msg": "No L4 specified. Add Catch all entry: ","CIDR: ": "172.20.121.22/32" }
{ "level": "info","ts": "2023-11-07T15:45:10.304Z","logger": "ebpf-client","caller": "ebpf/bpf_client.go:659","msg": "Updating Map with ","IP Key:": "172.20.121.22/32" }
{ "level": "info","ts": "2023-11-07T15:45:10.303Z","logger": "ebpf-client","msg": "Flow Info: ","Src IP": "172.20.100.147","Src Port": 42190,"Dest IP": "172.20.121.22","Dest Port": 5673,"Proto": "TCP","Verdict": "ACCEPT" }
{ "level": "info","ts": "2023-11-07T15:45:10.403Z","logger": "ebpf-client","msg": "Flow Info: ","Src IP": "10.100.0.1","Src Port": 443,"Dest IP": "172.20.121.22","Dest Port": 55464,"Proto": "TCP","Verdict": "DENY" }
{ "level": "info","ts": "2023-11-07T15:45:10.403Z","logger": "ebpf-client","msg": "Flow Info: ","Src IP": "10.100.0.1","Src Port": 443,"Dest IP": "172.20.121.22","Dest Port": 55464,"Proto": "TCP","Verdict": "DENY" }
{ "level": "info","ts": "2023-11-07T15:45:10.403Z","logger": "ebpf-client","msg": "Flow Info: ","Src IP": "10.100.0.1","Src Port": 443,"Dest IP": "172.20.121.22","Dest Port": 55464,"Proto": "TCP","Verdict": "DENY" }
{ "level": "info","ts": "2023-11-07T15:45:10.403Z","logger": "ebpf-client","msg": "Flow Info: ","Src IP": "10.100.0.1","Src Port": 443,"Dest IP": "172.20.121.22","Dest Port": 55464,"Proto": "TCP","Verdict": "DENY" }
{ "level": "info","ts": "2023-11-07T15:45:10.703Z","logger": "ebpf-client","msg": "Flow Info: ","Src IP": "10.100.0.1","Src Port": 443,"Dest IP": "172.20.121.22","Dest Port": 55464,"Proto": "TCP","Verdict": "DENY" }
{ "level": "info","ts": "2023-11-07T15:45:11.603Z","logger": "ebpf-client","msg": "Flow Info: ","Src IP": "10.100.0.1","Src Port": 443,"Dest IP": "172.20.121.22","Dest Port": 55464,"Proto": "TCP","Verdict": "DENY" }
{ "level": "info","ts": "2023-11-07T15:45:11.604Z","logger": "ebpf-client","caller": "ebpf/bpf_client.go:659","msg": "No L4 specified. Add Catch all entry: ","CIDR: ": "172.20.121.22/32" }
{ "level": "info","ts": "2023-11-07T15:45:11.604Z","logger": "ebpf-client","caller": "ebpf/bpf_client.go:659","msg": "Updating Map with ","IP Key:": "172.20.121.22/32" }
{ "level": "info","ts": "2023-11-07T15:45:13.203Z","logger": "ebpf-client","msg": "Flow Info: ","Src IP": "172.20.100.147","Src Port": 42206,"Dest IP": "172.20.121.22","Dest Port": 5673,"Proto": "TCP","Verdict": "ACCEPT" }
{ "level": "info","ts": "2023-11-07T15:45:13.403Z","logger": "ebpf-client","msg": "Flow Info: ","Src IP": "10.100.0.1","Src Port": 443,"Dest IP": "172.20.121.22","Dest Port": 55464,"Proto": "TCP","Verdict": "DENY" }
{ "level": "info","ts": "2023-11-07T15:45:13.604Z","logger": "ebpf-client","caller": "ebpf/bpf_client.go:659","msg": "No L4 specified. Add Catch all entry: ","CIDR: ": "172.20.121.22/32" }
{ "level": "info","ts": "2023-11-07T15:45:13.604Z","logger": "ebpf-client","caller": "ebpf/bpf_client.go:659","msg": "Updating Map with ","IP Key:": "172.20.121.22/32" }
{ "level": "info","ts": "2023-11-07T15:45:14.405Z","logger": "ebpf-client","caller": "ebpf/bpf_client.go:659","msg": "No L4 specified. Add Catch all entry: ","CIDR: ": "172.20.121.22/32" }
{ "level": "info","ts": "2023-11-07T15:45:14.405Z","logger": "ebpf-client","caller": "ebpf/bpf_client.go:659","msg": "Updating Map with ","IP Key:": "172.20.121.22/32" }
{ "level": "info","ts": "2023-11-07T15:45:15.603Z","logger": "ebpf-client","caller": "ebpf/bpf_client.go:659","msg": "No L4 specified. Add Catch all entry: ","CIDR: ": "172.20.121.22/32" }
{ "level": "info","ts": "2023-11-07T15:45:15.603Z","logger": "ebpf-client","caller": "ebpf/bpf_client.go:659","msg": "Updating Map with ","IP Key:": "172.20.121.22/32" }
{ "level": "info","ts": "2023-11-07T15:45:16.003Z","logger": "ebpf-client","msg": "Flow Info: ","Src IP": "172.20.100.147","Src Port": 42216,"Dest IP": "172.20.121.22","Dest Port": 5673,"Proto": "TCP","Verdict": "ACCEPT" }
{ "level": "info","ts": "2023-11-07T15:45:16.503Z","logger": "ebpf-client","msg": "Flow Info: ","Src IP": "10.100.0.1","Src Port": 443,"Dest IP": "172.20.121.22","Dest Port": 55464,"Proto": "TCP","Verdict": "DENY" }
{ "level": "info","ts": "2023-11-07T15:45:16.803Z","logger": "ebpf-client","caller": "ebpf/bpf_client.go:659","msg": "No L4 specified. Add Catch all entry: ","CIDR: ": "172.20.121.22/32" }
{ "level": "info","ts": "2023-11-07T15:45:16.803Z","logger": "ebpf-client","caller": "ebpf/bpf_client.go:659","msg": "Updating Map with ","IP Key:": "172.20.121.22/32" }
{ "level": "info","ts": "2023-11-07T15:45:17.905Z","logger": "ebpf-client","caller": "ebpf/bpf_client.go:659","msg": "No L4 specified. Add Catch all entry: ","CIDR: ": "172.20.121.22/32" }
{ "level": "info","ts": "2023-11-07T15:45:17.905Z","logger": "ebpf-client","caller": "ebpf/bpf_client.go:659","msg": "Updating Map with ","IP Key:": "172.20.121.22/32" }
{ "level": "info","ts": "2023-11-07T15:45:19.003Z","logger": "ebpf-client","msg": "Flow Info: ","Src IP": "172.20.100.147","Src Port": 52162,"Dest IP": "172.20.121.22","Dest Port": 5673,"Proto": "TCP","Verdict": "ACCEPT" }
{ "level": "info","ts": "2023-11-07T15:45:19.304Z","logger": "ebpf-client","caller": "ebpf/bpf_client.go:659","msg": "No L4 specified. Add Catch all entry: ","CIDR: ": "172.20.121.22/32" }
{ "level": "info","ts": "2023-11-07T15:45:19.304Z","logger": "ebpf-client","caller": "ebpf/bpf_client.go:659","msg": "Updating Map with ","IP Key:": "172.20.121.22/32" }
{ "level": "info","ts": "2023-11-07T15:45:22.003Z","logger": "ebpf-client","msg": "Flow Info: ","Src IP": "172.20.100.147","Src Port": 52170,"Dest IP": "172.20.121.22","Dest Port": 5673,"Proto": "TCP","Verdict": "ACCEPT" }
{ "level": "info","ts": "2023-11-07T15:45:23.146Z","logger": "ebpf-client","msg": "Flow Info: ","Src IP": "10.100.0.1","Src Port": 443,"Dest IP": "172.20.121.22","Dest Port": 55464,"Proto": "TCP","Verdict": "DENY" }
{ "level": "info","ts": "2023-11-07T15:45:24.981Z","logger": "ebpf-client","msg": "Flow Info: ","Src IP": "172.20.100.147","Src Port": 52184,"Dest IP": "172.20.121.22","Dest Port": 5673,"Proto": "TCP","Verdict": "ACCEPT" }
{ "level": "info","ts": "2023-11-07T15:45:27.981Z","logger": "ebpf-client","msg": "Flow Info: ","Src IP": "172.20.100.147","Src Port": 46384,"Dest IP": "172.20.121.22","Dest Port": 5673,"Proto": "TCP","Verdict": "ACCEPT" }
{ "level": "info","ts": "2023-11-07T15:45:29.406Z","logger": "ebpf-client","caller": "ebpf/bpf_client.go:659","msg": "No L4 specified. Add Catch all entry: ","CIDR: ": "172.20.121.22/32" }
{ "level": "info","ts": "2023-11-07T15:45:29.406Z","logger": "ebpf-client","caller": "ebpf/bpf_client.go:659","msg": "Updating Map with ","IP Key:": "172.20.121.22/32" }
{ "level": "info","ts": "2023-11-07T15:45:30.705Z","logger": "ebpf-client","caller": "ebpf/bpf_client.go:659","msg": "No L4 specified. Add Catch all entry: ","CIDR: ": "172.20.121.22/32" }
{ "level": "info","ts": "2023-11-07T15:45:30.705Z","logger": "ebpf-client","caller": "ebpf/bpf_client.go:659","msg": "Updating Map with ","IP Key:": "172.20.121.22/32" }
{ "level": "info","ts": "2023-11-07T15:45:30.981Z","logger": "ebpf-client","msg": "Flow Info: ","Src IP": "172.20.100.147","Src Port": 46400,"Dest IP": "172.20.121.22","Dest Port": 5673,"Proto": "TCP","Verdict": "ACCEPT" }
{ "level": "info","ts": "2023-11-07T15:45:32.005Z","logger": "ebpf-client","caller": "ebpf/bpf_client.go:659","msg": "No L4 specified. Add Catch all entry: ","CIDR: ": "172.20.121.22/32" }
{ "level": "info","ts": "2023-11-07T15:45:32.005Z","logger": "ebpf-client","caller": "ebpf/bpf_client.go:659","msg": "Updating Map with ","IP Key:": "172.20.121.22/32" }
{ "level": "info","ts": "2023-11-07T15:45:33.106Z","logger": "ebpf-client","caller": "ebpf/bpf_client.go:659","msg": "No L4 specified. Add Catch all entry: ","CIDR: ": "172.20.121.22/32" }
{ "level": "info","ts": "2023-11-07T15:45:33.106Z","logger": "ebpf-client","caller": "ebpf/bpf_client.go:659","msg": "Updating Map with ","IP Key:": "172.20.121.22/32" }
{ "level": "info","ts": "2023-11-07T15:45:33.403Z","logger": "ebpf-client","msg": "Flow Info: ","Src IP": "172.20.121.22","Src Port": 55464,"Dest IP": "10.100.0.1","Dest Port": 443,"Proto": "TCP","Verdict": "ACCEPT" }
{ "level": "info","ts": "2023-11-07T15:45:34.003Z","logger": "ebpf-client","msg": "Flow Info: ","Src IP": "172.20.100.147","Src Port": 46414,"Dest IP": "172.20.121.22","Dest Port": 5673,"Proto": "TCP","Verdict": "ACCEPT" }
{ "level": "info","ts": "2023-11-07T15:45:34.194Z","logger": "ebpf-client","msg": "Flow Info: ","Src IP": "172.20.121.22","Src Port": 60082,"Dest IP": "10.100.0.1","Dest Port": 443,"Proto": "TCP","Verdict": "ACCEPT" }
{ "level": "info","ts": "2023-11-07T15:45:34.406Z","logger": "ebpf-client","caller": "ebpf/bpf_client.go:659","msg": "No L4 specified. Add Catch all entry: ","CIDR: ": "172.20.121.22/32" }
{ "level": "info","ts": "2023-11-07T15:45:34.406Z","logger": "ebpf-client","caller": "ebpf/bpf_client.go:659","msg": "Updating Map with ","IP Key:": "172.20.121.22/32" }
{ "level": "info","ts": "2023-11-07T15:45:35.306Z","logger": "ebpf-client","caller": "ebpf/bpf_client.go:659","msg": "No L4 specified. Add Catch all entry: ","CIDR: ": "172.20.121.22/32" }
{ "level": "info","ts": "2023-11-07T15:45:35.306Z","logger": "ebpf-client","caller": "ebpf/bpf_client.go:659","msg": "Updating Map with ","IP Key:": "172.20.121.22/32" }
{ "level": "info","ts": "2023-11-07T15:45:37.204Z","logger": "ebpf-client","caller": "ebpf/bpf_client.go:659","msg": "No L4 specified. Add Catch all entry: ","CIDR: ": "172.20.121.22/32" }
{ "level": "info","ts": "2023-11-07T15:45:37.204Z","logger": "ebpf-client","caller": "ebpf/bpf_client.go:659","msg": "Updating Map with ","IP Key:": "172.20.121.22/32" }
{ "level": "info","ts": "2023-11-07T15:45:37.905Z","logger": "ebpf-client","caller": "ebpf/bpf_client.go:659","msg": "No L4 specified. Add Catch all entry: ","CIDR: ": "172.20.121.22/32" }
{ "level": "info","ts": "2023-11-07T15:45:37.905Z","logger": "ebpf-client","caller": "ebpf/bpf_client.go:659","msg": "Updating Map with ","IP Key:": "172.20.121.22/32" }
{ "level": "info","ts": "2023-11-07T15:45:39.105Z","logger": "ebpf-client","caller": "ebpf/bpf_client.go:659","msg": "No L4 specified. Add Catch all entry: ","CIDR: ": "172.20.121.22/32" }
{ "level": "info","ts": "2023-11-07T15:45:39.105Z","logger": "ebpf-client","caller": "ebpf/bpf_client.go:659","msg": "Updating Map with ","IP Key:": "172.20.121.22/32" }
{ "level": "info","ts": "2023-11-07T15:46:16.501Z","logger": "ebpf-client","msg": "Flow Info: ","Src IP": "172.20.121.22","Src Port": 55992,"Dest IP": "10.100.0.1","Dest Port": 443,"Proto": "TCP","Verdict": "ACCEPT" }
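The repeated DENY entries above all run from 10.100.0.1:443 to the pod's port 55464, while the pod's own traffic from port 55464 to 10.100.0.1:443 is later ACCEPTed (10.100.0.1 is most likely the ClusterIP of the kubernetes API Service). A minimal Python sketch for triaging logs like these: it counts Flow Info entries for one pod IP and flags DENY flows whose reversed 4-tuple also appears as an ACCEPTed flow from the pod, that is, likely replies to a connection the pod itself opened. Field names are taken from the log lines shown here; the reply heuristic is an assumption, not something the agent reports.

```python
import json
from collections import Counter


def summarize_flows(lines, pod_ip):
    """Count 'Flow Info' entries involving pod_ip.

    Keys are (verdict, src_ip, src_port, dst_ip, dst_port); values are counts.
    Non-JSON lines (shell prompts, blank lines) and other messages are skipped.
    """
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line.startswith("{"):
            continue
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue
        if entry.get("msg") != "Flow Info: ":  # the agent logs a trailing space
            continue
        if pod_ip not in (entry.get("Src IP"), entry.get("Dest IP")):
            continue
        key = (entry["Verdict"], entry["Src IP"], entry["Src Port"],
               entry["Dest IP"], entry["Dest Port"])
        counts[key] += 1
    return counts


def denied_replies(counts, pod_ip):
    """DENY flows into pod_ip whose reversed 4-tuple was ACCEPTed leaving pod_ip."""
    accepted = {(s, sp, d, dp) for (v, s, sp, d, dp) in counts if v == "ACCEPT"}
    return sorted(
        key for key in counts
        if key[0] == "DENY" and key[3] == pod_ip
        and (key[3], key[4], key[1], key[2]) in accepted
    )
```

On the node, something like summarize_flows(open("/var/log/aws-routed-eni/network-policy-agent.log"), "172.20.121.22") followed by denied_replies(...) on the result lists the suspicious flows.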
@AlexanderPavlovHC So you're running into the same issue with v1.15.3? If so, do the logs above show any DENY flows that you expect to be allowed? Since we don't know the pod IPs, it will be hard for us to map them.
Alternatively, can you collect node logs with /opt/cni/bin/aws-cni-support.sh and email them to k8s-awscni-triage@amazon.com, along with the describe output of the PolicyEndpoint resources and the configured network policies?
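A minimal Python sketch for gathering the requested cluster-side artifacts. It assumes kubectl is already configured for the cluster and that the PolicyEndpoint CRD is served under the resource name policyendpoints; the command runner is injectable so the logic can be exercised without a cluster. The node-side /opt/cni/bin/aws-cni-support.sh script still has to be run on the affected node itself.

```python
import subprocess
from pathlib import Path

# Output file name -> kubectl argv. Assumes the PolicyEndpoint CRD is
# addressable as "policyendpoints" (installed with network policy support).
COMMANDS = {
    "policyendpoints.txt": ["kubectl", "describe", "policyendpoints", "--all-namespaces"],
    "networkpolicies.txt": ["kubectl", "describe", "networkpolicies", "--all-namespaces"],
}


def gather(run=subprocess.run) -> dict:
    """Run each command and return {file name: stdout, or stderr if it failed}."""
    results = {}
    for filename, argv in COMMANDS.items():
        proc = run(argv, capture_output=True, text=True, check=False)
        results[filename] = proc.stdout if proc.returncode == 0 else proc.stderr
    return results


def write_all(results: dict, out_dir: str) -> list:
    """Write gathered output into out_dir so it can be attached to the email."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for filename, body in results.items():
        path = out / filename
        path.write_text(body)
        paths.append(path)
    return paths
```

Typical use is write_all(gather(), "vpc-cni-triage"), where the directory name is arbitrary.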
Network policy agent v1.0.8 is available in the v1.16.3 release: https://github.com/aws/amazon-vpc-cni-k8s/releases/tag/v1.16.3. Please try it out and let us know if you see any issues.
Quick update to the above: if you run into this issue as we did, try upgrading to v1.16.4 instead of v1.16.3. v1.16.3 has a bug that can consume 100% of the CPU.
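The version guidance in this thread can be encoded as a small check. A minimal Python sketch that compares a vpc-cni add-on version string (such as v1.15.1-eksbuild.1 from the environment above) against the thresholds mentioned in these comments; the two rules come from this thread, not from an official compatibility matrix.

```python
import re

# Thresholds as reported in this thread (assumption: not an official matrix):
#   * v1.15.3 ships network policy agent v1.0.5, which fixes issue #106;
#   * v1.16.3 has a bug that can drive CPU to 100%; v1.16.4 is suggested instead.
FIX_106 = (1, 15, 3)
HIGH_CPU = {(1, 16, 3)}


def parse_version(tag: str) -> tuple:
    """Turn 'v1.15.1-eksbuild.1' into (1, 15, 1); ignores build suffixes."""
    match = re.match(r"v?(\d+)\.(\d+)\.(\d+)", tag)
    if match is None:
        raise ValueError(f"unrecognised version: {tag!r}")
    return tuple(int(part) for part in match.groups())


def review(tag: str) -> list:
    """Return thread-derived warnings for a vpc-cni add-on version string."""
    version = parse_version(tag)
    notes = []
    if version < FIX_106:
        notes.append("older than v1.15.3: may hit #106 at high pod scale")
    if version in HIGH_CPU:
        notes.append("v1.16.3: reported 100% CPU bug; prefer v1.16.4")
    return notes
```

Tuples compare element by element in Python, so (1, 15, 1) < (1, 15, 3) and (1, 16, 0) > (1, 15, 3) behave as version ordering without a dedicated library.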