aws / aws-network-policy-agent

Pod eBPF Map Differs from Policy Endpoint

aballman opened this issue

What happened:

I'm using ArgoCD (not super relevant to the issue) with CNI-enforced network policies.
The problem I'm experiencing is that after some time the network policies seem to break, and one of the Argo components can't talk to another one that is critical for Argo to keep Argo-ing.

Pods
NAME                                                READY   STATUS      RESTARTS        AGE     IP              NODE                            NOMINATED NODE   READINESS GATES
argocd-application-controller-0                     1/1     Running     16 (106m ago)   23h     10.146.53.40    ip-10-146-52-181.ec2.internal   <none>           <none>
argocd-applicationset-controller-7974ff9cf9-lvzsj   1/1     Running     0               20d     10.146.54.253   ip-10-146-52-181.ec2.internal   <none>           <none>
argocd-dex-server-5c6dfff575-mhvzq                  1/1     Running     0               20d     10.146.52.212   ip-10-146-52-181.ec2.internal   <none>           <none>
argocd-notifications-controller-778866f977-sv7vh    1/1     Running     0               23h     10.146.54.31    ip-10-146-52-181.ec2.internal   <none>           <none>
argocd-redis-5bcdf48d96-7f68c                       1/1     Running     0               23h     10.146.56.115   ip-10-146-59-179.ec2.internal   <none>           <none>
argocd-redis-ha-haproxy-7f84459cf-8mmkv             1/1     Running     0               33d     10.146.55.24    ip-10-146-52-181.ec2.internal   <none>           <none>
argocd-redis-ha-haproxy-7f84459cf-jd2fj             1/1     Running     0               23h     10.146.60.227   ip-10-146-60-223.ec2.internal   <none>           <none>
argocd-redis-ha-haproxy-7f84459cf-mbkdl             1/1     Running     0               9d      10.146.58.87    ip-10-146-56-47.ec2.internal    <none>           <none>
argocd-redis-ha-server-0                            3/3     Running     0               33d     10.146.53.217   ip-10-146-52-181.ec2.internal   <none>           <none>
argocd-redis-ha-server-1                            3/3     Running     0               23h     10.146.61.205   ip-10-146-61-114.ec2.internal   <none>           <none>
argocd-redis-ha-server-2                            3/3     Running     0               9d      10.146.58.69    ip-10-146-56-47.ec2.internal    <none>           <none>
argocd-repo-server-85ccb7dbdd-8txcw                 1/1     Running     0               23h     10.146.60.213   ip-10-146-60-223.ec2.internal   <none>           <none>
argocd-repo-server-85ccb7dbdd-cssn8                 1/1     Running     0               85m     10.146.54.47    ip-10-146-52-181.ec2.internal   <none>           <none>
argocd-server-6d6cd7bc6b-mb8vl                      1/1     Running     0               25h     10.146.53.126   ip-10-146-52-181.ec2.internal   <none>           <none>
argocd-server-6d6cd7bc6b-xq7tm                      1/1     Running     0               23h     10.146.60.14    ip-10-146-60-223.ec2.internal   <none>           <none>
Services
NAME                                      TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)              AGE
argocd-application-controller-metrics     ClusterIP   172.20.15.149    <none>        8082/TCP             161d
argocd-applicationset-controller          ClusterIP   172.20.49.132    <none>        7000/TCP             161d
argocd-dex-server                         ClusterIP   172.20.220.123   <none>        5556/TCP,5557/TCP    161d
argocd-notifications-controller-metrics   ClusterIP   172.20.176.139   <none>        9001/TCP             145d
argocd-redis                              ClusterIP   172.20.30.226    <none>        6379/TCP             161d
argocd-redis-ha                           ClusterIP   None             <none>        6379/TCP,26379/TCP   161d
argocd-redis-ha-announce-0                ClusterIP   172.20.101.17    <none>        6379/TCP,26379/TCP   161d
argocd-redis-ha-announce-1                ClusterIP   172.20.64.83     <none>        6379/TCP,26379/TCP   161d
argocd-redis-ha-announce-2                ClusterIP   172.20.114.228   <none>        6379/TCP,26379/TCP   161d
argocd-redis-ha-haproxy                   ClusterIP   172.20.187.207   <none>        6379/TCP,9101/TCP    161d
argocd-repo-server                        ClusterIP   172.20.33.160    <none>        8081/TCP             161d
argocd-server                             ClusterIP   172.20.203.179   <none>        80/TCP,443/TCP       161d
Endpoints
NAME                                      ENDPOINTS                                                                AGE
argocd-application-controller-metrics     10.146.53.40:8082                                                        161d
argocd-applicationset-controller          10.146.54.253:7000                                                       161d
argocd-dex-server                         10.146.52.212:5557,10.146.52.212:5556                                    161d
argocd-notifications-controller-metrics   10.146.54.31:9001                                                        145d
argocd-redis                              10.146.56.115:6379                                                       161d
argocd-redis-ha                           10.146.53.217:26379,10.146.58.69:26379,10.146.61.205:26379 + 3 more...   161d
argocd-redis-ha-announce-0                10.146.53.217:26379,10.146.53.217:6379                                   161d
argocd-redis-ha-announce-1                10.146.61.205:26379,10.146.61.205:6379                                   161d
argocd-redis-ha-announce-2                10.146.58.69:26379,10.146.58.69:6379                                     161d
argocd-redis-ha-haproxy                   10.146.55.24:6379,10.146.58.87:6379,10.146.60.227:6379 + 3 more...       161d
argocd-repo-server                        10.146.54.47:8081,10.146.60.213:8081                                     161d
argocd-server                             10.146.53.126:8080,10.146.60.14:8080,10.146.53.126:8080 + 1 more...      161d
Network Policy
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: argocd-repo-server
  namespace: argocd
spec:
  ingress:
  - from:
    - podSelector:
        matchLabels:
          app.kubernetes.io/instance: argocd
          app.kubernetes.io/name: argocd-server
    - podSelector:
        matchLabels:
          app.kubernetes.io/instance: argocd
          app.kubernetes.io/name: argocd-application-controller
    - podSelector:
        matchLabels:
          app.kubernetes.io/instance: argocd
          app.kubernetes.io/name: argocd-notifications-controller
    - podSelector:
        matchLabels:
          app.kubernetes.io/instance: argocd
          app.kubernetes.io/name: argocd-applicationset-controller
    ports:
    - port: repo-server
      protocol: TCP
  podSelector:
    matchLabels:
      app.kubernetes.io/instance: argocd
      app.kubernetes.io/name: argocd-repo-server
  policyTypes:
  - Ingress
Policy Endpoint
apiVersion: networking.k8s.aws/v1alpha1
kind: PolicyEndpoint
metadata:
  creationTimestamp: "2024-02-02T00:46:35Z"
  generateName: argocd-repo-server-
  generation: 141
  name: argocd-repo-server-sxvj2
  namespace: argocd
  ownerReferences:
  - apiVersion: networking.k8s.io/v1
    blockOwnerDeletion: true
    controller: true
    kind: NetworkPolicy
    name: argocd-repo-server
    uid: a57fcdb4-d425-4aa4-b818-61c9168debbf
  resourceVersion: "149304150"
  uid: df6dadb8-e619-4f72-ba98-82618b9f8256
spec:
  ingress:
  - cidr: 10.146.54.253
    ports:
    - port: 8081
      protocol: TCP
  - cidr: 10.146.53.126
    ports:
    - port: 8081
      protocol: TCP
  - cidr: 10.146.54.31
    ports:
    - port: 8081
      protocol: TCP
  - cidr: 10.146.53.40
    ports:
    - port: 8081
      protocol: TCP
  - cidr: 10.146.60.14
    ports:
    - port: 8081
      protocol: TCP
  podIsolation:
  - Ingress
  podSelector:
    matchLabels:
      app.kubernetes.io/instance: argocd
      app.kubernetes.io/name: argocd-repo-server
  podSelectorEndpoints:
  - hostIP: 10.146.60.223
    name: argocd-repo-server-85ccb7dbdd-8txcw
    namespace: argocd
    podIP: 10.146.60.213
  - hostIP: 10.146.52.181
    name: argocd-repo-server-85ccb7dbdd-cssn8
    namespace: argocd
    podIP: 10.146.54.47
  policyRef:
    name: argocd-repo-server
    namespace: argocd

The destination pods are

argocd-repo-server-85ccb7dbdd-8txcw                 1/1     Running     0               23h     10.146.60.213   ip-10-146-60-223.ec2.internal   <none>           <none>
argocd-repo-server-85ccb7dbdd-cssn8                 1/1     Running     0               85m     10.146.54.47    ip-10-146-52-181.ec2.internal   <none>           <none>

Using /aws-eks-na-cli ebpf loaded-ebpfdata I found the ebpf map corresponding to the pod on node ip-10-146-60-223.ec2.internal

bash-4.2# /aws-eks-na-cli ebpf loaded-ebpfdata | grep -A9 "repo-server"
PinPath:  /sys/fs/bpf/globals/aws/programs/argocd-repo-server-85ccb7dbdd-argocd_handle_ingress
Pod Identifier : argocd-repo-server-85ccb7dbdd-argocd  Direction : ingress
Prog ID:  211
Associated Maps ->
Map Name:  aws_conntrack_map
Map ID:  17
Map Name:  ingress_map
Map ID:  57
Map Name:  policy_events
Map ID:  18
========================================================================================
--
PinPath:  /sys/fs/bpf/globals/aws/programs/argocd-repo-server-85ccb7dbdd-argocd_handle_egress
Pod Identifier : argocd-repo-server-85ccb7dbdd-argocd  Direction : egress
Prog ID:  212
Associated Maps ->
Map Name:  aws_conntrack_map
Map ID:  17
Map Name:  egress_map
Map ID:  58
Map Name:  policy_events
Map ID:  18
========================================================================================
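The lookup above (pod identifier to ingress/egress map ID) can be scripted when it has to be repeated across nodes. The following is only a minimal sketch, assuming the `loaded-ebpfdata` output keeps the line layout shown in this issue; `find_map_ids` and `SAMPLE` are hypothetical names and not part of `aws-eks-na-cli`.

```python
import re

# Sample copied from the loaded-ebpfdata output shown above.
SAMPLE = """\
PinPath:  /sys/fs/bpf/globals/aws/programs/argocd-repo-server-85ccb7dbdd-argocd_handle_ingress
Pod Identifier : argocd-repo-server-85ccb7dbdd-argocd  Direction : ingress
Prog ID:  211
Associated Maps ->
Map Name:  aws_conntrack_map
Map ID:  17
Map Name:  ingress_map
Map ID:  57
Map Name:  policy_events
Map ID:  18
--
PinPath:  /sys/fs/bpf/globals/aws/programs/argocd-repo-server-85ccb7dbdd-argocd_handle_egress
Pod Identifier : argocd-repo-server-85ccb7dbdd-argocd  Direction : egress
Prog ID:  212
Associated Maps ->
Map Name:  aws_conntrack_map
Map ID:  17
Map Name:  egress_map
Map ID:  58
Map Name:  policy_events
Map ID:  18
"""


def find_map_ids(text, pod_identifier):
    """Return {direction: {map_name: map_id}} for one pod identifier."""
    result = {}
    direction = None  # direction of the block being read, None if another pod
    map_name = None   # last "Map Name:" seen, waiting for its "Map ID:"
    for raw in text.splitlines():
        line = raw.strip()
        m = re.match(r"Pod Identifier\s*:\s*(\S+)\s+Direction\s*:\s*(\S+)", line)
        if m:
            direction = m.group(2) if m.group(1) == pod_identifier else None
            if direction is not None:
                result.setdefault(direction, {})
            map_name = None
            continue
        if direction is None:
            continue
        m = re.match(r"Map Name:\s*(\S+)", line)
        if m:
            map_name = m.group(1)
            continue
        m = re.match(r"Map ID:\s*(\d+)", line)
        if m and map_name is not None:
            result[direction][map_name] = int(m.group(1))
            map_name = None
    return result


if __name__ == "__main__":
    print(find_map_ids(SAMPLE, "argocd-repo-server-85ccb7dbdd-argocd"))
```

Because the parser pairs each `Map Name:` with the `Map ID:` line that follows it, it does not depend on the order in which the maps are listed, which differs between nodes in the output in this issue.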
Here's the ebpf map dump from map `57` (good)
bash-4.2# /aws-eks-na-cli ebpf dump-maps 57
Key : IP/Prefixlen - 10.146.53.40/32
-------------------
Value Entry :  0
Protocol -  TCP
StartPort -  8081
Endport -  0
-------------------
*******************************
Key : IP/Prefixlen - 10.146.53.126/32
-------------------
Value Entry :  0
Protocol -  TCP
StartPort -  8081
Endport -  0
-------------------
*******************************
Key : IP/Prefixlen - 10.146.54.31/32
-------------------
Value Entry :  0
Protocol -  TCP
StartPort -  8081
Endport -  0
-------------------
*******************************
Key : IP/Prefixlen - 10.146.54.253/32
-------------------
Value Entry :  0
Protocol -  TCP
StartPort -  8081
Endport -  0
-------------------
*******************************
Key : IP/Prefixlen - 10.146.60.14/32
-------------------
Value Entry :  0
Protocol -  TCP
StartPort -  8081
Endport -  0
-------------------
*******************************
Key : IP/Prefixlen - 10.146.60.223/32
-------------------
Value Entry :  0
Protocol -  ANY PROTOCOL
StartPort -  0
Endport -  0
-------------------
*******************************
Done reading all entries
Doing the same for the other node `ip-10-146-52-181.ec2.internal` (bad)
bash-4.2# /aws-eks-na-cli ebpf loaded-ebpfdata | grep -A9 "repo-server"
PinPath:  /sys/fs/bpf/globals/aws/programs/argocd-repo-server-85ccb7dbdd-argocd_handle_ingress
Pod Identifier : argocd-repo-server-85ccb7dbdd-argocd  Direction : ingress
Prog ID:  14411
Associated Maps ->
Map Name:  aws_conntrack_map
Map ID:  9
Map Name:  ingress_map
Map ID:  4214
Map Name:  policy_events
Map ID:  10
========================================================================================
--
PinPath:  /sys/fs/bpf/globals/aws/programs/argocd-repo-server-85ccb7dbdd-argocd_handle_egress
Pod Identifier : argocd-repo-server-85ccb7dbdd-argocd  Direction : egress
Prog ID:  14412
Associated Maps ->
Map Name:  policy_events
Map ID:  10
Map Name:  aws_conntrack_map
Map ID:  9
Map Name:  egress_map
Map ID:  4215
========================================================================================
bash-4.2# /aws-eks-na-cli ebpf dump-maps 4214
Key : IP/Prefixlen - 10.146.52.181/32
-------------------
Value Entry :  0
Protocol -  ANY PROTOCOL
StartPort -  0
Endport -  0
-------------------
*******************************
Done reading all entries

One of the two pods seems to have an improperly built ebpf map relative to the policy endpoint. Here's a snippet of the most recent logs I could find referencing map 4214

{"level":"info","ts":"2024-02-07T22:00:08.024Z","logger":"ebpf-client","caller":"controllers/policyendpoints_controller.go:436","msg":"ID of map to update: ","ID: ":4214}
{"level":"info","ts":"2024-02-07T22:00:08.024Z","logger":"ebpf-client","caller":"controllers/policyendpoints_controller.go:278","msg":"Pod has an Egress hook attached. Update the corresponding map","progFD: ":45,"mapName: ":"egress_map"}
{"level":"info","ts":"2024-02-07T22:00:08.024Z","logger":"ebpf-client","caller":"ebpf/bpf_client.go:707","msg":"L4 values: ","protocol: ":254,"startPort: ":0,"endPort: ":0}
{"level":"info","ts":"2024-02-07T22:00:08.024Z","logger":"ebpf-client","caller":"ebpf/bpf_client.go:707","msg":"Current L4 entry count for catch all entry: ","count: ":0}
{"level":"info","ts":"2024-02-07T22:00:08.024Z","logger":"ebpf-client","caller":"ebpf/bpf_client.go:707","msg":"Total L4 entry count for catch all entry: ","count: ":0}
{"level":"info","ts":"2024-02-07T22:00:08.024Z","logger":"ebpf-client","caller":"ebpf/bpf_client.go:707","msg":"L4 values: ","protocol: ":254,"startPort: ":0,"endPort: ":0}
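To pull every update for one map ID out of a larger agent log, the JSON lines can be grouped by the "ID of map to update" marker. This is a rough sketch under two assumptions: the log is one JSON object per line with the field names seen above (including the trailing `: ` in keys such as `"ID: "`), and the lines belonging to one update follow its marker without interleaving. `updates_for_map` is a hypothetical helper, not part of the agent.

```python
import json

# Two lines copied from the log snippet above.
LOG = [
    '{"level":"info","ts":"2024-02-07T22:00:08.024Z","logger":"ebpf-client","caller":"controllers/policyendpoints_controller.go:436","msg":"ID of map to update: ","ID: ":4214}',
    '{"level":"info","ts":"2024-02-07T22:00:08.024Z","logger":"ebpf-client","caller":"ebpf/bpf_client.go:707","msg":"L4 values: ","protocol: ":254,"startPort: ":0,"endPort: ":0}',
]


def updates_for_map(log_lines, map_id):
    """Group log records by map update and return the groups for map_id.

    A group starts at a record whose msg begins with "ID of map to update"
    and runs until the next such record.
    """
    groups = []
    current = None  # group being collected, None while skipping other maps
    for line in log_lines:
        line = line.strip()
        if not line:
            continue
        try:
            rec = json.loads(line)
        except json.JSONDecodeError:
            continue  # ignore anything that is not a JSON log line
        if not isinstance(rec, dict):
            continue
        if str(rec.get("msg", "")).startswith("ID of map to update"):
            if rec.get("ID: ") == map_id:
                current = [rec]
                groups.append(current)
            else:
                current = None
        elif current is not None:
            current.append(rec)
    return groups


if __name__ == "__main__":
    for group in updates_for_map(LOG, 4214):
        for rec in group:
            print(rec["ts"], rec["caller"], rec["msg"])
```

If several reconciles log concurrently, the grouping may attribute lines to the wrong update, so the result is a starting point for reading the log rather than proof of what was written to the map.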

I am able to resolve this issue if I restart the aws-node pod on the problem node.
The timing on this is a bit odd. If I remove all the network policies and recreate, it takes several hours for this issue to manifest.
However, the problem pod here at the time of investigation was only ~90m old.


Attach logs
Log snippet attached, will provide more if requested

What you expected to happen:
Expected eBPF map to match rules from Policy Endpoint for all destination pods
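That expectation can be checked mechanically by comparing a `dump-maps` transcript against the PolicyEndpoint's ingress rules. The sketch below makes several assumptions: the dump keeps the text layout shown in this issue, the PolicyEndpoint has already been loaded into a Python dict, and a rule counts as present only when the protocol and start port match exactly (port ranges are ignored). `parse_dump` and `missing_ingress` are hypothetical names.

```python
import ipaddress
import re

# Dump of map 4214 on the bad node, copied from above.
BAD_DUMP = """\
Key : IP/Prefixlen - 10.146.52.181/32
-------------------
Value Entry :  0
Protocol -  ANY PROTOCOL
StartPort -  0
Endport -  0
-------------------
*******************************
Done reading all entries
"""

# First two ingress rules of the PolicyEndpoint above, as a parsed dict.
POLICY_ENDPOINT = {
    "spec": {
        "ingress": [
            {"cidr": "10.146.54.253", "ports": [{"port": 8081, "protocol": "TCP"}]},
            {"cidr": "10.146.53.126", "ports": [{"port": 8081, "protocol": "TCP"}]},
        ]
    }
}


def parse_dump(text):
    """Parse dump-maps text into {cidr: [(protocol, start_port, end_port)]}."""
    entries = {}
    cidr = None
    fields = {}
    for raw in text.splitlines():
        line = raw.strip()
        m = re.match(r"Key : IP/Prefixlen - (\S+)", line)
        if m:
            cidr = str(ipaddress.ip_network(m.group(1), strict=False))
            entries.setdefault(cidr, [])
            fields = {}
            continue
        m = re.match(r"(Protocol|StartPort|Endport) -\s+(.+)", line)
        if m and cidr is not None:
            fields[m.group(1)] = m.group(2).strip()
            # Endport is the last field of a value entry in the dump layout.
            if m.group(1) == "Endport" and len(fields) == 3:
                entries[cidr].append(
                    (fields["Protocol"], int(fields["StartPort"]), int(fields["Endport"]))
                )
                fields = {}
    return entries


def missing_ingress(policy_endpoint, dump_entries):
    """List (cidr, protocol, port) rules from the spec that the map lacks."""
    missing = []
    for rule in policy_endpoint["spec"].get("ingress", []):
        # A bare IP such as 10.146.54.253 is normalised to a /32 network.
        cidr = str(ipaddress.ip_network(rule["cidr"], strict=False))
        have = {(proto, start) for proto, start, _ in dump_entries.get(cidr, [])}
        for port in rule.get("ports", []):
            want = (port["protocol"], port["port"])
            if want not in have:
                missing.append((cidr, *want))
    return missing


if __name__ == "__main__":
    for item in missing_ingress(POLICY_ENDPOINT, parse_dump(BAD_DUMP)):
        print("missing:", item)
```

The check only reports rules that are missing from the map. Extra entries are not flagged, because the dumps in this issue also contain the node's own IP with `ANY PROTOCOL`, which does not appear in the PolicyEndpoint spec.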

How to reproduce it (as minimally and precisely as possible):

helm repo add argo https://argoproj.github.io/argo-helm && helm repo update
helm upgrade argocd -n argocd argo/argo-cd --version 5.53.11 \
    --set global.networkPolicy.create=true \
    --create-namespace --install

Anything else we need to know?:

Environment:

  • Kubernetes version (use kubectl version): v1.28.5-eks-5e0fdde
  • CNI Version: v1.16.2
  • Network Policy Agent Version: v1.0.8-rc2
  • OS (e.g: cat /etc/os-release): Bottlerocket 1.9.0
  • Kernel (e.g. uname -a): 6.1.72

#183 seems similar to my issue, but I'm using the release candidate version that's referenced there and reported as having fixed that particular issue.

@aballman - v1.0.8-rc3 is the latest. We hit a similar issue where the maps got wrongly updated. Can you please try v1.0.8-rc3?

Thanks! I'll give it a shot

Unfortunately this did not resolve my issue. The same problem is present. I've confirmed that I'm on v1.0.8-rc3 on the problem node. I had also rolled all nodes in my cluster ~16h ago when the Bottlerocket 1.19.1 fix was released.

Curiously, it seems to be a similar scenario, where the problem pod was on the node for ~90m.

NAME                                                READY   STATUS      RESTARTS       AGE     IP              NODE                            NOMINATED NODE   READINESS GATES
argocd-application-controller-0                     1/1     Running     36 (65m ago)   16h     10.146.63.42    ip-10-146-62-155.ec2.internal   <none>           <none>
argocd-applicationset-controller-7974ff9cf9-vjppv   1/1     Running     0              16h     10.146.63.228   ip-10-146-62-155.ec2.internal   <none>           <none>
argocd-dex-server-5c6dfff575-wrl7v                  1/1     Running     0              16h     10.146.53.41    ip-10-146-53-188.ec2.internal   <none>           <none>
argocd-notifications-controller-778866f977-9nhdd    1/1     Running     0              16h     10.146.60.229   ip-10-146-62-155.ec2.internal   <none>           <none>
argocd-redis-5bcdf48d96-x8bqp                       1/1     Running     0              16h     10.146.62.162   ip-10-146-62-155.ec2.internal   <none>           <none>
argocd-redis-ha-haproxy-7f84459cf-pmdfv             1/1     Running     0              16h     10.146.56.174   ip-10-146-57-151.ec2.internal   <none>           <none>
argocd-redis-ha-haproxy-7f84459cf-tcdsr             1/1     Running     0              19h     10.146.54.58    ip-10-146-55-43.ec2.internal    <none>           <none>
argocd-redis-ha-haproxy-7f84459cf-xs6dp             1/1     Running     0              16h     10.146.53.99    ip-10-146-53-188.ec2.internal   <none>           <none>
argocd-redis-ha-server-0                            3/3     Running     0              16h     10.146.58.63    ip-10-146-57-151.ec2.internal   <none>           <none>
argocd-redis-ha-server-1                            3/3     Running     0              16h     10.146.63.193   ip-10-146-62-155.ec2.internal   <none>           <none>
argocd-redis-ha-server-2                            3/3     Running     0              16h     10.146.52.251   ip-10-146-55-43.ec2.internal    <none>           <none>
argocd-repo-server-85ccb7dbdd-mkd4k                 1/1     Running     0              16h     10.146.54.42    ip-10-146-53-188.ec2.internal   <none>           <none>
argocd-repo-server-85ccb7dbdd-rvhlm                 1/1     Running     0              86m     10.146.62.175   ip-10-146-62-155.ec2.internal   <none>           <none>
argocd-server-6d6cd7bc6b-mccvn                      1/1     Running     0              16h     10.146.54.27    ip-10-146-53-188.ec2.internal   <none>           <none>
argocd-server-6d6cd7bc6b-pbbkd                      1/1     Running     0              19h     10.146.53.5     ip-10-146-55-43.ec2.internal    <none>           <none>
apiVersion: networking.k8s.aws/v1alpha1
kind: PolicyEndpoint
metadata:
  creationTimestamp: "2024-02-02T00:46:35Z"
  generateName: argocd-repo-server-
  generation: 243
  name: argocd-repo-server-sxvj2
  namespace: argocd
  ownerReferences:
  - apiVersion: networking.k8s.io/v1
    blockOwnerDeletion: true
    controller: true
    kind: NetworkPolicy
    name: argocd-repo-server
    uid: a57fcdb4-d425-4aa4-b818-61c9168debbf
  resourceVersion: "150208318"
  uid: df6dadb8-e619-4f72-ba98-82618b9f8256
spec:
  ingress:
  - cidr: 10.146.53.5
    ports:
    - port: 8081
      protocol: TCP
  - cidr: 10.146.54.27
    ports:
    - port: 8081
      protocol: TCP
  - cidr: 10.146.60.229
    ports:
    - port: 8081
      protocol: TCP
  - cidr: 10.146.63.228
    ports:
    - port: 8081
      protocol: TCP
  - cidr: 10.146.63.42
    ports:
    - port: 8081
      protocol: TCP
  podIsolation:
  - Ingress
  podSelector:
    matchLabels:
      app.kubernetes.io/instance: argocd
      app.kubernetes.io/name: argocd-repo-server
  podSelectorEndpoints:
  - hostIP: 10.146.53.188
    name: argocd-repo-server-85ccb7dbdd-mkd4k
    namespace: argocd
    podIP: 10.146.54.42
  - hostIP: 10.146.62.155
    name: argocd-repo-server-85ccb7dbdd-rvhlm
    namespace: argocd
    podIP: 10.146.62.175
  policyRef:
    name: argocd-repo-server
    namespace: argocd

ip-10-146-53-188.ec2.internal / aws-node-hntbh

bash-4.2# /aws-eks-na-cli ebpf loaded-ebpfdata | grep -A9 "repo-server"
PinPath:  /sys/fs/bpf/globals/aws/programs/argocd-repo-server-85ccb7dbdd-argocd_handle_egress
Pod Identifier : argocd-repo-server-85ccb7dbdd-argocd  Direction : egress
Prog ID:  108
Associated Maps ->
Map Name:  aws_conntrack_map
Map ID:  19
Map Name:  egress_map
Map ID:  29
Map Name:  policy_events
Map ID:  20
========================================================================================
PinPath:  /sys/fs/bpf/globals/aws/programs/argocd-repo-server-85ccb7dbdd-argocd_handle_ingress
Pod Identifier : argocd-repo-server-85ccb7dbdd-argocd  Direction : ingress
Prog ID:  107
Associated Maps ->
Map Name:  aws_conntrack_map
Map ID:  19
Map Name:  ingress_map
Map ID:  28
Map Name:  policy_events
Map ID:  20
========================================================================================
bash-4.2# /aws-eks-na-cli ebpf dump-maps 28
Key : IP/Prefixlen - 10.146.53.5/32
-------------------
Value Entry :  0
Protocol -  TCP
StartPort -  8081
Endport -  0
-------------------
*******************************
Key : IP/Prefixlen - 10.146.53.188/32
-------------------
Value Entry :  0
Protocol -  ANY PROTOCOL
StartPort -  0
Endport -  0
-------------------
*******************************
Key : IP/Prefixlen - 10.146.54.27/32
-------------------
Value Entry :  0
Protocol -  TCP
StartPort -  8081
Endport -  0
-------------------
*******************************
Key : IP/Prefixlen - 10.146.60.229/32
-------------------
Value Entry :  0
Protocol -  TCP
StartPort -  8081
Endport -  0
-------------------
*******************************
Key : IP/Prefixlen - 10.146.63.42/32
-------------------
Value Entry :  0
Protocol -  TCP
StartPort -  8081
Endport -  0
-------------------
*******************************
Key : IP/Prefixlen - 10.146.63.228/32
-------------------
Value Entry :  0
Protocol -  TCP
StartPort -  8081
Endport -  0
-------------------
*******************************
Done reading all entries

ip-10-146-62-155.ec2.internal / aws-node-2756k

bash-4.2# /aws-eks-na-cli ebpf loaded-ebpfdata | grep -A9 "repo-server"
PinPath:  /sys/fs/bpf/globals/aws/programs/argocd-repo-server-85ccb7dbdd-argocd_handle_ingress
Pod Identifier : argocd-repo-server-85ccb7dbdd-argocd  Direction : ingress
Prog ID:  4022
Associated Maps ->
Map Name:  policy_events
Map ID:  31
Map Name:  aws_conntrack_map
Map ID:  30
Map Name:  ingress_map
Map ID:  1125
========================================================================================
--
PinPath:  /sys/fs/bpf/globals/aws/programs/argocd-repo-server-85ccb7dbdd-argocd_handle_egress
Pod Identifier : argocd-repo-server-85ccb7dbdd-argocd  Direction : egress
Prog ID:  4023
Associated Maps ->
Map Name:  aws_conntrack_map
Map ID:  30
Map Name:  egress_map
Map ID:  1126
Map Name:  policy_events
Map ID:  31
========================================================================================


========================================================================================
bash-4.2# /aws-eks-na-cli ebpf dump-maps 1125
Key : IP/Prefixlen - 10.146.62.155/32
-------------------
Value Entry :  0
Protocol -  ANY PROTOCOL
StartPort -  0
Endport -  0
-------------------
*******************************
Done reading all entries
┃ ❯ k images aws-node-2756k -n kube-system
[Summary]: 1 namespaces, 1 pods, 3 containers and 3 different images
+----------------+-------------------------+-----------------------------------------------------------------------------------------+
|      Pod       |        Container        |                                          Image                                          |
+----------------+-------------------------+-----------------------------------------------------------------------------------------+
| aws-node-2756k | aws-node                | 602401143452.dkr.ecr.us-east-1.amazonaws.com/amazon-k8s-cni:v1.16.2                     |
+                +-------------------------+-----------------------------------------------------------------------------------------+
|                | aws-eks-nodeagent       | 602401143452.dkr.ecr.us-east-1.amazonaws.com/amazon/aws-network-policy-agent:v1.0.8-rc3 |
+                +-------------------------+-----------------------------------------------------------------------------------------+
|                | (init) aws-vpc-cni-init | 602401143452.dkr.ecr.us-east-1.amazonaws.com/amazon-k8s-cni-init:v1.16.2                |
+----------------+-------------------------+-----------------------------------------------------------------------------------------+

@aballman - Are these existing pods or did you delete and re-create new pods?

  podSelectorEndpoints:
  - hostIP: 10.146.53.188
    name: argocd-repo-server-85ccb7dbdd-mkd4k
    namespace: argocd
    podIP: 10.146.54.42
  - hostIP: 10.146.62.155
    name: argocd-repo-server-85ccb7dbdd-rvhlm
    namespace: argocd
    podIP: 10.146.62.175

Can you also email us the network policy agent logs - /var/log/aws-routed-eni/network-policy-agent.log? You can mail them to k8s-awscni-triage@amazon.com

They were pre-existing at the time of the fault. I'm not sure why that pod might be a little younger. The node itself is ~17h old. There is an HPA configured on it, so that could be the reason. I'll send the logs over when the issue comes up again in a few hours.

Sorry, I meant did you re-create the pods post upgrade to v1.0.8-rc3?

I think I had updated the DaemonSet before Karpenter rolled all my nodes for the Bottlerocket update. I will restart all the pods now just to be explicit about it.

The symptoms are still occurring with the updated rc3 image. My alerts for this triggered over the weekend, but it resolved before I had a chance to collect logs. I'll follow up again when I can do that.

@aballman - We tried the repro steps but the issue isn't happening, and the pods have been running for 3 days. Do you have any pod or node churn in your cluster? Logs would be helpful.

There is pretty significant churn of both pods and nodes in the cluster. It has GitHub Actions runners in the same cluster / node pool. It scales up and down during the day to run jobs, and Karpenter also does some consolidation.

I'll post logs as soon as I can gather them. Thanks for investigating!

Thanks @aballman. Are you on the K8s Slack? We can get on a call and understand your cluster config. If so, can you please share your Slack handle?

I'm not sure if I can say this is resolved because of the issue that I saw two weekends ago. I can say that I haven't had any more issues since that time period. So if it's not fixed, it's considerably improved.

I'm willing to work under the assumption that it is fixed with 1.0.8-rc3 and can open a new issue referencing this one if it returns.

Thanks @aballman. Please keep us updated. v1.0.8 release is available - https://github.com/aws/amazon-vpc-cni-k8s/releases/tag/v1.16.3

@jayanthvn This is still an issue for me. It's a lot less frequent, but it still occurs. This most recent one looks like this:

┃ ❯ kgpo -owide | grep -E "(argocd-repo-server|argocd-server)"
argocd-repo-server-67974b6df-pnpls                  1/1     Running     0          127m    10.146.18.74    ip-10-146-17-182.ec2.internal   <none>           <none>
argocd-repo-server-67974b6df-s4d5c                  1/1     Running     0          102m    10.146.27.6     ip-10-146-27-54.ec2.internal    <none>           <none>
argocd-server-665597f9d8-7pff6                      1/1     Running     0          127m    10.146.16.12    ip-10-146-17-182.ec2.internal   <none>           <none>
argocd-server-665597f9d8-wgr84                      1/1     Running     0          116m    10.146.26.137   ip-10-146-27-54.ec2.internal    <none>           <none>

ip-10-146-17-182.ec2.internal - argocd-repo-server-67974b6df-pnpls

bash-4.2# /aws-eks-na-cli ebpf loaded-ebpfdata | grep -A9 "repo-server"
PinPath:  /sys/fs/bpf/globals/aws/programs/argocd-repo-server-67974b6df-argocd_handle_ingress
Pod Identifier : argocd-repo-server-67974b6df-argocd  Direction : ingress
Prog ID:  302
Associated Maps ->
Map Name:  aws_conntrack_map
Map ID:  33
Map Name:  ingress_map
Map ID:  88
Map Name:  policy_events
Map ID:  34
========================================================================================
--
PinPath:  /sys/fs/bpf/globals/aws/programs/argocd-repo-server-67974b6df-argocd_handle_egress
Pod Identifier : argocd-repo-server-67974b6df-argocd  Direction : egress
Prog ID:  303
Associated Maps ->
Map Name:  aws_conntrack_map
Map ID:  33
Map Name:  egress_map
Map ID:  89
Map Name:  policy_events
Map ID:  34
========================================================================================
Full Ingress Map
bash-4.2# /aws-eks-na-cli ebpf dump-maps 88
Key : IP/Prefixlen - 10.146.16.5/32
-------------------
Value Entry :  0
Protocol -  ANY PROTOCOL
StartPort -  0
Endport -  0
-------------------
*******************************
Key : IP/Prefixlen - 10.146.16.12/32
-------------------
Value Entry :  0
Protocol -  TCP
StartPort -  8081
Endport -  0
-------------------
*******************************
Key : IP/Prefixlen - 10.146.16.21/32
-------------------
Value Entry :  0
Protocol -  ANY PROTOCOL
StartPort -  0
Endport -  0
-------------------
*******************************
Key : IP/Prefixlen - 10.146.16.22/32
-------------------
Value Entry :  0
Protocol -  ANY PROTOCOL
StartPort -  0
Endport -  0
-------------------
*******************************
Key : IP/Prefixlen - 10.146.16.43/32
-------------------
Value Entry :  0
Protocol -  ANY PROTOCOL
StartPort -  0
Endport -  0
-------------------
*******************************
Key : IP/Prefixlen - 10.146.16.47/32
-------------------
Value Entry :  0
Protocol -  ANY PROTOCOL
StartPort -  0
Endport -  0
-------------------
*******************************
Key : IP/Prefixlen - 10.146.16.99/32
-------------------
Value Entry :  0
Protocol -  TCP
StartPort -  8081
Endport -  0
-------------------
*******************************
Key : IP/Prefixlen - 10.146.16.116/32
-------------------
Value Entry :  0
Protocol -  ANY PROTOCOL
StartPort -  0
Endport -  0
-------------------
*******************************
Key : IP/Prefixlen - 10.146.16.157/32
-------------------
Value Entry :  0
Protocol -  ANY PROTOCOL
StartPort -  0
Endport -  0
-------------------
*******************************
Key : IP/Prefixlen - 10.146.16.191/32
-------------------
Value Entry :  0
Protocol -  ANY PROTOCOL
StartPort -  0
Endport -  0
-------------------
*******************************
Key : IP/Prefixlen - 10.146.17.52/32
-------------------
Value Entry :  0
Protocol -  TCP
StartPort -  8081
Endport -  0
-------------------
*******************************
Key : IP/Prefixlen - 10.146.17.149/32
-------------------
Value Entry :  0
Protocol -  ANY PROTOCOL
StartPort -  0
Endport -  0
-------------------
*******************************
Key : IP/Prefixlen - 10.146.17.182/32
-------------------
Value Entry :  0
Protocol -  ANY PROTOCOL
StartPort -  0
Endport -  0
-------------------
*******************************
Key : IP/Prefixlen - 10.146.18.149/32
-------------------
Value Entry :  0
Protocol -  ANY PROTOCOL
StartPort -  0
Endport -  0
-------------------
*******************************
Key : IP/Prefixlen - 10.146.18.150/32
-------------------
Value Entry :  0
Protocol -  ANY PROTOCOL
StartPort -  0
Endport -  0
-------------------
*******************************
Key : IP/Prefixlen - 10.146.18.157/32
-------------------
Value Entry :  0
Protocol -  ANY PROTOCOL
StartPort -  0
Endport -  0
-------------------
*******************************
Key : IP/Prefixlen - 10.146.18.162/32
-------------------
Value Entry :  0
Protocol -  ANY PROTOCOL
StartPort -  0
Endport -  0
-------------------
*******************************
Key : IP/Prefixlen - 10.146.18.166/32
-------------------
Value Entry :  0
Protocol -  ANY PROTOCOL
StartPort -  0
Endport -  0
-------------------
*******************************
Key : IP/Prefixlen - 10.146.18.182/32
-------------------
Value Entry :  0
Protocol -  ANY PROTOCOL
StartPort -  0
Endport -  0
-------------------
*******************************
Key : IP/Prefixlen - 10.146.18.196/32
-------------------
Value Entry :  0
Protocol -  ANY PROTOCOL
StartPort -  0
Endport -  0
-------------------
*******************************
Key : IP/Prefixlen - 10.146.18.199/32
-------------------
Value Entry :  0
Protocol -  ANY PROTOCOL
StartPort -  0
Endport -  0
-------------------
*******************************
Key : IP/Prefixlen - 10.146.19.19/32
-------------------
Value Entry :  0
Protocol -  ANY PROTOCOL
StartPort -  0
Endport -  0
-------------------
*******************************
Key : IP/Prefixlen - 10.146.19.54/32
-------------------
Value Entry :  0
Protocol -  ANY PROTOCOL
StartPort -  0
Endport -  0
-------------------
*******************************
Key : IP/Prefixlen - 10.146.19.123/32
-------------------
Value Entry :  0
Protocol -  ANY PROTOCOL
StartPort -  0
Endport -  0
-------------------
*******************************
Key : IP/Prefixlen - 10.146.19.180/32
-------------------
Value Entry :  0
Protocol -  TCP
StartPort -  8081
Endport -  0
-------------------
*******************************
Key : IP/Prefixlen - 10.146.19.193/32
-------------------
Value Entry :  0
Protocol -  ANY PROTOCOL
StartPort -  0
Endport -  0
-------------------
*******************************
Key : IP/Prefixlen - 10.146.22.225/32
-------------------
Value Entry :  0
Protocol -  ANY PROTOCOL
StartPort -  0
Endport -  0
-------------------
*******************************
Key : IP/Prefixlen - 10.146.23.126/32
-------------------
Value Entry :  0
Protocol -  ANY PROTOCOL
StartPort -  0
Endport -  0
-------------------
*******************************
Key : IP/Prefixlen - 10.146.24.84/32
-------------------
Value Entry :  0
Protocol -  ANY PROTOCOL
StartPort -  0
Endport -  0
-------------------
*******************************
Key : IP/Prefixlen - 10.146.24.98/32
-------------------
Value Entry :  0
Protocol -  ANY PROTOCOL
StartPort -  0
Endport -  0
-------------------
*******************************
Key : IP/Prefixlen - 10.146.24.125/32
-------------------
Value Entry :  0
Protocol -  ANY PROTOCOL
StartPort -  0
Endport -  0
-------------------
*******************************
Key : IP/Prefixlen - 10.146.24.173/32
-------------------
Value Entry :  0
Protocol -  ANY PROTOCOL
StartPort -  0
Endport -  0
-------------------
*******************************
Key : IP/Prefixlen - 10.146.26.56/32
-------------------
Value Entry :  0
Protocol -  ANY PROTOCOL
StartPort -  0
Endport -  0
-------------------
*******************************
Key : IP/Prefixlen - 10.146.26.112/32
-------------------
Value Entry :  0
Protocol -  ANY PROTOCOL
StartPort -  0
Endport -  0
-------------------
*******************************
Key : IP/Prefixlen - 10.146.26.137/32
-------------------
Value Entry :  0
Protocol -  TCP
StartPort -  8081
Endport -  0
-------------------
*******************************
Key : IP/Prefixlen - 10.146.26.146/32
-------------------
Value Entry :  0
Protocol -  ANY PROTOCOL
StartPort -  0
Endport -  0
-------------------
*******************************
Key : IP/Prefixlen - 10.146.26.226/32
-------------------
Value Entry :  0
Protocol -  ANY PROTOCOL
StartPort -  0
Endport -  0
-------------------
*******************************
Key : IP/Prefixlen - 10.146.27.54/32
-------------------
Value Entry :  0
Protocol -  ANY PROTOCOL
StartPort -  0
Endport -  0
-------------------
*******************************
Key : IP/Prefixlen - 10.146.27.209/32
-------------------
Value Entry :  0
Protocol -  ANY PROTOCOL
StartPort -  0
Endport -  0
-------------------
*******************************
Key : IP/Prefixlen - 10.146.29.209/32
-------------------
Value Entry :  0
Protocol -  ANY PROTOCOL
StartPort -  0
Endport -  0
-------------------
*******************************
Key : IP/Prefixlen - 10.146.30.250/32
-------------------
Value Entry :  0
Protocol -  ANY PROTOCOL
StartPort -  0
Endport -  0
-------------------
*******************************
Done reading all entries
bash-4.2# /aws-eks-na-cli ebpf dump-maps 88 | grep "Key"
Key : IP/Prefixlen - 10.146.16.5/32
Key : IP/Prefixlen - 10.146.16.12/32
Key : IP/Prefixlen - 10.146.16.21/32
Key : IP/Prefixlen - 10.146.16.22/32
Key : IP/Prefixlen - 10.146.16.43/32
Key : IP/Prefixlen - 10.146.16.47/32
Key : IP/Prefixlen - 10.146.16.99/32
Key : IP/Prefixlen - 10.146.16.116/32
Key : IP/Prefixlen - 10.146.16.157/32
Key : IP/Prefixlen - 10.146.16.191/32
Key : IP/Prefixlen - 10.146.17.52/32
Key : IP/Prefixlen - 10.146.17.149/32
Key : IP/Prefixlen - 10.146.17.182/32
Key : IP/Prefixlen - 10.146.18.149/32
Key : IP/Prefixlen - 10.146.18.150/32
Key : IP/Prefixlen - 10.146.18.157/32
Key : IP/Prefixlen - 10.146.18.162/32
Key : IP/Prefixlen - 10.146.18.166/32
Key : IP/Prefixlen - 10.146.18.199/32
Key : IP/Prefixlen - 10.146.19.19/32
Key : IP/Prefixlen - 10.146.19.54/32
Key : IP/Prefixlen - 10.146.19.123/32
Key : IP/Prefixlen - 10.146.19.180/32
Key : IP/Prefixlen - 10.146.19.193/32
Key : IP/Prefixlen - 10.146.22.225/32
Key : IP/Prefixlen - 10.146.23.126/32
Key : IP/Prefixlen - 10.146.24.84/32
Key : IP/Prefixlen - 10.146.24.98/32
Key : IP/Prefixlen - 10.146.24.125/32
Key : IP/Prefixlen - 10.146.24.173/32
Key : IP/Prefixlen - 10.146.26.56/32
Key : IP/Prefixlen - 10.146.26.112/32
Key : IP/Prefixlen - 10.146.26.137/32
Key : IP/Prefixlen - 10.146.26.146/32
Key : IP/Prefixlen - 10.146.26.226/32
Key : IP/Prefixlen - 10.146.27.54/32
Key : IP/Prefixlen - 10.146.27.209/32
Key : IP/Prefixlen - 10.146.29.209/32
Key : IP/Prefixlen - 10.146.30.250/32

ip-10-146-27-54.ec2.internal - argocd-repo-server-67974b6df-s4d5c

bash-4.2# /aws-eks-na-cli ebpf loaded-ebpfdata | grep -A9 "repo-server"
PinPath:  /sys/fs/bpf/globals/aws/programs/argocd-repo-server-67974b6df-argocd_handle_egress
Pod Identifier : argocd-repo-server-67974b6df-argocd  Direction : egress
Prog ID:  399
Associated Maps ->
Map Name:  policy_events
Map ID:  20
Map Name:  aws_conntrack_map
Map ID:  19
Map Name:  egress_map
Map ID:  119
========================================================================================
--
PinPath:  /sys/fs/bpf/globals/aws/programs/argocd-repo-server-67974b6df-argocd_handle_ingress
Pod Identifier : argocd-repo-server-67974b6df-argocd  Direction : ingress
Prog ID:  398
Associated Maps ->
Map Name:  aws_conntrack_map
Map ID:  19
Map Name:  ingress_map
Map ID:  118
Map Name:  policy_events
Map ID:  20
========================================================================================
Full Ingress Map
bash-4.2# /aws-eks-na-cli ebpf dump-maps 118
Key : IP/Prefixlen - 10.146.16.5/32
-------------------
Value Entry :  0
Protocol -  ANY PROTOCOL
StartPort -  0
Endport -  0
-------------------
*******************************
Key : IP/Prefixlen - 10.146.16.21/32
-------------------
Value Entry :  0
Protocol -  ANY PROTOCOL
StartPort -  0
Endport -  0
-------------------
*******************************
Key : IP/Prefixlen - 10.146.16.22/32
-------------------
Value Entry :  0
Protocol -  ANY PROTOCOL
StartPort -  0
Endport -  0
-------------------
*******************************
Key : IP/Prefixlen - 10.146.16.43/32
-------------------
Value Entry :  0
Protocol -  ANY PROTOCOL
StartPort -  0
Endport -  0
-------------------
*******************************
Key : IP/Prefixlen - 10.146.16.47/32
-------------------
Value Entry :  0
Protocol -  ANY PROTOCOL
StartPort -  0
Endport -  0
-------------------
*******************************
Key : IP/Prefixlen - 10.146.16.116/32
-------------------
Value Entry :  0
Protocol -  ANY PROTOCOL
StartPort -  0
Endport -  0
-------------------
*******************************
Key : IP/Prefixlen - 10.146.16.157/32
-------------------
Value Entry :  0
Protocol -  ANY PROTOCOL
StartPort -  0
Endport -  0
-------------------
*******************************
Key : IP/Prefixlen - 10.146.16.191/32
-------------------
Value Entry :  0
Protocol -  ANY PROTOCOL
StartPort -  0
Endport -  0
-------------------
*******************************
Key : IP/Prefixlen - 10.146.17.149/32
-------------------
Value Entry :  0
Protocol -  ANY PROTOCOL
StartPort -  0
Endport -  0
-------------------
*******************************
Key : IP/Prefixlen - 10.146.17.182/32
-------------------
Value Entry :  0
Protocol -  ANY PROTOCOL
StartPort -  0
Endport -  0
-------------------
*******************************
Key : IP/Prefixlen - 10.146.18.149/32
-------------------
Value Entry :  0
Protocol -  ANY PROTOCOL
StartPort -  0
Endport -  0
-------------------
*******************************
Key : IP/Prefixlen - 10.146.18.150/32
-------------------
Value Entry :  0
Protocol -  ANY PROTOCOL
StartPort -  0
Endport -  0
-------------------
*******************************
Key : IP/Prefixlen - 10.146.18.157/32
-------------------
Value Entry :  0
Protocol -  ANY PROTOCOL
StartPort -  0
Endport -  0
-------------------
*******************************
Key : IP/Prefixlen - 10.146.18.162/32
-------------------
Value Entry :  0
Protocol -  ANY PROTOCOL
StartPort -  0
Endport -  0
-------------------
*******************************
Key : IP/Prefixlen - 10.146.18.166/32
-------------------
Value Entry :  0
Protocol -  ANY PROTOCOL
StartPort -  0
Endport -  0
-------------------
*******************************
Key : IP/Prefixlen - 10.146.18.199/32
-------------------
Value Entry :  0
Protocol -  ANY PROTOCOL
StartPort -  0
Endport -  0
-------------------
*******************************
Key : IP/Prefixlen - 10.146.19.19/32
-------------------
Value Entry :  0
Protocol -  ANY PROTOCOL
StartPort -  0
Endport -  0
-------------------
*******************************
Key : IP/Prefixlen - 10.146.19.54/32
-------------------
Value Entry :  0
Protocol -  ANY PROTOCOL
StartPort -  0
Endport -  0
-------------------
*******************************
Key : IP/Prefixlen - 10.146.19.123/32
-------------------
Value Entry :  0
Protocol -  ANY PROTOCOL
StartPort -  0
Endport -  0
-------------------
*******************************
Key : IP/Prefixlen - 10.146.19.193/32
-------------------
Value Entry :  0
Protocol -  ANY PROTOCOL
StartPort -  0
Endport -  0
-------------------
*******************************
Key : IP/Prefixlen - 10.146.22.225/32
-------------------
Value Entry :  0
Protocol -  ANY PROTOCOL
StartPort -  0
Endport -  0
-------------------
*******************************
Key : IP/Prefixlen - 10.146.23.126/32
-------------------
Value Entry :  0
Protocol -  ANY PROTOCOL
StartPort -  0
Endport -  0
-------------------
*******************************
Key : IP/Prefixlen - 10.146.24.84/32
-------------------
Value Entry :  0
Protocol -  ANY PROTOCOL
StartPort -  0
Endport -  0
-------------------
*******************************
Key : IP/Prefixlen - 10.146.24.98/32
-------------------
Value Entry :  0
Protocol -  ANY PROTOCOL
StartPort -  0
Endport -  0
-------------------
*******************************
Key : IP/Prefixlen - 10.146.24.125/32
-------------------
Value Entry :  0
Protocol -  ANY PROTOCOL
StartPort -  0
Endport -  0
-------------------
*******************************
Key : IP/Prefixlen - 10.146.24.173/32
-------------------
Value Entry :  0
Protocol -  ANY PROTOCOL
StartPort -  0
Endport -  0
-------------------
*******************************
Key : IP/Prefixlen - 10.146.26.56/32
-------------------
Value Entry :  0
Protocol -  ANY PROTOCOL
StartPort -  0
Endport -  0
-------------------
*******************************
Key : IP/Prefixlen - 10.146.26.112/32
-------------------
Value Entry :  0
Protocol -  ANY PROTOCOL
StartPort -  0
Endport -  0
-------------------
*******************************
Key : IP/Prefixlen - 10.146.26.146/32
-------------------
Value Entry :  0
Protocol -  ANY PROTOCOL
StartPort -  0
Endport -  0
-------------------
*******************************
Key : IP/Prefixlen - 10.146.26.226/32
-------------------
Value Entry :  0
Protocol -  ANY PROTOCOL
StartPort -  0
Endport -  0
-------------------
*******************************
Key : IP/Prefixlen - 10.146.27.54/32
-------------------
Value Entry :  0
Protocol -  ANY PROTOCOL
StartPort -  0
Endport -  0
-------------------
*******************************
Key : IP/Prefixlen - 10.146.27.209/32
-------------------
Value Entry :  0
Protocol -  ANY PROTOCOL
StartPort -  0
Endport -  0
-------------------
*******************************
Key : IP/Prefixlen - 10.146.29.209/32
-------------------
Value Entry :  0
Protocol -  ANY PROTOCOL
StartPort -  0
Endport -  0
-------------------
*******************************
Key : IP/Prefixlen - 10.146.30.250/32
-------------------
Value Entry :  0
Protocol -  ANY PROTOCOL
StartPort -  0
Endport -  0
-------------------
*******************************
Done reading all entries
bash-4.2# /aws-eks-na-cli ebpf dump-maps 118 | grep "Key"
Key : IP/Prefixlen - 10.146.16.5/32
Key : IP/Prefixlen - 10.146.16.21/32
Key : IP/Prefixlen - 10.146.16.22/32
Key : IP/Prefixlen - 10.146.16.43/32
Key : IP/Prefixlen - 10.146.16.47/32
Key : IP/Prefixlen - 10.146.16.116/32
Key : IP/Prefixlen - 10.146.16.157/32
Key : IP/Prefixlen - 10.146.16.191/32
Key : IP/Prefixlen - 10.146.17.149/32
Key : IP/Prefixlen - 10.146.17.182/32
Key : IP/Prefixlen - 10.146.18.149/32
Key : IP/Prefixlen - 10.146.18.150/32
Key : IP/Prefixlen - 10.146.18.157/32
Key : IP/Prefixlen - 10.146.18.162/32
Key : IP/Prefixlen - 10.146.18.166/32
Key : IP/Prefixlen - 10.146.18.199/32
Key : IP/Prefixlen - 10.146.19.19/32
Key : IP/Prefixlen - 10.146.19.54/32
Key : IP/Prefixlen - 10.146.19.123/32
Key : IP/Prefixlen - 10.146.19.193/32
Key : IP/Prefixlen - 10.146.22.225/32
Key : IP/Prefixlen - 10.146.23.126/32
Key : IP/Prefixlen - 10.146.24.84/32
Key : IP/Prefixlen - 10.146.24.98/32
Key : IP/Prefixlen - 10.146.24.125/32
Key : IP/Prefixlen - 10.146.24.173/32
Key : IP/Prefixlen - 10.146.26.56/32
Key : IP/Prefixlen - 10.146.26.112/32
Key : IP/Prefixlen - 10.146.26.146/32
Key : IP/Prefixlen - 10.146.26.226/32
Key : IP/Prefixlen - 10.146.27.54/32
Key : IP/Prefixlen - 10.146.27.209/32
Key : IP/Prefixlen - 10.146.29.209/32
Key : IP/Prefixlen - 10.146.30.250/32
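
The full `dump-maps` output is long, so comparing two maps by eye is error-prone. A minimal Python sketch for turning that text into structured entries follows. It assumes the output format is exactly as pasted in this issue (`Key : IP/Prefixlen - ...`, `Protocol - ...`, `StartPort - ...`, `Endport - ...`). `MapEntry`, `parse_dump`, and `port_restricted` are hypothetical names for illustration and are not part of `aws-eks-na-cli`.

```python
from dataclasses import dataclass


@dataclass
class MapEntry:
    cidr: str        # e.g. "10.146.17.52/32"
    protocol: str    # e.g. "TCP" or "ANY PROTOCOL"
    start_port: int
    end_port: int


def parse_dump(text):
    """Parse pasted `dump-maps` output into a list of MapEntry objects.

    Assumes each value block ends with an "Endport" line, as in the
    output shown in this issue. Separator lines are ignored.
    """
    entries = []
    cidr, protocol, start_port = None, None, 0
    for raw in text.splitlines():
        line = raw.strip()
        value = line.partition(" - ")[2].strip()
        if line.startswith("Key : IP/Prefixlen"):
            cidr = value
        elif line.startswith("Protocol -"):
            protocol = value
        elif line.startswith("StartPort -"):
            start_port = int(value)
        elif line.startswith("Endport -"):
            entries.append(MapEntry(cidr, protocol, start_port, int(value)))
    return entries


def port_restricted(entries):
    """Return only the entries that are limited to a specific protocol."""
    return [e for e in entries if e.protocol != "ANY PROTOCOL"]
```

Running `port_restricted(parse_dump(output))` on each pod's ingress map would list only the port-scoped rules, such as the `TCP` / `8081` entries visible in the dump of map 88 for 10.146.17.52, 10.146.19.180 and 10.146.26.137.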
Network Policy
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: argocd-repo-server
  namespace: argocd
spec:
  ingress:
  - from:
    - podSelector:
        matchLabels:
          app.kubernetes.io/instance: argocd
          app.kubernetes.io/name: argocd-server
    - podSelector:
        matchLabels:
          app.kubernetes.io/instance: argocd
          app.kubernetes.io/name: argocd-application-controller
    - podSelector:
        matchLabels:
          app.kubernetes.io/instance: argocd
          app.kubernetes.io/name: argocd-notifications-controller
    - podSelector:
        matchLabels:
          app.kubernetes.io/instance: argocd
          app.kubernetes.io/name: argocd-applicationset-controller
    ports:
    - port: 8081
      protocol: TCP
  podSelector:
    matchLabels:
      app.kubernetes.io/instance: argocd
      app.kubernetes.io/name: argocd-repo-server
  policyTypes:
  - Ingress
Policy Endpoint
apiVersion: networking.k8s.aws/v1alpha1
kind: PolicyEndpoint
metadata:
  name: argocd-repo-server-8nwzb
  namespace: argocd
spec:
  ingress:
  - cidr: 10.146.17.52
    ports:
    - port: 8081
      protocol: TCP
    - port: 8081
      protocol: TCP
  - cidr: 10.146.19.180
    ports:
    - port: 8081
      protocol: TCP
    - port: 8081
      protocol: TCP
  - cidr: 10.146.16.99
    ports:
    - port: 8081
      protocol: TCP
    - port: 8081
      protocol: TCP
  - cidr: 10.146.26.137
    ports:
    - port: 8081
      protocol: TCP
    - port: 8081
      protocol: TCP
  - cidr: 10.146.16.12
    ports:
    - port: 8081
      protocol: TCP
    - port: 8081
      protocol: TCP
  podIsolation:
  - Ingress
  podSelector:
    matchLabels:
      app.kubernetes.io/instance: argocd
      app.kubernetes.io/name: argocd-repo-server
  podSelectorEndpoints:
  - hostIP: 10.146.17.182
    name: argocd-repo-server-67974b6df-pnpls
    namespace: argocd
    podIP: 10.146.18.74
  - hostIP: 10.146.27.54
    name: argocd-repo-server-67974b6df-s4d5c
    namespace: argocd
    podIP: 10.146.27.6
  policyRef:
    name: argocd-repo-server-supplemental
    namespace: argocd

argocd-repo-server-67974b6df-pnpls has the rules I expected given the network policy, including access from 10.146.16.12 and 10.146.26.137.
argocd-repo-server-67974b6df-s4d5c has rules from other network policies, but does not include access from 10.146.16.12 or 10.146.26.137.

Those pod IPs are in the PolicyEndpoint, so it seems like the eBPF map for that pod is being built incorrectly.
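
The comparison described here (PolicyEndpoint ingress CIDRs versus the keys in a pod's ingress map) can be automated. The following Python sketch assumes the `Key : IP/Prefixlen - ...` line format shown in the pasted output. `map_keys` and `missing_cidrs` are hypothetical helper names, and the excerpt contains only a handful of the keys from map 118.

```python
import ipaddress
import re

# Matches lines such as "Key : IP/Prefixlen - 10.146.16.5/32".
KEY_RE = re.compile(r"Key : IP/Prefixlen - (\d+\.\d+\.\d+\.\d+)/(\d+)")


def map_keys(dump_text):
    """Return the set of "ip/prefixlen" keys found in dump-maps output."""
    return {f"{ip}/{plen}" for ip, plen in KEY_RE.findall(dump_text)}


def missing_cidrs(expected, dump_text):
    """Return the expected CIDRs that no key in the map dump covers.

    A bare IP is treated as a /32. A CIDR counts as covered when it is
    equal to, or a subnet of, one of the map keys.
    """
    keys = [ipaddress.ip_network(k, strict=False) for k in map_keys(dump_text)]
    missing = []
    for cidr in expected:
        net = ipaddress.ip_network(cidr, strict=False)
        if not any(net.subnet_of(key) for key in keys):
            missing.append(str(net))
    return missing


# Ingress CIDRs from PolicyEndpoint argocd-repo-server-8nwzb.
EXPECTED = [
    "10.146.17.52",
    "10.146.19.180",
    "10.146.16.99",
    "10.146.26.137",
    "10.146.16.12",
]

# A few of the keys from map 118 (the s4d5c pod's ingress map).
MAP_118_EXCERPT = """\
Key : IP/Prefixlen - 10.146.16.5/32
Key : IP/Prefixlen - 10.146.16.21/32
Key : IP/Prefixlen - 10.146.16.47/32
Key : IP/Prefixlen - 10.146.16.116/32
Key : IP/Prefixlen - 10.146.17.149/32
Key : IP/Prefixlen - 10.146.19.123/32
Key : IP/Prefixlen - 10.146.19.193/32
Key : IP/Prefixlen - 10.146.26.112/32
Key : IP/Prefixlen - 10.146.26.146/32
"""

print(missing_cidrs(EXPECTED, MAP_118_EXCERPT))
```

Against this excerpt all five PolicyEndpoint CIDRs are reported as missing, and the same holds for the full key list of map 118 pasted above.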

I've emailed my network-policy-agent.log file to k8s-awscni-triage@amazon.com.

@aballman - Thanks for checking. I wonder whether this is a corner case, since none of the CIDRs in argocd-repo-server-8nwzb are in the ingress map. Do you have the logs for argocd-repo-server-67974b6df-s4d5c?

Thanks, got the logs. Will get back.

@jayanthvn Any updates on this? I believe we are still hitting the same issue, even after upgrading VPC CNI to v1.16.4-eksbuild.2 (so the network policy agent is at v1.0.8-eksbuild.1). The frequency of the issue has dropped, but it is not fully resolved.

@DomantasVar - We have identified a fix for this and are testing the image right now. /cc @achevuru

@jayanthvn are there any updates on this issue? It is blocking an important production migration for us, so we would like to know whether it is feasible to wait for a fix or whether we need to find an alternative migration path.

Any update on this @jayanthvn :) ?

Sorry for the delay. We ran into a few corner cases and had to rework a few things. We will run our regression suite, and if things look green we will probably have the RC image by next week. Thanks for waiting!

Hello @jayanthvn, any new updates on the progress towards resolving this?

The issue is resolved with network policy agent version 1.1.2: https://github.com/aws/amazon-vpc-cni-k8s/releases/tag/v1.18.2