containers / conmon-rs

An OCI container runtime monitor written in Rust

Kubernetes e2e test `Kubectl Port forwarding With a server listening on localhost that expects a client request should support a client that connects, sends DATA, and disconnects` fails

saschagrunert opened this issue · comments

What happened?

The test fails in cri-o/cri-o#6220 together with other cases, but I assume that the root cause is the same.

Interestingly, on my local machine the cancellation of the execsync token seems to kill the main container PID.
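To make the hypothesis concrete: a minimal std-only sketch of the suspected failure mode, using an Arc<AtomicBool> and plain threads as hypothetical stand-ins for the CancellationToken and async tasks that conmon-rs actually uses. If the exec-sync session and the main container's IO loop were to share one token, completing the exec-sync would also stop the container's loop:

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
use std::thread;
use std::time::Duration;

// Model of the suspected bug: the exec-sync session and the main
// container's IO loop share one cancellation token (a plain AtomicBool
// here, standing in for an async CancellationToken), so finishing the
// exec-sync also stops the container's loop.
fn shared_token_stops_container() -> bool {
    let token = Arc::new(AtomicBool::new(false));
    let container_stopped = Arc::new(AtomicBool::new(false));

    let t = Arc::clone(&token);
    let stopped = Arc::clone(&container_stopped);
    let container = thread::spawn(move || {
        // Stand-in for the container's "reading from IO streams" loop.
        while !t.load(Ordering::SeqCst) {
            thread::sleep(Duration::from_millis(5));
        }
        stopped.store(true, Ordering::SeqCst);
    });

    // Stand-in for the exec-sync request (the readiness probe): it
    // completes and cancels the token it shares with the container task.
    let exec = {
        let t = Arc::clone(&token);
        thread::spawn(move || {
            thread::sleep(Duration::from_millis(20)); // probe command runs
            t.store(true, Ordering::SeqCst); // "Sending done because token cancelled"
        })
    };

    exec.join().unwrap();
    container.join().unwrap();
    container_stopped.load(Ordering::SeqCst)
}

fn main() {
    if shared_token_stops_container() {
        println!("container loop stopped after exec-sync cancellation");
    }
}
```

This is only a model of the symptom, not the actual conmon-rs code paths; the log excerpts below are what suggest the shared-cancellation pattern.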

What did you expect to happen?

That the test succeeds.

How can we reproduce it (as minimally and precisely as possible)?

Running the test

> k8s-test-run --ginkgo.focus "Kubectl Port forwarding With a server listening on localhost that expects a client request should support a client that connects, sends DATA, and disconnects"
…
[BeforeEach] [sig-cli] Kubectl Port forwarding
  set up framework | framework.go:158
STEP: Creating a kubernetes client 09/16/22 11:08:56.88
Sep 16 11:08:56.880: INFO: >>> kubeConfig: /var/run/kubernetes/admin.kubeconfig
STEP: Building a namespace api object, basename port-forwarding 09/16/22 11:08:56.881
STEP: Waiting for a default service account to be provisioned in namespace 09/16/22 11:08:56.885
STEP: Waiting for kube-root-ca.crt to be provisioned in namespace 09/16/22 11:08:56.887
[It] should support a client that connects, sends DATA, and disconnects
  test/e2e/kubectl/portforward.go:481
STEP: Creating the target pod 09/16/22 11:08:56.888
Sep 16 11:08:56.892: INFO: Waiting up to 5m0s for pod "pfpod" in namespace "port-forwarding-8233" to be "running and ready"
Sep 16 11:08:56.893: INFO: Pod "pfpod": Phase="Pending", Reason="", readiness=false. Elapsed: 1.353931ms
Sep 16 11:08:56.893: INFO: The phase of Pod pfpod is Pending, waiting for it to be Running (with Ready = true)
Sep 16 11:08:58.896: INFO: Pod "pfpod": Phase="Running", Reason="", readiness=false. Elapsed: 2.003699716s
Sep 16 11:08:58.896: INFO: The phase of Pod pfpod is Running (Ready = false)
Sep 16 11:09:00.896: INFO: Pod "pfpod": Phase="Running", Reason="", readiness=false. Elapsed: 4.003966771s
Sep 16 11:09:00.896: INFO: The phase of Pod pfpod is Running (Ready = false)
Sep 16 11:09:02.895: INFO: Pod "pfpod": Phase="Running", Reason="", readiness=true. Elapsed: 6.003036101s
Sep 16 11:09:02.895: INFO: The phase of Pod pfpod is Running (Ready = true)
Sep 16 11:09:02.895: INFO: Pod "pfpod" satisfied condition "running and ready"
STEP: Running 'kubectl port-forward' 09/16/22 11:09:02.895
Sep 16 11:09:02.895: INFO: starting port-forward command and streaming output
Sep 16 11:09:02.895: INFO: Asynchronously running '/home/sascha/go/src/k8s.io/kubernetes/_output/local/bin/linux/amd64/kubectl kubectl --server=https://localhost:6443/ --kubeconfig=/var/run/kubernetes/admin.kubeconfig --namespace=port-forwarding-8233 port-forward --namespace=port-forwarding-8233 pfpod :80'
Sep 16 11:09:02.895: INFO: reading from `kubectl port-forward` command's stdout
STEP: Dialing the local port 09/16/22 11:09:02.951
STEP: Sending the expected data to the local port 09/16/22 11:09:02.952
STEP: Reading data from the local port 09/16/22 11:09:02.952
STEP: Closing the write half of the client's connection 09/16/22 11:09:04.859
STEP: Waiting for the target pod to stop running 09/16/22 11:09:04.859
Sep 16 11:09:04.859: INFO: Waiting up to 5m0s for pod "pfpod" in namespace "port-forwarding-8233" to be "container terminated"
Sep 16 11:09:04.860: INFO: Pod "pfpod": Phase="Running", Reason="", readiness=false. Elapsed: 1.185975ms
Sep 16 11:09:04.860: INFO: Pod "pfpod" satisfied condition "container terminated"
STEP: Verifying logs 09/16/22 11:09:04.86
…
[ hangs for 120s ]

We can see that the port-forwarder is dead within the container:

> kubectl get pods -A
NAMESPACE              NAME                      READY   STATUS    RESTARTS   AGE
kube-system            coredns-567b6dd84-bt6pf   1/1     Running   0          2m50s
port-forwarding-5552   pfpod                     0/2     Error     0          12s
> ps aux | rg agnhost
root     1000821  0.0  0.0 740048 24788 ?        Ssl  11:10   0:00 /agnhost netexec

And the conmonrs logs also indicate that it got terminated:

> sudo journalctl -f _COMM=conmonrs --since=now
conmonrs[1028242]: Using systemd/journald logger
conmonrs[1028242]: Set log level to: trace
conmonrs[1028243]: registering event source with poller: token=Token(0), interests=READABLE
conmonrs[1028243]: registering event source with poller: token=Token(1), interests=READABLE | WRITABLE
conmonrs[1028243]: registering event source with poller: token=Token(2), interests=READABLE | WRITABLE
conmonrs[1028243]: Got a version request
conmonrs[1028243]: deregistering event source from poller
conmonrs[1028243]: registering event source with poller: token=Token(16777218), interests=READABLE | WRITABLE
conmonrs[1028243]: Got a version request
conmonrs[1028243]: deregistering event source from poller
conmonrs[1028243]: registering event source with poller: token=Token(33554434), interests=READABLE | WRITABLE
conmonrs[1028243]: Got a create container request
conmonrs[1028243]: Creating new IO streams
conmonrs[1028243]: PID file is /run/containers/storage/overlay-containers/1d9bbf813ae2ed8bd89e64f2160f973e7bfecd1311876c4c8d0f5af1eeb471b7/userdata/pidfile
conmonrs[1028243]: Runtime args "--root=/run/runc create --bundle /run/containers/storage/overlay-containers/1d9bbf813ae2ed8bd89e64f2160f973e7bfecd1311876c4c8d0f5af1eeb471b7/userdata --pid-file /run/containers/storage/overlay-containers/1d9bbf813ae2ed8bd89e64f2160f973e7bfecd1311876c4c8d0f5af1eeb471b7/userdata/pidfile 1d9bbf813ae2ed8bd89e64f2160f973e7bfecd1311876c4c8d0f5af1eeb471b7"
conmonrs[1028243]: Initializing CRI logger in path /var/log/pods/port-forwarding-2789_pfpod_39410d99-a33c-4a12-bd3b-0a47e1fb0d9c/readiness/0.log
conmonrs[1028243]: registering event source with poller: token=Token(3), interests=READABLE | WRITABLE
conmonrs[1028243]: registering event source with poller: token=Token(4), interests=READABLE | WRITABLE
conmonrs[1028243]: registering event source with poller: token=Token(5), interests=READABLE | WRITABLE
conmonrs[1028243]: Start reading from IO streams
conmonrs[1028243]: Running task
conmonrs[1028243]: Using cgroup path: /proc/1028268/cgroup
conmonrs[1028243]: Waiting for exit code
conmonrs[1028243]: deregistering event source from poller
conmonrs[1028243]: Setup cgroup v2 handling
conmonrs[1028243]: registering event source with poller: token=Token(0), interests=READABLE
conmonrs[1028243]: Read 150 bytes
conmonrs[1028243]: Wrote log line of length 120
conmonrs[1028243]: Wrote log line of length 120
conmonrs[1028243]: registering event source with poller: token=Token(50331650), interests=READABLE | WRITABLE
conmonrs[1028243]: Got a create container request
conmonrs[1028243]: Creating new IO streams
conmonrs[1028243]: PID file is /run/containers/storage/overlay-containers/a4100c860bee5c960903153702b486bd4ada8c6c6a77374b8edfc1afd55b872d/userdata/pidfile
conmonrs[1028243]: Runtime args "--root=/run/runc create --bundle /run/containers/storage/overlay-containers/a4100c860bee5c960903153702b486bd4ada8c6c6a77374b8edfc1afd55b872d/userdata --pid-file /run/containers/storage/overlay-containers/a4100c860bee5c960903153702b486bd4ada8c6c6a77374b8edfc1afd55b872d/userdata/pidfile a4100c860bee5c960903153702b486bd4ada8c6c6a77374b8edfc1afd55b872d"
conmonrs[1028243]: Initializing CRI logger in path /var/log/pods/port-forwarding-2789_pfpod_39410d99-a33c-4a12-bd3b-0a47e1fb0d9c/portforwardtester/0.log
conmonrs[1028243]: registering event source with poller: token=Token(6), interests=READABLE | WRITABLE
conmonrs[1028243]: registering event source with poller: token=Token(7), interests=READABLE | WRITABLE
conmonrs[1028243]: registering event source with poller: token=Token(8), interests=READABLE | WRITABLE
conmonrs[1028243]: Start reading from IO streams
conmonrs[1028243]: Running task
conmonrs[1028243]: Using cgroup path: /proc/1028312/cgroup
conmonrs[1028243]: Waiting for exit code
conmonrs[1028243]: deregistering event source from poller
conmonrs[1028243]: Setup cgroup v2 handling
conmonrs[1028243]: registering event source with poller: token=Token(0), interests=READABLE
conmonrs[1028243]: registering event source with poller: token=Token(67108866), interests=READABLE | WRITABLE
conmonrs[1028243]: Got exec sync container request with timeout 60
conmonrs[1028243]: Creating new IO streams
conmonrs[1028243]: Exec args "--root=/run/runc exec -d --pid-file=/run/containers/storage/overlay-containers/f5a4cb30d7219327b1fb12d111a14137ba65b5ebbfaaa50e4ba980070c4ad7bf/userdata/exec_syncAZSUifMpid 1d9bbf813ae2ed8bd89e64f2160f973e7bfecd1311876c4c8d0f5af1eeb471b7 sh -c netstat -na | grep LISTEN | grep -v 8080 | grep 80"
conmonrs[1028243]: registering event source with poller: token=Token(9), interests=READABLE | WRITABLE
conmonrs[1028243]: registering event source with poller: token=Token(10), interests=READABLE | WRITABLE
conmonrs[1028243]: registering event source with poller: token=Token(11), interests=READABLE | WRITABLE
conmonrs[1028243]: Start reading from IO streams
conmonrs[1028243]: Running task
conmonrs[1028243]: Waiting for exit code
conmonrs[1028243]: Using cgroup path: /proc/1028958/cgroup
conmonrs[1028243]: Setup cgroup v2 handling
conmonrs[1028243]: registering event source with poller: token=Token(0), interests=READABLE
conmonrs[1028243]: Read 81 bytes
conmonrs[1028243]: Exited 0
conmonrs[1028243]: TOKEN: CancellationToken { is_cancelled: false }, PID: 1028958
conmonrs[1028243]: Sending done because token cancelled
conmonrs[1028243]: Sending done because token cancelled
conmonrs[1028243]: deregistering event source from poller
conmonrs[1028243]: Loop cancelled
conmonrs[1028243]: deregistering event source from poller
conmonrs[1028243]: Exiting because token cancelled
conmonrs[1028243]: deregistering event source from poller
conmonrs[1028243]: Done watching for ooms
conmonrs[1028243]: Write to exit paths: 
conmonrs[1028243]: Sending exit struct to channel: ExitChannelData { exit_code: 0, oomed: false, timed_out: false }
conmonrs[1028243]: Task done
conmonrs[1028243]: deregistering event source from poller
conmonrs[1028243]: registering event source with poller: token=Token(83886082), interests=READABLE | WRITABLE
conmonrs[1028243]: Got exec sync container request with timeout 60
conmonrs[1028243]: Creating new IO streams
conmonrs[1028243]: Exec args "--root=/run/runc exec -d --pid-file=/run/containers/storage/overlay-containers/f5a4cb30d7219327b1fb12d111a14137ba65b5ebbfaaa50e4ba980070c4ad7bf/userdata/exec_synczA8mDmspid 1d9bbf813ae2ed8bd89e64f2160f973e7bfecd1311876c4c8d0f5af1eeb471b7 sh -c netstat -na | grep LISTEN | grep -v 8080 | grep 80"
conmonrs[1028243]: registering event source with poller: token=Token(16777225), interests=READABLE | WRITABLE
conmonrs[1028243]: registering event source with poller: token=Token(16777227), interests=READABLE | WRITABLE
conmonrs[1028243]: registering event source with poller: token=Token(16777226), interests=READABLE | WRITABLE
conmonrs[1028243]: Start reading from IO streams
conmonrs[1028243]: Running task
conmonrs[1028243]: Waiting for exit code
conmonrs[1028243]: Using cgroup path: /proc/1029067/cgroup
conmonrs[1028243]: Setup cgroup v2 handling
conmonrs[1028243]: registering event source with poller: token=Token(0), interests=READABLE
conmonrs[1028243]: Read 81 bytes
conmonrs[1028243]: Exited 0
conmonrs[1028243]: TOKEN: CancellationToken { is_cancelled: false }, PID: 1029067
conmonrs[1028243]: Sending done because token cancelled
conmonrs[1028243]: Loop cancelled
conmonrs[1028243]: Exiting because token cancelled
conmonrs[1028243]: deregistering event source from poller
conmonrs[1028243]: Sending done because token cancelled
conmonrs[1028243]: deregistering event source from poller
conmonrs[1028243]: Done watching for ooms
conmonrs[1028243]: deregistering event source from poller
conmonrs[1028243]: Write to exit paths: 
conmonrs[1028243]: Sending exit struct to channel: ExitChannelData { exit_code: 0, oomed: false, timed_out: false }
conmonrs[1028243]: Task done
conmonrs[1028243]: deregistering event source from poller
conmonrs[1028243]: Read 57 bytes
conmonrs[1028243]: Wrote log line of length 72
conmonrs[1028243]: Wrote log line of length 75
conmonrs[1028243]: deregistering event source from poller
conmonrs[1028243]: Stdout read loop failure: send data message: channel closed
conmonrs[1028243]: registering event source with poller: token=Token(16777223), interests=READABLE | WRITABLE
conmonrs[1028243]: Got exec sync container request with timeout 60
conmonrs[1028243]: Creating new IO streams
conmonrs[1028243]: Exec args "--root=/run/runc exec -d --pid-file=/run/containers/storage/overlay-containers/f5a4cb30d7219327b1fb12d111a14137ba65b5ebbfaaa50e4ba980070c4ad7bf/userdata/exec_sync34TJjjHpid 1d9bbf813ae2ed8bd89e64f2160f973e7bfecd1311876c4c8d0f5af1eeb471b7 sh -c netstat -na | grep LISTEN | grep -v 8080 | grep 80"
conmonrs[1028243]: registering event source with poller: token=Token(100663298), interests=READABLE | WRITABLE
conmonrs[1028243]: registering event source with poller: token=Token(33554442), interests=READABLE | WRITABLE
conmonrs[1028243]: registering event source with poller: token=Token(33554441), interests=READABLE | WRITABLE
conmonrs[1028243]: Start reading from IO streams
conmonrs[1028243]: Running task
conmonrs[1028243]: Waiting for exit code
conmonrs[1028243]: Using cgroup path: /proc/1029207/cgroup
conmonrs[1028243]: Setup cgroup v2 handling
conmonrs[1028243]: registering event source with poller: token=Token(0), interests=READABLE
conmonrs[1028243]: Exited 0
conmonrs[1028243]: Read 81 bytes
conmonrs[1028243]: TOKEN: CancellationToken { is_cancelled: false }, PID: 1029207
conmonrs[1028243]: Sending done because token cancelled
conmonrs[1028243]: Sending done because token cancelled
conmonrs[1028243]: deregistering event source from poller
conmonrs[1028243]: Loop cancelled
conmonrs[1028243]: deregistering event source from poller
conmonrs[1028243]: Exiting because token cancelled
conmonrs[1028243]: deregistering event source from poller
conmonrs[1028243]: Done watching for ooms
conmonrs[1028243]: Write to exit paths: 
conmonrs[1028243]: Sending exit struct to channel: ExitChannelData { exit_code: 0, oomed: false, timed_out: false }
conmonrs[1028243]: Task done
conmonrs[1028243]: deregistering event source from poller
conmonrs[1028243]: Exited 2
conmonrs[1028243]: TOKEN: CancellationToken { is_cancelled: false }, PID: 1028312
conmonrs[1028243]: Sending done because token cancelled
conmonrs[1028243]: Loop cancelled
conmonrs[1028243]: Exiting because token cancelled
conmonrs[1028243]: deregistering event source from poller
conmonrs[1028243]: Stderr read loop failure: send done message: channel closed
conmonrs[1028243]: deregistering event source from poller
conmonrs[1028243]: Done watching for ooms
conmonrs[1028243]: Write to exit paths: /var/run/crio/exits/a4100c860bee5c960903153702b486bd4ada8c6c6a77374b8edfc1afd55b872d, /var/lib/containers/storage/overlay-containers/a4100c860bee5c960903153702b486bd4ada8c6c6a77374b8edfc1afd55b872d/userdata/exit
conmonrs[1028243]: Creating exit file
conmonrs[1028243]: Creating exit file
conmonrs[1028243]: Writing exit code to file
conmonrs[1028243]: Writing exit code to file
conmonrs[1028243]: Flushing file
conmonrs[1028243]: Flushing file
conmonrs[1028243]: Done writing exit file
conmonrs[1028243]: Done writing exit file
conmonrs[1028243]: Sending exit struct to channel: ExitChannelData { exit_code: 2, oomed: false, timed_out: false }

Anything else we need to know?

Interestingly, when running the workload directly from the YAML, it works:

apiVersion: v1
kind: Pod
metadata:
  labels:
    name: pfpod
  name: pfpod
spec:
  containers:
  - args:
    - netexec
    image: registry.k8s.io/e2e-test-images/agnhost:2.40
    imagePullPolicy: IfNotPresent
    name: readiness
    readinessProbe:
      exec:
        command:
        - sh
        - -c
        - netstat -na | grep LISTEN | grep -v 8080 | grep 80
      failureThreshold: 3
      initialDelaySeconds: 5
      periodSeconds: 1
      successThreshold: 1
      timeoutSeconds: 60
    resources: {}
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: kube-api-access-q2nfb
      readOnly: true
  - args:
    - port-forward-tester
    env:
    - name: BIND_PORT
      value: "80"
    - name: EXPECTED_CLIENT_DATA
      value: abc
    - name: CHUNKS
      value: "10"
    - name: CHUNK_SIZE
      value: "10"
    - name: CHUNK_INTERVAL
      value: "100"
    - name: BIND_ADDRESS
      value: localhost
    image: registry.k8s.io/e2e-test-images/agnhost:2.40
    imagePullPolicy: IfNotPresent
    name: portforwardtester
    resources: {}
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: kube-api-access-q2nfb
      readOnly: true
  dnsPolicy: ClusterFirst
  enableServiceLinks: true
  nodeName: 127.0.0.1
  preemptionPolicy: PreemptLowerPriority
  priority: 0
  restartPolicy: Never
  schedulerName: default-scheduler
  securityContext: {}
  serviceAccount: default
  serviceAccountName: default
  terminationGracePeriodSeconds: 30
  tolerations:
  - effect: NoExecute
    key: node.kubernetes.io/not-ready
    operator: Exists
    tolerationSeconds: 300
  - effect: NoExecute
    key: node.kubernetes.io/unreachable
    operator: Exists
    tolerationSeconds: 300
  volumes:
  - name: kube-api-access-q2nfb
    projected:
      defaultMode: 420
      sources:
      - serviceAccountToken:
          expirationSeconds: 3607
          path: token
      - configMap:
          items:
          - key: ca.crt
            path: ca.crt
          name: kube-root-ca.crt
      - downwardAPI:
          items:
          - fieldRef:
              apiVersion: v1
              fieldPath: metadata.namespace
            path: namespace
> k get pods
NAME    READY   STATUS    RESTARTS   AGE
pfpod   2/2     Running   0          24s

Removing the readiness probe in the test makes it pass on my local machine:
https://github.com/kubernetes/kubernetes/blob/02ac8ac4181e179c2f030a9f8b1abef0d9a0b512/test/e2e/kubectl/portforward.go#L77-L87

So it looks like the first execsync request kills the portforwardtester container.
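If that diagnosis is right, the fix direction would be giving each exec-sync request its own cancellation token instead of sharing one with the container task. A std-only sketch (again with Arc<AtomicBool> and threads as hypothetical stand-ins for the real token/task types) of the intended behavior:

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
use std::thread;
use std::time::Duration;

// With a dedicated token per exec-sync request, cancelling the request
// leaves the container's IO loop untouched.
fn container_survives_exec_cancel() -> bool {
    let container_token = Arc::new(AtomicBool::new(false));
    let container_stopped = Arc::new(AtomicBool::new(false));

    let t = Arc::clone(&container_token);
    let stopped = Arc::clone(&container_stopped);
    let container = thread::spawn(move || {
        // The container loop watches only its own token.
        while !t.load(Ordering::SeqCst) {
            thread::sleep(Duration::from_millis(5));
        }
        stopped.store(true, Ordering::SeqCst);
    });

    // The exec-sync request owns a separate token; cancelling it here
    // signals only the exec session's IO tasks.
    let exec_token = Arc::new(AtomicBool::new(false));
    exec_token.store(true, Ordering::SeqCst);

    // Give the container loop time to (wrongly) react if state were shared.
    thread::sleep(Duration::from_millis(50));
    let survived = !container_stopped.load(Ordering::SeqCst);

    // Normal shutdown of the container loop.
    container_token.store(true, Ordering::SeqCst);
    container.join().unwrap();
    survived
}

fn main() {
    assert!(container_survives_exec_cancel());
    println!("container loop unaffected by exec-sync cancellation");
}
```

Under this scheme the "Exiting because token cancelled" lines after each exec-sync would apply only to the exec session's streams, which matches what the test expects.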

conmon-rs version

$ conmonrs --version
version: 0.2.0
tag: none
commit: 51eb41deca7402edb0eb8b75b965283790dbf299
build: 2022-09-15 07:32:52 +00:00
target: x86_64-unknown-linux-gnu
rustc 1.63.0 (4b91a6ea7 2022-08-08)
cargo 1.63.0 (fd9c4297c 2022-07-01)

OS version

# On Linux:
$ cat /etc/os-release
# paste output here
$ uname -a
# paste output here

Additional environment details (AWS, VirtualBox, physical, etc.)

Ultra strange that it works when the readiness probe is removed. The portforwardtester executable exits with code 2, and I'm wondering what is causing that.

Edit: It's this line which stops the process: https://github.com/kubernetes/kubernetes/blob/f0823c0f59d6ea8e2ad0dc4e3fe0f2b396c9185f/test/e2e/kubectl/portforward.go#L324