Kubernetes e2e test `Kubectl Port forwarding With a server listening on localhost that expects a client request should support a client that connects, sends DATA, and disconnects` fails
saschagrunert opened this issue
What happened?
The test fails in cri-o/cri-o#6220 together with other cases, but I assume that the root cause is the same.
Interestingly, on my local machine the cancellation of the execsync token seems to kill the main container PID.
What did you expect to happen?
That the test succeeds.
How can we reproduce it (as minimally and precisely as possible)?
Running the test
> k8s-test-run --ginkgo.focus "Kubectl Port forwarding With a server listening on localhost that expects a client request should support a client that connects, sends DATA, and disconnects"
…
[BeforeEach] [sig-cli] Kubectl Port forwarding
set up framework | framework.go:158
STEP: Creating a kubernetes client 09/16/22 11:08:56.88
Sep 16 11:08:56.880: INFO: >>> kubeConfig: /var/run/kubernetes/admin.kubeconfig
STEP: Building a namespace api object, basename port-forwarding 09/16/22 11:08:56.881
STEP: Waiting for a default service account to be provisioned in namespace 09/16/22 11:08:56.885
STEP: Waiting for kube-root-ca.crt to be provisioned in namespace 09/16/22 11:08:56.887
[It] should support a client that connects, sends DATA, and disconnects
test/e2e/kubectl/portforward.go:481
STEP: Creating the target pod 09/16/22 11:08:56.888
Sep 16 11:08:56.892: INFO: Waiting up to 5m0s for pod "pfpod" in namespace "port-forwarding-8233" to be "running and ready"
Sep 16 11:08:56.893: INFO: Pod "pfpod": Phase="Pending", Reason="", readiness=false. Elapsed: 1.353931ms
Sep 16 11:08:56.893: INFO: The phase of Pod pfpod is Pending, waiting for it to be Running (with Ready = true)
Sep 16 11:08:58.896: INFO: Pod "pfpod": Phase="Running", Reason="", readiness=false. Elapsed: 2.003699716s
Sep 16 11:08:58.896: INFO: The phase of Pod pfpod is Running (Ready = false)
Sep 16 11:09:00.896: INFO: Pod "pfpod": Phase="Running", Reason="", readiness=false. Elapsed: 4.003966771s
Sep 16 11:09:00.896: INFO: The phase of Pod pfpod is Running (Ready = false)
Sep 16 11:09:02.895: INFO: Pod "pfpod": Phase="Running", Reason="", readiness=true. Elapsed: 6.003036101s
Sep 16 11:09:02.895: INFO: The phase of Pod pfpod is Running (Ready = true)
Sep 16 11:09:02.895: INFO: Pod "pfpod" satisfied condition "running and ready"
STEP: Running 'kubectl port-forward' 09/16/22 11:09:02.895
Sep 16 11:09:02.895: INFO: starting port-forward command and streaming output
Sep 16 11:09:02.895: INFO: Asynchronously running '/home/sascha/go/src/k8s.io/kubernetes/_output/local/bin/linux/amd64/kubectl kubectl --server=https://localhost:6443/ --kubeconfig=/var/run/kubernetes/admin.kubeconfig --namespace=port-forwarding-8233 port-forward --namespace=port-forwarding-8233 pfpod :80'
Sep 16 11:09:02.895: INFO: reading from `kubectl port-forward` command's stdout
STEP: Dialing the local port 09/16/22 11:09:02.951
STEP: Sending the expected data to the local port 09/16/22 11:09:02.952
STEP: Reading data from the local port 09/16/22 11:09:02.952
STEP: Closing the write half of the client's connection 09/16/22 11:09:04.859
STEP: Waiting for the target pod to stop running 09/16/22 11:09:04.859
Sep 16 11:09:04.859: INFO: Waiting up to 5m0s for pod "pfpod" in namespace "port-forwarding-8233" to be "container terminated"
Sep 16 11:09:04.860: INFO: Pod "pfpod": Phase="Running", Reason="", readiness=false. Elapsed: 1.185975ms
Sep 16 11:09:04.860: INFO: Pod "pfpod" satisfied condition "container terminated"
STEP: Verifying logs 09/16/22 11:09:04.86
…
[ hangs for 120s ]
We can see that the port-forwarder is dead within the container:
> kubectl get pods -A
NAMESPACE              NAME                      READY   STATUS    RESTARTS   AGE
kube-system            coredns-567b6dd84-bt6pf   1/1     Running   0          2m50s
port-forwarding-5552   pfpod                     0/2     Error     0          12s
> ps aux | rg agnhost
root 1000821 0.0 0.0 740048 24788 ? Ssl 11:10 0:00 /agnhost netexec
And the conmonrs logs also indicate that it got terminated:
> sudo journalctl -f _COMM=conmonrs --since=now
conmonrs[1028242]: Using systemd/journald logger
conmonrs[1028242]: Set log level to: trace
conmonrs[1028243]: registering event source with poller: token=Token(0), interests=READABLE
conmonrs[1028243]: registering event source with poller: token=Token(1), interests=READABLE | WRITABLE
conmonrs[1028243]: registering event source with poller: token=Token(2), interests=READABLE | WRITABLE
conmonrs[1028243]: Got a version request
conmonrs[1028243]: deregistering event source from poller
conmonrs[1028243]: registering event source with poller: token=Token(16777218), interests=READABLE | WRITABLE
conmonrs[1028243]: Got a version request
conmonrs[1028243]: deregistering event source from poller
conmonrs[1028243]: registering event source with poller: token=Token(33554434), interests=READABLE | WRITABLE
conmonrs[1028243]: Got a create container request
conmonrs[1028243]: Creating new IO streams
conmonrs[1028243]: PID file is /run/containers/storage/overlay-containers/1d9bbf813ae2ed8bd89e64f2160f973e7bfecd1311876c4c8d0f5af1eeb471b7/userdata/pidfile
conmonrs[1028243]: Runtime args "--root=/run/runc create --bundle /run/containers/storage/overlay-containers/1d9bbf813ae2ed8bd89e64f2160f973e7bfecd1311876c4c8d0f5af1eeb471b7/userdata --pid-file /run/containers/storage/overlay-containers/1d9bbf813ae2ed8bd89e64f2160f973e7bfecd1311876c4c8d0f5af1eeb471b7/userdata/pidfile 1d9bbf813ae2ed8bd89e64f2160f973e7bfecd1311876c4c8d0f5af1eeb471b7"
conmonrs[1028243]: Initializing CRI logger in path /var/log/pods/port-forwarding-2789_pfpod_39410d99-a33c-4a12-bd3b-0a47e1fb0d9c/readiness/0.log
conmonrs[1028243]: registering event source with poller: token=Token(3), interests=READABLE | WRITABLE
conmonrs[1028243]: registering event source with poller: token=Token(4), interests=READABLE | WRITABLE
conmonrs[1028243]: registering event source with poller: token=Token(5), interests=READABLE | WRITABLE
conmonrs[1028243]: Start reading from IO streams
conmonrs[1028243]: Running task
conmonrs[1028243]: Using cgroup path: /proc/1028268/cgroup
conmonrs[1028243]: Waiting for exit code
conmonrs[1028243]: deregistering event source from poller
conmonrs[1028243]: Setup cgroup v2 handling
conmonrs[1028243]: registering event source with poller: token=Token(0), interests=READABLE
conmonrs[1028243]: Read 150 bytes
conmonrs[1028243]: Wrote log line of length 120
conmonrs[1028243]: Wrote log line of length 120
conmonrs[1028243]: registering event source with poller: token=Token(50331650), interests=READABLE | WRITABLE
conmonrs[1028243]: Got a create container request
conmonrs[1028243]: Creating new IO streams
conmonrs[1028243]: PID file is /run/containers/storage/overlay-containers/a4100c860bee5c960903153702b486bd4ada8c6c6a77374b8edfc1afd55b872d/userdata/pidfile
conmonrs[1028243]: Runtime args "--root=/run/runc create --bundle /run/containers/storage/overlay-containers/a4100c860bee5c960903153702b486bd4ada8c6c6a77374b8edfc1afd55b872d/userdata --pid-file /run/containers/storage/overlay-containers/a4100c860bee5c960903153702b486bd4ada8c6c6a77374b8edfc1afd55b872d/userdata/pidfile a4100c860bee5c960903153702b486bd4ada8c6c6a77374b8edfc1afd55b872d"
conmonrs[1028243]: Initializing CRI logger in path /var/log/pods/port-forwarding-2789_pfpod_39410d99-a33c-4a12-bd3b-0a47e1fb0d9c/portforwardtester/0.log
conmonrs[1028243]: registering event source with poller: token=Token(6), interests=READABLE | WRITABLE
conmonrs[1028243]: registering event source with poller: token=Token(7), interests=READABLE | WRITABLE
conmonrs[1028243]: registering event source with poller: token=Token(8), interests=READABLE | WRITABLE
conmonrs[1028243]: Start reading from IO streams
conmonrs[1028243]: Running task
conmonrs[1028243]: Using cgroup path: /proc/1028312/cgroup
conmonrs[1028243]: Waiting for exit code
conmonrs[1028243]: deregistering event source from poller
conmonrs[1028243]: Setup cgroup v2 handling
conmonrs[1028243]: registering event source with poller: token=Token(0), interests=READABLE
conmonrs[1028243]: registering event source with poller: token=Token(67108866), interests=READABLE | WRITABLE
conmonrs[1028243]: Got exec sync container request with timeout 60
conmonrs[1028243]: Creating new IO streams
conmonrs[1028243]: Exec args "--root=/run/runc exec -d --pid-file=/run/containers/storage/overlay-containers/f5a4cb30d7219327b1fb12d111a14137ba65b5ebbfaaa50e4ba980070c4ad7bf/userdata/exec_syncAZSUifMpid 1d9bbf813ae2ed8bd89e64f2160f973e7bfecd1311876c4c8d0f5af1eeb471b7 sh -c netstat -na | grep LISTEN | grep -v 8080 | grep 80"
conmonrs[1028243]: registering event source with poller: token=Token(9), interests=READABLE | WRITABLE
conmonrs[1028243]: registering event source with poller: token=Token(10), interests=READABLE | WRITABLE
conmonrs[1028243]: registering event source with poller: token=Token(11), interests=READABLE | WRITABLE
conmonrs[1028243]: Start reading from IO streams
conmonrs[1028243]: Running task
conmonrs[1028243]: Waiting for exit code
conmonrs[1028243]: Using cgroup path: /proc/1028958/cgroup
conmonrs[1028243]: Setup cgroup v2 handling
conmonrs[1028243]: registering event source with poller: token=Token(0), interests=READABLE
conmonrs[1028243]: Read 81 bytes
conmonrs[1028243]: Exited 0
conmonrs[1028243]: TOKEN: CancellationToken { is_cancelled: false }, PID: 1028958
conmonrs[1028243]: Sending done because token cancelled
conmonrs[1028243]: Sending done because token cancelled
conmonrs[1028243]: deregistering event source from poller
conmonrs[1028243]: Loop cancelled
conmonrs[1028243]: deregistering event source from poller
conmonrs[1028243]: Exiting because token cancelled
conmonrs[1028243]: deregistering event source from poller
conmonrs[1028243]: Done watching for ooms
conmonrs[1028243]: Write to exit paths:
conmonrs[1028243]: Sending exit struct to channel: ExitChannelData { exit_code: 0, oomed: false, timed_out: false }
conmonrs[1028243]: Task done
conmonrs[1028243]: deregistering event source from poller
conmonrs[1028243]: registering event source with poller: token=Token(83886082), interests=READABLE | WRITABLE
conmonrs[1028243]: Got exec sync container request with timeout 60
conmonrs[1028243]: Creating new IO streams
conmonrs[1028243]: Exec args "--root=/run/runc exec -d --pid-file=/run/containers/storage/overlay-containers/f5a4cb30d7219327b1fb12d111a14137ba65b5ebbfaaa50e4ba980070c4ad7bf/userdata/exec_synczA8mDmspid 1d9bbf813ae2ed8bd89e64f2160f973e7bfecd1311876c4c8d0f5af1eeb471b7 sh -c netstat -na | grep LISTEN | grep -v 8080 | grep 80"
conmonrs[1028243]: registering event source with poller: token=Token(16777225), interests=READABLE | WRITABLE
conmonrs[1028243]: registering event source with poller: token=Token(16777227), interests=READABLE | WRITABLE
conmonrs[1028243]: registering event source with poller: token=Token(16777226), interests=READABLE | WRITABLE
conmonrs[1028243]: Start reading from IO streams
conmonrs[1028243]: Running task
conmonrs[1028243]: Waiting for exit code
conmonrs[1028243]: Using cgroup path: /proc/1029067/cgroup
conmonrs[1028243]: Setup cgroup v2 handling
conmonrs[1028243]: registering event source with poller: token=Token(0), interests=READABLE
conmonrs[1028243]: Read 81 bytes
conmonrs[1028243]: Exited 0
conmonrs[1028243]: TOKEN: CancellationToken { is_cancelled: false }, PID: 1029067
conmonrs[1028243]: Sending done because token cancelled
conmonrs[1028243]: Loop cancelled
conmonrs[1028243]: Exiting because token cancelled
conmonrs[1028243]: deregistering event source from poller
conmonrs[1028243]: Sending done because token cancelled
conmonrs[1028243]: deregistering event source from poller
conmonrs[1028243]: Done watching for ooms
conmonrs[1028243]: deregistering event source from poller
conmonrs[1028243]: Write to exit paths:
conmonrs[1028243]: Sending exit struct to channel: ExitChannelData { exit_code: 0, oomed: false, timed_out: false }
conmonrs[1028243]: Task done
conmonrs[1028243]: deregistering event source from poller
conmonrs[1028243]: Read 57 bytes
conmonrs[1028243]: Wrote log line of length 72
conmonrs[1028243]: Wrote log line of length 75
conmonrs[1028243]: deregistering event source from poller
conmonrs[1028243]: Stdout read loop failure: send data message: channel closed
conmonrs[1028243]: registering event source with poller: token=Token(16777223), interests=READABLE | WRITABLE
conmonrs[1028243]: Got exec sync container request with timeout 60
conmonrs[1028243]: Creating new IO streams
conmonrs[1028243]: Exec args "--root=/run/runc exec -d --pid-file=/run/containers/storage/overlay-containers/f5a4cb30d7219327b1fb12d111a14137ba65b5ebbfaaa50e4ba980070c4ad7bf/userdata/exec_sync34TJjjHpid 1d9bbf813ae2ed8bd89e64f2160f973e7bfecd1311876c4c8d0f5af1eeb471b7 sh -c netstat -na | grep LISTEN | grep -v 8080 | grep 80"
conmonrs[1028243]: registering event source with poller: token=Token(100663298), interests=READABLE | WRITABLE
conmonrs[1028243]: registering event source with poller: token=Token(33554442), interests=READABLE | WRITABLE
conmonrs[1028243]: registering event source with poller: token=Token(33554441), interests=READABLE | WRITABLE
conmonrs[1028243]: Start reading from IO streams
conmonrs[1028243]: Running task
conmonrs[1028243]: Waiting for exit code
conmonrs[1028243]: Using cgroup path: /proc/1029207/cgroup
conmonrs[1028243]: Setup cgroup v2 handling
conmonrs[1028243]: registering event source with poller: token=Token(0), interests=READABLE
conmonrs[1028243]: Exited 0
conmonrs[1028243]: Read 81 bytes
conmonrs[1028243]: TOKEN: CancellationToken { is_cancelled: false }, PID: 1029207
conmonrs[1028243]: Sending done because token cancelled
conmonrs[1028243]: Sending done because token cancelled
conmonrs[1028243]: deregistering event source from poller
conmonrs[1028243]: Loop cancelled
conmonrs[1028243]: deregistering event source from poller
conmonrs[1028243]: Exiting because token cancelled
conmonrs[1028243]: deregistering event source from poller
conmonrs[1028243]: Done watching for ooms
conmonrs[1028243]: Write to exit paths:
conmonrs[1028243]: Sending exit struct to channel: ExitChannelData { exit_code: 0, oomed: false, timed_out: false }
conmonrs[1028243]: Task done
conmonrs[1028243]: deregistering event source from poller
conmonrs[1028243]: Exited 2
conmonrs[1028243]: TOKEN: CancellationToken { is_cancelled: false }, PID: 1028312
conmonrs[1028243]: Sending done because token cancelled
conmonrs[1028243]: Loop cancelled
conmonrs[1028243]: Exiting because token cancelled
conmonrs[1028243]: deregistering event source from poller
conmonrs[1028243]: Stderr read loop failure: send done message: channel closed
conmonrs[1028243]: deregistering event source from poller
conmonrs[1028243]: Done watching for ooms
conmonrs[1028243]: Write to exit paths: /var/run/crio/exits/a4100c860bee5c960903153702b486bd4ada8c6c6a77374b8edfc1afd55b872d, /var/lib/containers/storage/overlay-containers/a4100c860bee5c960903153702b486bd4ada8c6c6a77374b8edfc1afd55b872d/userdata/exit
conmonrs[1028243]: Creating exit file
conmonrs[1028243]: Creating exit file
conmonrs[1028243]: Writing exit code to file
conmonrs[1028243]: Writing exit code to file
conmonrs[1028243]: Flushing file
conmonrs[1028243]: Flushing file
conmonrs[1028243]: Done writing exit file
conmonrs[1028243]: Done writing exit file
conmonrs[1028243]: Sending exit struct to channel: ExitChannelData { exit_code: 2, oomed: false, timed_out: false }
Anything else we need to know?
Interestingly, when running the workload directly from the YAML below, it works:
apiVersion: v1
kind: Pod
metadata:
labels:
name: pfpod
name: pfpod
spec:
containers:
- args:
- netexec
image: registry.k8s.io/e2e-test-images/agnhost:2.40
imagePullPolicy: IfNotPresent
name: readiness
readinessProbe:
exec:
command:
- sh
- -c
- netstat -na | grep LISTEN | grep -v 8080 | grep 80
failureThreshold: 3
initialDelaySeconds: 5
periodSeconds: 1
successThreshold: 1
timeoutSeconds: 60
resources: {}
terminationMessagePath: /dev/termination-log
terminationMessagePolicy: File
volumeMounts:
- mountPath: /var/run/secrets/kubernetes.io/serviceaccount
name: kube-api-access-q2nfb
readOnly: true
- args:
- port-forward-tester
env:
- name: BIND_PORT
value: "80"
- name: EXPECTED_CLIENT_DATA
value: abc
- name: CHUNKS
value: "10"
- name: CHUNK_SIZE
value: "10"
- name: CHUNK_INTERVAL
value: "100"
- name: BIND_ADDRESS
value: localhost
image: registry.k8s.io/e2e-test-images/agnhost:2.40
imagePullPolicy: IfNotPresent
name: portforwardtester
resources: {}
terminationMessagePath: /dev/termination-log
terminationMessagePolicy: File
volumeMounts:
- mountPath: /var/run/secrets/kubernetes.io/serviceaccount
name: kube-api-access-q2nfb
readOnly: true
dnsPolicy: ClusterFirst
enableServiceLinks: true
nodeName: 127.0.0.1
preemptionPolicy: PreemptLowerPriority
priority: 0
restartPolicy: Never
schedulerName: default-scheduler
securityContext: {}
serviceAccount: default
serviceAccountName: default
terminationGracePeriodSeconds: 30
tolerations:
- effect: NoExecute
key: node.kubernetes.io/not-ready
operator: Exists
tolerationSeconds: 300
- effect: NoExecute
key: node.kubernetes.io/unreachable
operator: Exists
tolerationSeconds: 300
volumes:
- name: kube-api-access-q2nfb
projected:
defaultMode: 420
sources:
- serviceAccountToken:
expirationSeconds: 3607
path: token
- configMap:
items:
- key: ca.crt
path: ca.crt
name: kube-root-ca.crt
- downwardAPI:
items:
- fieldRef:
apiVersion: v1
fieldPath: metadata.namespace
path: namespace
> k get pods
NAME    READY   STATUS    RESTARTS   AGE
pfpod   2/2     Running   0          24s
Removing the readiness probe in the test makes it pass on my local machine:
https://github.com/kubernetes/kubernetes/blob/02ac8ac4181e179c2f030a9f8b1abef0d9a0b512/test/e2e/kubectl/portforward.go#L77-L87
So it looks like the first exec sync request kills the portforwardtester
container.
conmon-rs version
$ conmonrs --version
version: 0.2.0
tag: none
commit: 51eb41deca7402edb0eb8b75b965283790dbf299
build: 2022-09-15 07:32:52 +00:00
target: x86_64-unknown-linux-gnu
rustc 1.63.0 (4b91a6ea7 2022-08-08)
cargo 1.63.0 (fd9c4297c 2022-07-01)
OS version
# On Linux:
$ cat /etc/os-release
# paste output here
$ uname -a
# paste output here
Additional environment details (AWS, VirtualBox, physical, etc.)
It's very strange that it works when removing the readiness probe. The portforwardtester
executable exits with code 2, and I'm wondering what causes that.
Edit: It's this line which stops the process: https://github.com/kubernetes/kubernetes/blob/f0823c0f59d6ea8e2ad0dc4e3fe0f2b396c9185f/test/e2e/kubectl/portforward.go#L324