kudobuilder / kuttl

KUbernetes Test TooL (kuttl)

Home Page:https://kuttl.dev

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

Improve newline handling in collectors

erikgb opened this issue · comments

What happened:

While debugging a flake issue in our e2e-test, we used collectors to harvest logs from relevant pods, we think the output could handle newlines (or actually missing newlines) better. Thanks a lot for adding this feature including docs! ❤️

The issue we were debugging, is job pods that are supposed to produce valid JSON output to stdout. For some reason, it does not do so occasionally, as it gets truncated. Here is an example of the relevant collector output:

logger.go:42: 14:29:57 | vulnerability-overflow/0-create | collecting log output for [type==pod,label: workload.statnett.no/name=vuln-app,namespace: image-scanner-jobs]
logger.go:42: 14:29:57 | vulnerability-overflow/0-create | running command: [kubectl logs --prefix -l workload.statnett.no/name=vuln-app -n image-scanner-jobs --all-containers --tail=10]
logger.go:42: 14:29:57 | vulnerability-overflow/0-create | [pod/deployment-vuln-app-app-eccf8-52873-dnp82/trivy] '/usr/local/bin/trivy' -> '/var/run/image-scanner/trivy'
logger.go:42: 14:29:57 | vulnerability-overflow/0-create | [pod/deployment-vuln-app-app-eccf8-52873-dnp82/scan-image]     "severity": "HIGH",
logger.go:42: 14:29:57 | vulnerability-overflow/0-create | [pod/deployment-vuln-app-app-eccf8-52873-dnp82/scan-image]     "title": "OpenSSL: OOB read in TS_OBJ_print_bio()",
logger.go:42: 14:29:57 | vulnerability-overflow/0-create | [pod/deployment-vuln-app-app-eccf8-52873-dnp82/scan-image]     "vulnerabilityID": "CVE-2016-2180"
logger.go:42: 14:29:57 | vulnerability-overflow/0-create | [pod/deployment-vuln-app-app-eccf8-52873-dnp82/scan-image]   },
logger.go:42: 14:29:57 | vulnerability-overflow/0-create | [pod/deployment-vuln-app-app-eccf8-52873-dnp82/scan-image]   {
logger.go:42: 14:29:57 | vulnerability-overflow/0-create | [pod/deployment-vuln-app-app-eccf8-52873-dnp82/scan-image]     "fixedVersion": "1.0.1t-1+deb8u4",
logger.go:42: 14:29:57 | vulnerability-overflow/0-create | [pod/deployment-vuln-app-app-eccf8-52873-dnp82/scan-image]     "installedVersion": "1.0.1t-1+deb8u2",
logger.go:42: 14:29:57 | vulnerability-overflow/0-create | [pod/deployment-vuln-app-app-eccf8-52873-dnp82/scan-image]     "pkgName": "openssl",
logger.go:42: 14:29:57 | vulnerability-overflow/0-create | [pod/deployment-vuln-app-app-eccf8-52873-dnp82/scan-image]     "primaryURL": "https://avd.aquasec.com/nvd/cve-2016-2181",
logger.go:42: 14:29:57 | vulnerability-overflow/0-create | [pod/deployment-vuln-app-app-eccf8-52873-dnp82/scan-image]     "severi[pod/deployment-vuln-app-app-eccf8-8ca3e-xrqxx/trivy] '/usr/local/bin/trivy' -> '/var/run/image-scanner/trivy'
logger.go:42: 14:29:57 | vulnerability-overflow/0-create | [pod/deployment-vuln-app-app-eccf8-8ca3e-xrqxx/scan-image]   {
logger.go:42: 14:29:57 | vulnerability-overflow/0-create | [pod/deployment-vuln-app-app-eccf8-8ca3e-xrqxx/scan-image]     "fixedVersion": "6.2.4, 6.3.3, 6.4.1, 6.5.3, 6.6.1, 6.7.3, 6.8.3, 6.9.7, 6.10.3",
logger.go:42: 14:29:57 | vulnerability-overflow/0-create | [pod/deployment-vuln-app-app-eccf8-8ca3e-xrqxx/scan-image]     "installedVersion": "6.5.2",
logger.go:42: 14:29:57 | vulnerability-overflow/0-create | [pod/deployment-vuln-app-app-eccf8-8ca3e-xrqxx/scan-image]     "pkgName": "qs",
logger.go:42: 14:29:57 | vulnerability-overflow/0-create | [pod/deployment-vuln-app-app-eccf8-8ca3e-xrqxx/scan-image]     "primaryURL": "https://avd.aquasec.com/nvd/cve-2022-24999",
logger.go:42: 14:29:57 | vulnerability-overflow/0-create | [pod/deployment-vuln-app-app-eccf8-8ca3e-xrqxx/scan-image]     "severity": "HIGH",
logger.go:42: 14:29:57 | vulnerability-overflow/0-create | [pod/deployment-vuln-app-app-eccf8-8ca3e-xrqxx/scan-image]     "title": "express: \"qs\" prototype poisoning causes the hang of the node process",
logger.go:42: 14:29:57 | vulnerability-overflow/0-create | [pod/deployment-vuln-app-app-eccf8-8ca3e-xrqxx/scan-image]     "vulnerabilityID": "CVE-2022-24999"
logger.go:42: 14:29:57 | vulnerability-overflow/0-create | [pod/deployment-vuln-app-app-eccf8-8ca3e-xrqxx/scan-image]   }
logger.go:42: 14:29:57 | vulnerability-overflow/0-create | [pod/deployment-vuln-app-app-eccf8-8ca3e-xrqxx/scan-image] ]
logger.go:42: 14:29:57 | vulnerability-overflow/0-create | [pod/deployment-vuln-app-app-eccf8-32023-grsq9/trivy] '/usr/local/bin/trivy' -> '/var/run/image-scanner/trivy'
logger.go:42: 14:29:57 | vulnerability-overflow/0-create | [pod/deployment-vuln-app-app-eccf8-32023-grsq9/scan-image]   {
logger.go:42: 14:29:57 | vulnerability-overflow/0-create | [pod/deployment-vuln-app-app-eccf8-32023-grsq9/scan-image]     "fixedVersion": "6.2.4, 6.3.3, 6.4.1, 6.5.3, 6.6.1, 6.7.3, 6.8.3, 6.9.7, 6.10.3",
logger.go:42: 14:29:57 | vulnerability-overflow/0-create | [pod/deployment-vuln-app-app-eccf8-32023-grsq9/scan-image]     "installedVersion": "6.5.2",
logger.go:42: 14:29:57 | vulnerability-overflow/0-create | [pod/deployment-vuln-app-app-eccf8-32023-grsq9/scan-image]     "pkgName": "qs",
logger.go:42: 14:29:57 | vulnerability-overflow/0-create | [pod/deployment-vuln-app-app-eccf8-32023-grsq9/scan-image]     "primaryURL": "https://avd.aquasec.com/nvd/cve-2022-24999",
logger.go:42: 14:29:57 | vulnerability-overflow/0-create | [pod/deployment-vuln-app-app-eccf8-32023-grsq9/scan-image]     "severity": "HIGH",
logger.go:42: 14:29:57 | vulnerability-overflow/0-create | [pod/deployment-vuln-app-app-eccf8-32023-grsq9/scan-image]     "title": "express: \"qs\" prototype poisoning causes the hang of the node process",
logger.go:42: 14:29:57 | vulnerability-overflow/0-create | [pod/deployment-vuln-app-app-eccf8-32023-grsq9/scan-image]     "vulnerabilityID": "CVE-2022-24999"
logger.go:42: 14:29:57 | vulnerability-overflow/0-create | [pod/deployment-vuln-app-app-eccf8-32023-grsq9/scan-image]   }
logger.go:42: 14:29:57 | vulnerability-overflow/0-create | [pod/deployment-vuln-app-app-eccf8-32023-grsq9/scan-image] ]

The collector selector matches three single container pods. Two of them produce valid output, but for the container pod/deployment-vuln-app-app-eccf8-52873-dnp82/scan-image the logs are suddenly truncated in the middle of a JSON property, and the log ends without the standard newline. This results in kuttl logging the following line:

logger.go:42: 14:29:57 | vulnerability-overflow/0-create | [pod/deployment-vuln-app-app-eccf8-52873-dnp82/scan-image]     "severi[pod/deployment-vuln-app-app-eccf8-8ca3e-xrqxx/trivy] '/usr/local/bin/trivy' -> '/var/run/image-scanner/trivy'

And this kuttl log-line does not match the format and structure in the other lines.

What you expected to happen:

Kuttl logs in a consistent format, even if a container log isn't terminated by newline.

How to reproduce it (as minimally and precisely as possible):

I haven't tested this, but I would expect this to be reproducible by collecting a pod log not terminated by newline.

Anything else we need to know?:

Environment:

  • Kubernetes version (use kubectl version): 1.26.0
  • KUTTL version (use kubectl kuttl version): 0.15.0
  • Cloud provider or hardware configuration:
  • OS (e.g. from /etc/os-release):
  • Kernel (e.g. uname -a):
  • Install tools:
  • Others:

@erikgb if you provide a reproducer, I can take a look.
I tried to reproduce the error in my environment yesterday and I was not able to.

@erikgb if you provide a reproducer, I can take a look. I tried to reproduce the error in my environment yesterday and I was not able to.

@iblancasa Thanks, I'll see if I can build a simple reproducer.

With some help from a Rancher/k3s maintainer, I managed to capture the raw container logs on the node:

2023-01-19T20:53:02.001262317Z stdout F   {
2023-01-19T20:53:02.001265017Z stdout F     "fixedVersion": "215-17+deb8u11",
2023-01-19T20:53:02.001267717Z stdout F     "installedVersion": "215-17+deb8u4",
2023-01-19T20:53:02.001270417Z stdout F     "pkgName": "systemd-sysv",
2023-01-19T20:53:02.001273117Z stdout F     "primaryURL": "https://avd.aquasec.com/nvd/cve-2019-3815",
2023-01-19T20:53:02.001275917Z stdout F     "severity": "LOW",
2023-01-19T20:53:02.001278717Z stdout F     "title": "systemd: memory leak in journald-server.c introduced by fix for CVE-2018-16864",
2023-01-19T20:53:02.001281517Z stdout F     "vulnerabilityID": "CVE-2019-3815"
2023-01-19T20:53:02.001284217Z stdout F   },
2023-01-19T20:53:02.001287017Z stdout F   {
2023-01-19T20:53:02.001289717Z stdout F     "installedVersion": "215-17+deb8u4",
2023-01-19T20:53:02.001292517Z stdout F     "pkgName": "systemd-sysv",
2023-01-19T20:53:02.001295217Z stdout F     "primaryURL": "https://avd.aquasec.com/nvd/cve-2020-13529",
2023-01-19T20:53:02.001298017Z stdout F     "severity": "LOW",
2023-01-19T20:53:02.001420918Z stdout P     "title": "systemd: DHCP FORCERENEW authentication not implemented can cause a system running the DHCP client to hav

And that does not look correct, as the last line has a P, which stands for Partial.

How did you try to reproduce? I am thinking this might happen if a pod logs without a newline on it's last line before terminating.

I tried multiple ways. Like creating a pod that doesn't print any new lines and collects the output... the same but terminating the pod while printing the output...

A reproducer will help a lot.