Quieter/less frequent wait progress logging for long-running, one-off jobs

Question

GUI opened this issue a year ago.

Describe the problem/challenge you have

I'm not sure if this is a common use case, but I'd like to use kapp to run one-off Kubernetes jobs triggered from our CI system. (I know there are perhaps simpler solutions, but kapp still does various nice things for this use case, and it aligns with the rest of our application deployments.) kapp works fine for this, but its "waiting" and "ongoing: reconcile" progress output is interleaved with the job's own logs. That makes the CI logs noisy and harder to digest when you just want to focus on a single job's output. Since everything is functional as-is, this is purely a "nice to have" to make the log output more readable in certain situations.

For example, given this job:

apiVersion: batch/v1
kind: Job
metadata:
  name: example-long-job
  annotations:
    kapp.k14s.io/nonce: ""
    kapp.k14s.io/update-strategy: fallback-on-replace
spec:
  backoffLimit: 0
  template:
    metadata:
      annotations:
        kapp.k14s.io/deploy-logs: for-new
    spec:
      restartPolicy: Never
      containers:
        - name: long-job
          image: alpine:latest
          command: ["/bin/sh", "-c" ]
          args: ["for n in $(seq 35); do date; sleep 2; done; echo 'done'"]

The kapp deploy output might look something like this:

10:37:13AM: ---- applying 1 changes [0/1 done] ----
logs | # waiting for 'example-long-job-xts68 > long-job' logs to become available...
10:37:15AM: update job/example-long-job (batch/v1) namespace: example-job
10:37:15AM: ---- waiting on 1 changes [0/1 done] ----
10:37:15AM: ongoing: reconcile job/example-long-job (batch/v1) namespace: example-job
10:37:15AM:  ^ Waiting to complete (1 active, 0 failed, 0 succeeded)
10:37:15AM:  L ongoing: waiting on pod/example-long-job-xts68 (v1) namespace: example-job
10:37:15AM:     ^ Pending: ContainerCreating
logs | # starting tailing 'example-long-job-xts68 > long-job' logs
logs | example-long-job-xts68 > long-job | Sat Jun 24 16:37:14 UTC 2023
logs | example-long-job-xts68 > long-job | Sat Jun 24 16:37:16 UTC 2023
10:37:18AM: ongoing: reconcile job/example-long-job (batch/v1) namespace: example-job
10:37:18AM:  ^ Waiting to complete (1 active, 0 failed, 0 succeeded)
10:37:18AM:  L ok: waiting on pod/example-long-job-xts68 (v1) namespace: example-job
logs | example-long-job-xts68 > long-job | Sat Jun 24 16:37:18 UTC 2023
logs | example-long-job-xts68 > long-job | Sat Jun 24 16:37:20 UTC 2023
logs | example-long-job-xts68 > long-job | Sat Jun 24 16:37:22 UTC 2023
logs | example-long-job-xts68 > long-job | Sat Jun 24 16:37:24 UTC 2023
logs | example-long-job-xts68 > long-job | Sat Jun 24 16:37:26 UTC 2023
logs | example-long-job-xts68 > long-job | Sat Jun 24 16:37:28 UTC 2023
logs | example-long-job-xts68 > long-job | Sat Jun 24 16:37:30 UTC 2023
logs | example-long-job-xts68 > long-job | Sat Jun 24 16:37:32 UTC 2023
logs | example-long-job-xts68 > long-job | Sat Jun 24 16:37:34 UTC 2023
logs | example-long-job-xts68 > long-job | Sat Jun 24 16:37:36 UTC 2023
logs | example-long-job-xts68 > long-job | Sat Jun 24 16:37:38 UTC 2023
logs | example-long-job-xts68 > long-job | Sat Jun 24 16:37:40 UTC 2023
logs | example-long-job-xts68 > long-job | Sat Jun 24 16:37:42 UTC 2023
logs | example-long-job-xts68 > long-job | Sat Jun 24 16:37:44 UTC 2023
logs | example-long-job-xts68 > long-job | Sat Jun 24 16:37:46 UTC 2023
logs | example-long-job-xts68 > long-job | Sat Jun 24 16:37:48 UTC 2023
logs | example-long-job-xts68 > long-job | Sat Jun 24 16:37:50 UTC 2023
logs | example-long-job-xts68 > long-job | Sat Jun 24 16:37:52 UTC 2023
logs | example-long-job-xts68 > long-job | Sat Jun 24 16:37:54 UTC 2023
logs | example-long-job-xts68 > long-job | Sat Jun 24 16:37:56 UTC 2023
logs | example-long-job-xts68 > long-job | Sat Jun 24 16:37:58 UTC 2023
logs | example-long-job-xts68 > long-job | Sat Jun 24 16:38:00 UTC 2023
logs | example-long-job-xts68 > long-job | Sat Jun 24 16:38:02 UTC 2023
logs | example-long-job-xts68 > long-job | Sat Jun 24 16:38:04 UTC 2023
logs | example-long-job-xts68 > long-job | Sat Jun 24 16:38:06 UTC 2023
logs | example-long-job-xts68 > long-job | Sat Jun 24 16:38:08 UTC 2023
logs | example-long-job-xts68 > long-job | Sat Jun 24 16:38:10 UTC 2023
logs | example-long-job-xts68 > long-job | Sat Jun 24 16:38:12 UTC 2023
logs | example-long-job-xts68 > long-job | Sat Jun 24 16:38:14 UTC 2023
10:38:15AM: ---- waiting on 1 changes [0/1 done] ----
logs | example-long-job-xts68 > long-job | Sat Jun 24 16:38:16 UTC 2023
10:38:18AM: ongoing: reconcile job/example-long-job (batch/v1) namespace: example-job
10:38:18AM:  ^ Waiting to complete (1 active, 0 failed, 0 succeeded)
10:38:18AM:  L ok: waiting on pod/example-long-job-xts68 (v1) namespace: example-job
logs | example-long-job-xts68 > long-job | Sat Jun 24 16:38:18 UTC 2023
logs | example-long-job-xts68 > long-job | Sat Jun 24 16:38:20 UTC 2023
logs | example-long-job-xts68 > long-job | Sat Jun 24 16:38:22 UTC 2023
logs | example-long-job-xts68 > long-job | done
logs | # container stopped 'example-long-job-xts68 > long-job' logs
logs | # waiting for 'example-long-job-xts68 > long-job' logs to become available...
10:38:31AM: ok: reconcile job/example-long-job (batch/v1) namespace: example-job
10:38:31AM:  ^ Completed
10:38:31AM: ---- applying complete [1/1 done] ----
10:38:31AM: ---- waiting complete [1/1 done] ----

This is all functional, and I know I could get the raw logs from kubectl, so none of this is a huge deal. But when I'm relying on the logs of the CI job that ran kapp for quick debugging, I find myself wishing the output were easier to read and that kapp's progress output weren't intermixed with the job's log output. For more typical deployments involving multiple resources I like kapp's progress output, so this feature request is perhaps specific to running solo, one-off jobs.

Describe the solution you'd like

It would be great if there were some way to either disable kapp's "waiting" and "ongoing: reconcile" output or control how often it is printed, so the job's output could be uninterrupted and easier to view and debug. If there were a flag for this, the output might look something like:

10:37:13AM: ---- applying 1 changes [0/1 done] ----
logs | # waiting for 'example-long-job-xts68 > long-job' logs to become available...
10:37:15AM: update job/example-long-job (batch/v1) namespace: example-job
10:37:15AM: ---- waiting on 1 changes [0/1 done] ----
10:37:15AM: ongoing: reconcile job/example-long-job (batch/v1) namespace: example-job
10:37:15AM:  ^ Waiting to complete (1 active, 0 failed, 0 succeeded)
10:37:15AM:  L ongoing: waiting on pod/example-long-job-xts68 (v1) namespace: example-job
10:37:15AM:     ^ Pending: ContainerCreating
logs | # starting tailing 'example-long-job-xts68 > long-job' logs
logs | example-long-job-xts68 > long-job | Sat Jun 24 16:37:14 UTC 2023
logs | example-long-job-xts68 > long-job | Sat Jun 24 16:37:16 UTC 2023
logs | example-long-job-xts68 > long-job | Sat Jun 24 16:37:18 UTC 2023
logs | example-long-job-xts68 > long-job | Sat Jun 24 16:37:20 UTC 2023
logs | example-long-job-xts68 > long-job | Sat Jun 24 16:37:22 UTC 2023
logs | example-long-job-xts68 > long-job | Sat Jun 24 16:37:24 UTC 2023
logs | example-long-job-xts68 > long-job | Sat Jun 24 16:37:26 UTC 2023
logs | example-long-job-xts68 > long-job | Sat Jun 24 16:37:28 UTC 2023
logs | example-long-job-xts68 > long-job | Sat Jun 24 16:37:30 UTC 2023
logs | example-long-job-xts68 > long-job | Sat Jun 24 16:37:32 UTC 2023
logs | example-long-job-xts68 > long-job | Sat Jun 24 16:37:34 UTC 2023
logs | example-long-job-xts68 > long-job | Sat Jun 24 16:37:36 UTC 2023
logs | example-long-job-xts68 > long-job | Sat Jun 24 16:37:38 UTC 2023
logs | example-long-job-xts68 > long-job | Sat Jun 24 16:37:40 UTC 2023
logs | example-long-job-xts68 > long-job | Sat Jun 24 16:37:42 UTC 2023
logs | example-long-job-xts68 > long-job | Sat Jun 24 16:37:44 UTC 2023
logs | example-long-job-xts68 > long-job | Sat Jun 24 16:37:46 UTC 2023
logs | example-long-job-xts68 > long-job | Sat Jun 24 16:37:48 UTC 2023
logs | example-long-job-xts68 > long-job | Sat Jun 24 16:37:50 UTC 2023
logs | example-long-job-xts68 > long-job | Sat Jun 24 16:37:52 UTC 2023
logs | example-long-job-xts68 > long-job | Sat Jun 24 16:37:54 UTC 2023
logs | example-long-job-xts68 > long-job | Sat Jun 24 16:37:56 UTC 2023
logs | example-long-job-xts68 > long-job | Sat Jun 24 16:37:58 UTC 2023
logs | example-long-job-xts68 > long-job | Sat Jun 24 16:38:00 UTC 2023
logs | example-long-job-xts68 > long-job | Sat Jun 24 16:38:02 UTC 2023
logs | example-long-job-xts68 > long-job | Sat Jun 24 16:38:04 UTC 2023
logs | example-long-job-xts68 > long-job | Sat Jun 24 16:38:06 UTC 2023
logs | example-long-job-xts68 > long-job | Sat Jun 24 16:38:08 UTC 2023
logs | example-long-job-xts68 > long-job | Sat Jun 24 16:38:10 UTC 2023
logs | example-long-job-xts68 > long-job | Sat Jun 24 16:38:12 UTC 2023
logs | example-long-job-xts68 > long-job | Sat Jun 24 16:38:14 UTC 2023
logs | example-long-job-xts68 > long-job | Sat Jun 24 16:38:16 UTC 2023
logs | example-long-job-xts68 > long-job | Sat Jun 24 16:38:18 UTC 2023
logs | example-long-job-xts68 > long-job | Sat Jun 24 16:38:20 UTC 2023
logs | example-long-job-xts68 > long-job | Sat Jun 24 16:38:22 UTC 2023
logs | example-long-job-xts68 > long-job | done
logs | # container stopped 'example-long-job-xts68 > long-job' logs
logs | # waiting for 'example-long-job-xts68 > long-job' logs to become available...
10:38:31AM: ok: reconcile job/example-long-job (batch/v1) namespace: example-job
10:38:31AM:  ^ Completed
10:38:31AM: ---- applying complete [1/1 done] ----
10:38:31AM: ---- waiting complete [1/1 done] ----
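Until kapp offers something like this, a rough approximation is to post-filter kapp's output in the CI step. The sketch below is a hypothetical workaround, not a kapp feature. It assumes, based only on the output shown above, that kapp's progress lines begin with a 12-hour timestamp such as "10:37:18AM: " and that log tailing starts with a "logs | # starting tailing" line. It keeps everything up to that point, then drops the repeated "ongoing:", "Waiting to complete", "L ok: waiting on" and "---- waiting on" lines while keeping the job's logs and the final "ok: reconcile" and "complete" lines. The function name drop_kapp_progress is made up for illustration.

```shell
#!/bin/sh
# Hypothetical CI-side filter (not a kapp feature). Assumes kapp progress
# lines start with a 12-hour timestamp like "10:37:18AM: " and that log
# tailing begins with a "logs | # starting tailing ..." line.
drop_kapp_progress() {
  awk '
    # Once log tailing has started, remember it.
    /^logs [|] # starting tailing/ { tailing = 1 }

    # After that point, skip the repeated "still waiting" progress lines.
    tailing && /^[0-9]+:[0-9]+:[0-9]+(AM|PM): / &&
      /ongoing: |Waiting to complete|L ok: waiting on|---- waiting on / { next }

    # Print everything else unchanged (job logs, final status lines).
    { print }
  '
}
```

In a CI script this would sit at the end of a pipeline, e.g. kapp deploy ... | drop_kapp_progress. Note that a plain pipeline reports the exit status of its last command, so a failed deploy could be masked unless the shell uses something like bash's set -o pipefail.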

Or, taking it a step further, when I know I'm running just one job, it would be even nicer to also have an option to disable the log prefix that shows the pod and container names, so I can really focus on the log output itself:

10:37:13AM: ---- applying 1 changes [0/1 done] ----
logs | # waiting for 'example-long-job-xts68 > long-job' logs to become available...
10:37:15AM: update job/example-long-job (batch/v1) namespace: example-job
10:37:15AM: ---- waiting on 1 changes [0/1 done] ----
10:37:15AM: ongoing: reconcile job/example-long-job (batch/v1) namespace: example-job
10:37:15AM:  ^ Waiting to complete (1 active, 0 failed, 0 succeeded)
10:37:15AM:  L ongoing: waiting on pod/example-long-job-xts68 (v1) namespace: example-job
10:37:15AM:     ^ Pending: ContainerCreating
logs | # starting tailing 'example-long-job-xts68 > long-job' logs
Sat Jun 24 16:37:14 UTC 2023
Sat Jun 24 16:37:16 UTC 2023
Sat Jun 24 16:37:18 UTC 2023
Sat Jun 24 16:37:20 UTC 2023
Sat Jun 24 16:37:22 UTC 2023
Sat Jun 24 16:37:24 UTC 2023
Sat Jun 24 16:37:26 UTC 2023
Sat Jun 24 16:37:28 UTC 2023
Sat Jun 24 16:37:30 UTC 2023
Sat Jun 24 16:37:32 UTC 2023
Sat Jun 24 16:37:34 UTC 2023
Sat Jun 24 16:37:36 UTC 2023
Sat Jun 24 16:37:38 UTC 2023
Sat Jun 24 16:37:40 UTC 2023
Sat Jun 24 16:37:42 UTC 2023
Sat Jun 24 16:37:44 UTC 2023
Sat Jun 24 16:37:46 UTC 2023
Sat Jun 24 16:37:48 UTC 2023
Sat Jun 24 16:37:50 UTC 2023
Sat Jun 24 16:37:52 UTC 2023
Sat Jun 24 16:37:54 UTC 2023
Sat Jun 24 16:37:56 UTC 2023
Sat Jun 24 16:37:58 UTC 2023
Sat Jun 24 16:38:00 UTC 2023
Sat Jun 24 16:38:02 UTC 2023
Sat Jun 24 16:38:04 UTC 2023
Sat Jun 24 16:38:06 UTC 2023
Sat Jun 24 16:38:08 UTC 2023
Sat Jun 24 16:38:10 UTC 2023
Sat Jun 24 16:38:12 UTC 2023
Sat Jun 24 16:38:14 UTC 2023
Sat Jun 24 16:38:16 UTC 2023
Sat Jun 24 16:38:18 UTC 2023
Sat Jun 24 16:38:20 UTC 2023
Sat Jun 24 16:38:22 UTC 2023
done
logs | # container stopped 'example-long-job-xts68 > long-job' logs
logs | # waiting for 'example-long-job-xts68 > long-job' logs to become available...
10:38:31AM: ok: reconcile job/example-long-job (batch/v1) namespace: example-job
10:38:31AM:  ^ Completed
10:38:31AM: ---- applying complete [1/1 done] ----
10:38:31AM: ---- waiting complete [1/1 done] ----
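A similar post-filter can approximate the unprefixed output. The sed expression below is a hypothetical sketch, not a kapp option. It assumes tailed container output always has the form "logs | <pod> > <container> | <line>" seen above, strips that prefix, and leaves kapp's own "logs | # ..." status lines and timestamped progress lines unchanged, matching the desired output. The name strip_kapp_log_prefix is illustrative only.

```shell
# Hypothetical CI-side filter (not a kapp feature). Assumes tailed container
# output looks like "logs | <pod> > <container> | <line>", as in the example.
strip_kapp_log_prefix() {
  # [|] matches a literal pipe portably; [^|]* cannot cross a pipe, so the
  # match stops at the first " | " after the container name and any "|"
  # characters in the job's own output are left alone.
  sed -E 's/^logs [|] [^|]* > [^|]* [|] //'
}
```

Because this rewrites text after the fact, a job whose own output happened to start with something shaped like "logs | a > b | " would also be altered; a built-in option would not have that ambiguity.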

Anything else you would like to add:

I can currently pass --wait-check-interval with a long interval to eliminate the intermixed "waiting" and "ongoing: reconcile" lines. The problem with that approach is that kapp then doesn't notice the job has finished until long after it has actually completed. I'm also slightly confused about how --wait-check-interval affects logging: the first example above uses the default --wait-check-interval=3s, yet there seems to be a roughly 60-second gap between some of the later "waiting" and "reconcile" messages. I can't figure out where this 60-second interval comes from, so if there is already a way to control it (without changing the check interval, so kapp still detects completed jobs promptly) that I'm missing, please let me know.

Another way to think about this, if a separate flag to control (or disable) the progress log interval seems too specific: would it make sense to have a generic way to control the logging "level"? For example, if many of these progress updates were considered "info" or "notice" level, I could opt to log only at "warning" level and above to eliminate some of this output. I don't know how kapp emits this output internally, so I'm not sure whether that's easier or harder, but it might be a slightly more generic way to approach it.

Thanks!

Vote on this request

This is an invitation to the community to vote on issues, to help us prioritize our backlog. Use the "smiley face" up to the right of this comment to vote.

👍 "I would like to see this addressed as soon as possible"
👎 "There are other more important things to focus on right now"

We are also happy to receive and review pull requests if you want to help work on this issue.

Praveen Rewar · Answer 1 · Tue Jun 27 2023 17:56:26 GMT+0800 (China Standard Time)

Hi @GUI! Thank you so much for creating such a detailed issue ❤️
I think the --tty flag would be useful in your case. You can set it to false, and then in your CI/CD pipelines (non-terminal output) you should not see the applying/waiting updates. The output would look something like this:

$ kapp deploy -a pod-log -f job.yaml --tty=false -y | cat
default	example-long-job	Job	-	create	-	reconcile	-	-

logs | example-long-job-45jgx > long-job | Tue Jun 27 09:48:46 UTC 2023
logs | example-long-job-45jgx > long-job | Tue Jun 27 09:48:48 UTC 2023
logs | example-long-job-45jgx > long-job | Tue Jun 27 09:48:50 UTC 2023
logs | example-long-job-45jgx > long-job | Tue Jun 27 09:48:52 UTC 2023
logs | example-long-job-45jgx > long-job | Tue Jun 27 09:48:54 UTC 2023
logs | example-long-job-45jgx > long-job | Tue Jun 27 09:48:56 UTC 2023
logs | example-long-job-45jgx > long-job | Tue Jun 27 09:48:58 UTC 2023
logs | example-long-job-45jgx > long-job | Tue Jun 27 09:49:00 UTC 2023
logs | example-long-job-45jgx > long-job | Tue Jun 27 09:49:02 UTC 2023
logs | example-long-job-45jgx > long-job | Tue Jun 27 09:49:04 UTC 2023
logs | example-long-job-45jgx > long-job | Tue Jun 27 09:49:06 UTC 2023
logs | example-long-job-45jgx > long-job | Tue Jun 27 09:49:08 UTC 2023
logs | example-long-job-45jgx > long-job | Tue Jun 27 09:49:10 UTC 2023
logs | example-long-job-45jgx > long-job | Tue Jun 27 09:49:12 UTC 2023
logs | example-long-job-45jgx > long-job | Tue Jun 27 09:49:14 UTC 2023
logs | example-long-job-45jgx > long-job | Tue Jun 27 09:49:16 UTC 2023
logs | example-long-job-45jgx > long-job | Tue Jun 27 09:49:18 UTC 2023
logs | example-long-job-45jgx > long-job | Tue Jun 27 09:49:20 UTC 2023
logs | example-long-job-45jgx > long-job | Tue Jun 27 09:49:22 UTC 2023
logs | example-long-job-45jgx > long-job | Tue Jun 27 09:49:24 UTC 2023
logs | example-long-job-45jgx > long-job | Tue Jun 27 09:49:26 UTC 2023
logs | example-long-job-45jgx > long-job | Tue Jun 27 09:49:28 UTC 2023
logs | example-long-job-45jgx > long-job | Tue Jun 27 09:49:30 UTC 2023
logs | example-long-job-45jgx > long-job | Tue Jun 27 09:49:32 UTC 2023
logs | example-long-job-45jgx > long-job | Tue Jun 27 09:49:34 UTC 2023
logs | example-long-job-45jgx > long-job | Tue Jun 27 09:49:36 UTC 2023
logs | example-long-job-45jgx > long-job | Tue Jun 27 09:49:38 UTC 2023
logs | example-long-job-45jgx > long-job | Tue Jun 27 09:49:40 UTC 2023
logs | example-long-job-45jgx > long-job | Tue Jun 27 09:49:42 UTC 2023
logs | example-long-job-45jgx > long-job | Tue Jun 27 09:49:44 UTC 2023
logs | example-long-job-45jgx > long-job | Tue Jun 27 09:49:46 UTC 2023
logs | example-long-job-45jgx > long-job | Tue Jun 27 09:49:48 UTC 2023
logs | example-long-job-45jgx > long-job | Tue Jun 27 09:49:50 UTC 2023
logs | example-long-job-45jgx > long-job | Tue Jun 27 09:49:52 UTC 2023
logs | example-long-job-45jgx > long-job | Tue Jun 27 09:49:54 UTC 2023
logs | example-long-job-45jgx > long-job | done

(The applying/waiting updates are a core feature of kapp and we don't want to treat them as "logs" 😃 There is a separate debug flag that can be used to get a more verbose output useful for debugging.)

Nick Muerdter · Answer 2 · Tue Jun 27 2023 22:30:15 GMT+0800 (China Standard Time)

@praveenrewar: Ah, sorry, I missed that flag and that it could already achieve what I was looking for. I think I'm good then. Thank you!