rspier / cronjob-label-controller

cronjob-label-controller

This repository implements a simple controller for adding a custom label to all cronjobs.

It is based on sample-controller, but has been stripped down to the bare minimum. It ensures that every CronJob, and the Jobs and Pods that CronJob creates, carries a label (named cronjob by default) whose value is the name of the source CronJob.

The original sample-controller code used a workqueue for rate limiting and to prevent race conditions. That code has been removed. CronJob objects are updated only rarely, which should greatly reduce the risk of such issues.
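The labeling rule described above can be sketched in a few lines of Go. This is a minimal illustration, not the repository's actual code: the cronJob struct is a hypothetical stand-in for the real Kubernetes type, and the idea that Jobs and Pods pick up the label through the CronJob's job and pod templates is an assumption. The sketch reports whether anything changed, so a caller could skip the update when the label is already in place, which matters more once there is no workqueue to absorb repeated events.

```go
package main

import "fmt"

// labelKey is the default label name mentioned in this README.
const labelKey = "cronjob"

// cronJob is a hypothetical, simplified stand-in for a Kubernetes CronJob.
// Only the label maps relevant to this sketch are modeled.
type cronJob struct {
	Name              string
	Labels            map[string]string // labels on the CronJob itself
	JobTemplateLabels map[string]string // assumed to be copied onto created Jobs
	PodTemplateLabels map[string]string // assumed to be copied onto created Pods
}

// ensureLabel returns a label map in which key is set to value, and reports
// whether a change was needed. The input map is never mutated.
func ensureLabel(labels map[string]string, key, value string) (map[string]string, bool) {
	if v, ok := labels[key]; ok && v == value {
		return labels, false
	}
	out := make(map[string]string, len(labels)+1)
	for k, v := range labels {
		out[k] = v
	}
	out[key] = value
	return out, true
}

// labelCronJob sets the cronjob label on the CronJob and on both templates.
// It returns the updated object and whether an update would be required.
func labelCronJob(cj cronJob) (cronJob, bool) {
	var c1, c2, c3 bool
	cj.Labels, c1 = ensureLabel(cj.Labels, labelKey, cj.Name)
	cj.JobTemplateLabels, c2 = ensureLabel(cj.JobTemplateLabels, labelKey, cj.Name)
	cj.PodTemplateLabels, c3 = ensureLabel(cj.PodTemplateLabels, labelKey, cj.Name)
	return cj, c1 || c2 || c3
}

func main() {
	cj := cronJob{Name: "nightly-backup", Labels: map[string]string{"team": "ops"}}

	// First pass adds the label everywhere.
	cj, changed := labelCronJob(cj)
	fmt.Println(changed, cj.Labels, cj.JobTemplateLabels, cj.PodTemplateLabels)

	// Second pass finds nothing to do, so no update would be sent.
	_, changed = labelCronJob(cj)
	fmt.Println(changed)
}
```

Because the second pass reports no change, a controller built this way would not re-trigger itself with its own updates.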

Running

Local Development

# assumes you have a working kubeconfig, not required if operating in-cluster
go build -o cronjob-label-controller .
./cronjob-label-controller -kubeconfig=$HOME/.kube/config

In a cluster

See the sample kubectl configs.

Why

The default metadata defined on CronJobs, the Jobs they create, and the Pods they create don't have enough information to connect them together consistently.

This blog post proposes adding a cronjob label to make it easier to associate them. This controller implements that.

Example Rules

rules:
- record: cronjob:succeeded_at:sum
  expr: |
    label_replace(
        # most recent success time, as long as the job is still kept around in k8s
        MAX BY(exported_namespace, label_cronjob)(
            kube_job_status_completion_time
                * ON(exported_namespace, job_name) GROUP_RIGHT()
            kube_job_labels{label_cronjob!=""}
        )
        OR ON(exported_namespace, label_cronjob)(
        # add back the failures with value 0, and a little bit of weirdness to limit labels
            (cronjob:kube_job_status:sum == 0)
                + on (exported_namespace, label_cronjob)
            (cronjob:kube_job_status:sum == 0)
        ),
    "cronjob", "$1", "label_cronjob", "(.+)")
- record: cronjob:kube_job_status:sum
  # the last run succeeded (>0) or failed (=0)
  expr: |
    job_cronjob:kube_job_status_succeeded:sum
      OR ON (exported_namespace, label_cronjob)
    job_cronjob:kube_job_status_failed:sum * 0
- record: job_cronjob:kube_job_status_failed:sum
  # jobs where the most recent run failed
  expr: |
    clamp_max(job_cronjob:kube_job_status_start_time:max, 1)
      * ON(exported_namespace, job_name) GROUP_LEFT()
    kube_job_status_failed > 0
- record: job_cronjob:kube_job_status_succeeded:sum
  # jobs where the most recent run succeeded
  expr: |
    clamp_max(job_cronjob:kube_job_status_start_time:max, 1)
      * ON(exported_namespace, job_name) GROUP_LEFT()
    kube_job_status_succeeded > 0
- record: job_cronjob:kube_job_status_start_time:max
  # find the most recently started run (job) for a cronjob.
  expr: |
    label_replace(
        max(
          kube_job_status_start_time
            * ON(exported_namespace, job_name) GROUP_RIGHT()
          kube_job_labels{label_cronjob!=""}
        ) BY (exported_namespace, label_cronjob, job_name)
        == ON(exported_namespace, label_cronjob) GROUP_LEFT()
        max(
          kube_job_status_start_time
            * ON(exported_namespace, job_name) GROUP_RIGHT()
          kube_job_labels{label_cronjob!=""}
        ) BY (exported_namespace, label_cronjob),
    "cronjob", "$1", "label_cronjob", "(.+)")
- alert: KCronJobStale
  expr: |
    # seconds since the most recent success, derived from cronjob:succeeded_at:sum
    time() - cronjob:succeeded_at:sum{cronjob="somejob",exported_namespace="ns"} > 86400
  for: 10m
  labels:
    severity: email

Future developments

Consider using the controller-runtime project.

About

License: Apache License 2.0
