mumoshu / kube-airflow

A docker image and kubernetes config files to run Airflow on Kubernetes

Scheduler defaults to 5 runs, so it goes into CrashLoopBackOff when deployed

jdavidheiser opened this issue

args: ["scheduler", "-n", "5"]

This causes the file read loop to happen five times, then the scheduler exits. It seems like a strange default setup.

I'm a bit confused why this is set this way - shouldn't the scheduler be looping indefinitely? I'm also seeing the scheduler failing to queue up tasks, same as #19, and I wonder if this is the cause in that case, or something else.

Airflow is weird. The whole purpose of this setting is to let the scheduler kill itself periodically so that it reloads DAGs. In Kubernetes this does not have a huge impact, since the pod is restarted automatically. The whole kill/restart cycle can take a while, but Airflow does not do sub-second precision anyway.

-1 means you can never update your DAG; 1 means the scheduler kills itself at every task launch.

I feel like it would have less impact in Docker, but with Kube managing the pods it ends up putting the cluster in a not-happy state with backoffs because the exiting script looks like a crash. Thanks for the heads up on the motivation to exit after a few task runs - I'm going to modify the start shell script in my version of the Docker container. I think it makes sense to run the scheduler in a while loop but break if it returns a bad error code, so Kube can still manage those incidents as real crashes.
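The while-loop idea described above could look roughly like the sketch below. It is not the repository's actual start script: the function name `run_until_failure` is hypothetical, and the sketch assumes the scheduler exits with status 0 after finishing its `-n` runs and with a non-zero status on a real failure.

```shell
#!/bin/sh
# Hypothetical wrapper (not part of kube-airflow): re-run a command for as
# long as it exits cleanly, and stop on the first non-zero exit status so
# that Kubernetes still sees a genuine crash.
run_until_failure() {
    while true; do
        # Run the command given as arguments; on failure, return its
        # exit status to the caller instead of looping again.
        "$@" || return $?
    done
}

# Example usage in a container start script (assumes the airflow CLI is on
# PATH and that "scheduler -n 5" exits 0 after five runs):
#   run_until_failure airflow scheduler -n 5
#   exit $?
```

With a wrapper like this, a clean exit after five runs restarts the scheduler inside the same container and does not increment the pod's restart count, while a failing exit code still ends the container and is handled by Kubernetes as usual.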

Feel free to submit a pull request. I do have my scheduler restarting regularly, and I don't see problems except that it takes a few minutes to power on (so delaying the next DAG start).

The issue that I had with Kubernetes is that it tracks the number of restarts, so if you run this application indefinitely you could see large restart counts over a long period of time, which would be a red flag to an administrator who runs "kubectl get pods" on the cluster, unless I am understanding it wrong.

As a solution, maybe this pod could be run as a Kubernetes CronJob or Job.
The YAML change would look similar to the one below, but I have not fully debugged it yet.

Would this break the way the scheduler works?

apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: scheduler
  labels:
    app: airflow
    tier: scheduler
spec:
  schedule: "*/2 * * * *"  # every 2 minutes
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: Never
          containers:
          - name: scheduler
            image: <image-location>
            # volumes:
            #     - /localpath/to/dags:/usr/local/airflow/dags
            env:
            - name: AIRFLOW_HOME
              value: "/usr/local/airflow"
            args: ["scheduler", "-n", "5"]

@gsemet How/where did you change the config for the scheduler to restart automatically? I'm not seeing it in airflow.cfg.

@gsemet When the scheduler's -n arg is not -1, it will restart and then go into CrashLoopBackOff later. You can see it in the helm chart.