couler-proj / couler

Unified Interface for Constructing and Managing Workflows on different workflow engines, such as Argo Workflows, Tekton Pipelines, and Apache Airflow.

Home Page: https://couler-proj.github.io/couler/index.html


Discussion - use only k8s jobs and cronjobs

maroshmka opened this issue

Summary

Hello guys, I didn't find another way to reach you, so I'm opening a discussion here.

First of all, this is a very interesting and ambitious project. I stumbled upon it while thinking about a similar idea and checking out Argo.

My idea is: do we need any other infra / servers / backends beyond a k8s cluster to schedule workflows?

With your design, the unification looks great, but the Argo server/controller would still need to run to execute the workflows. I'm not that familiar with Argo, so I don't know what it's like to manage at scale (I'm experienced with Airflow).

My proposal would be:

  • use only k8s Jobs
  • create a CronJob to trigger the first job in the workflow
  • write a small program, used as the entrypoint of every job's Docker image, that handles dependencies between jobs (triggering downstream jobs on success; see the sketch below)

The definition could be unified, or Airflow-style, or Argo-style, or anything else, but in the end the tasks would be translated to k8s Jobs + CronJobs.
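To make the idea concrete, here is a minimal, hypothetical sketch of that small trigger program, assuming the official `kubernetes` Python client; the `DAG` map, job names, and the `make_job` helper are all made up for illustration:

```python
# Hypothetical trigger program baked into every job image: watch k8s Jobs,
# and when an upstream Job succeeds, create its downstream Jobs.
# Assumes the official `kubernetes` Python client; DAG/job names are made up.
from kubernetes import client, config, watch

DAG = {"extract": ["transform"], "transform": ["load"], "load": []}

def make_job(name):
    # Single-container Job; image and command are placeholders.
    container = client.V1Container(
        name=name, image="busybox", command=["sh", "-c", f"echo run {name}"]
    )
    template = client.V1PodTemplateSpec(
        spec=client.V1PodSpec(containers=[container], restart_policy="Never")
    )
    return client.V1Job(
        metadata=client.V1ObjectMeta(name=name),
        spec=client.V1JobSpec(template=template),
    )

def main(namespace="default"):
    config.load_incluster_config()  # the program runs inside the cluster
    batch = client.BatchV1Api()
    triggered = set()  # avoid re-triggering children on repeated events
    for event in watch.Watch().stream(batch.list_namespaced_job, namespace=namespace):
        job = event["object"]
        if job.status.succeeded and job.metadata.name not in triggered:
            triggered.add(job.metadata.name)
            for child in DAG.get(job.metadata.name, []):
                batch.create_namespaced_job(namespace, make_job(child))

if __name__ == "__main__":
    main()
```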

What do you think the disadvantages of this approach are? Do we need a separate scheduler, or separate models (e.g. for workflows)? What would we lose if we just used k8s Jobs?

Happy to discuss this more and hear any feedback from you.

cheers!


Message from the maintainers:

Impacted by this bug? Give it a 👍. We prioritize the issues with the most 👍.

Thanks for reaching out. Using Couler with Argo Workflows (currently the only supported backend) requires experience with k8s, as it involves deployment and maintenance of the Argo controller. There's a ton of documentation to help you get there. Once that's in place, you should be able to easily submit workflows (either regular or cron) using Couler by following the examples.
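For reference, a minimal submission looks roughly like this, adapted from the examples in the Couler README (exact parameters may differ between versions):

```python
# Minimal Couler workflow: one container step, submitted to Argo Workflows.
# Adapted from the Couler README examples; assumes Argo is installed in the
# cluster and the `couler` package is available.
import couler.argo as couler
from couler.argo_submitter import ArgoSubmitter

couler.run_container(
    image="docker/whalesay:latest",
    command=["cowsay"],
    args=["hello world"],
)

submitter = ArgoSubmitter()  # submits to the cluster in the current context
couler.run(submitter=submitter)
```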

Thanks @terrytangyuan!

There's probably some misunderstanding :) My question is why one would use Argo or any other tool rather than just plain k8s Jobs.

Looking at, e.g., the Argo features (https://argoproj.github.io/argo-workflows/examples/) or the Airflow features, I'm not sure what the advantage is compared to a plain k8s Job.

Argo Workflows is k8s-native, which is extremely powerful when you want to fully leverage the underlying k8s cluster and ecosystem. Also, check out some of the blog posts here that might be useful for a quick comparison: https://github.com/argoproj/argo-workflows#community-blogs-and-presentations

Hey guys,

Sorry to reopen this closed discussion, but I hadn't had much time to work on it and show my idea more clearly. I've now squeezed in some time and done it.

Here's a simple, ugly, but almost-working example implementation of what I meant: https://github.com/maroshmka/kubedag

There are ~200 LOC right now and it seems to be working; the core things that still need to be solved are:

  • execution_date passing
  • checking for completion of existing kube Jobs (and only after that listening to events; see the sketch after this list)
  • do we need the Airflow DB?

After those, it should be able to run a lot of complex workflows with no infra except a kube cluster.
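On the second point, here is a hypothetical sketch of the usual k8s list-then-watch pattern, again assuming the official `kubernetes` Python client; the namespace and names are illustrative only:

```python
# Sketch of "check existing Jobs for completion first, then listen to events":
# the classic k8s list-then-watch pattern. Assumes the official `kubernetes`
# Python client; not taken from the kubedag code itself.
from kubernetes import client, config, watch

def watch_completions(namespace="default"):
    config.load_kube_config()
    batch = client.BatchV1Api()

    # 1) List: account for Jobs that already finished before we started.
    jobs = batch.list_namespaced_job(namespace)
    done = {j.metadata.name for j in jobs.items if j.status.succeeded}
    for name in done:
        yield name

    # 2) Watch: resume from the list's resourceVersion so nothing that
    #    happened between the list and the watch is missed.
    stream = watch.Watch().stream(
        batch.list_namespaced_job,
        namespace=namespace,
        resource_version=jobs.metadata.resource_version,
    )
    for event in stream:
        job = event["object"]
        if job.status.succeeded and job.metadata.name not in done:
            done.add(job.metadata.name)
            yield job.metadata.name
```

A consumer would then iterate over `watch_completions()` and trigger the downstream Jobs for each name it yields.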

The code is ugly and the architecture could be much more robust, but I wanted to sketch something to explore the idea further. Hope it makes a bit of sense.

Would love to hear any feedback on this from you.

Cheers
hmka