argoproj / argo-workflows

Workflow Engine for Kubernetes

Home Page: https://argo-workflows.readthedocs.io/

Complex DAG of container set nodes always failed

GlobeFishNG opened this issue

Pre-requisites

  • I have double-checked my configuration
  • I have tested with the :latest image tag (i.e. quay.io/argoproj/workflow-controller:latest) and can confirm the issue still exists on :latest. If not, I have explained why, in detail, in my description below.
  • I have searched existing issues and could not find a match for this bug
  • I'd like to contribute the fix myself (see contributing guide)

What happened/what did you expect to happen?

A complex workflow has the following structure.

  1. The outermost layer of the workflow is a DAG.
  2. The DAG dependencies are complicated: 40+ nodes, several of which have more than 5 dependencies.
  3. Each DAG node is a container set.
  4. Each container set has a simple internal DAG (prepare-inputs -> main -> prepare-outputs).
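To make the "more than 5 dependencies" point concrete, here is a minimal sketch that extracts the upstream task names from a `depends` expression and counts them. The helper name `upstream_tasks` and the regular expression are my own illustration, not part of Argo; the sample expression is the one used by task Z in the workflow pasted further down.

```python
import re

# A task name (letters, digits, hyphens, e.g. "SSR-A" or "N1") followed by an
# optional status suffix such as ".Succeeded" or ".Omitted".
_TASK_REF = re.compile(r"([A-Za-z0-9][A-Za-z0-9-]*)(?:\.[A-Za-z]+)?")


def upstream_tasks(depends: str) -> set:
    """Return the set of task names referenced by a `depends` expression."""
    return {m.group(1) for m in _TASK_REF.finditer(depends)}


# `depends` expression of task Z in the reproduction workflow.
z_depends = ("Y1.Succeeded && Q.Succeeded && R.Succeeded && "
             "V.Succeeded && U.Succeeded && S.Succeeded")

print(sorted(upstream_tasks(z_depends)))  # ['Q', 'R', 'S', 'U', 'V', 'Y1']
print(len(upstream_tasks(z_depends)))     # 6
```

This only handles the simple `&&` expressions used in this report; it is not a full parser for the `depends` grammar.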

The workflow below failed every time. When I changed the template used by the outer DAG nodes from a container set to a DAG or step groups, it succeeded.

Expectation: Container sets work as well as DAG/step-group templates when used as nodes nested in a complex workflow.
What happened:

  • The workflow failed with the error below.
    &{0x3a297a0 map[namespace:pipeline workflow:pipeline-test-point-simplefied-g5gbq] 2024-04-30 03:46:14.425189512 +0000 UTC m=+158960.631405283 panic <nil> was unable to obtain node for pipeline-test-point-simplefied-g5gbq-2664608158 <nil> <nil> }
  • The DAG shown in the GUI was wrong, with an invalid dependency tree. For example, I and J have the same dependencies, but they were connected as if J depended on I, as below.
    [image: DAG as shown in the Argo GUI]
    • As far as I have tested, the DAGs shown in the Argo GUI are wrong even for some very simple container sets. I suspect the crash is somehow connected to this incorrect DAG behavior.
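To find out which node the ID in the error message refers to, one option is to recompute node IDs from candidate node names. The sketch below assumes, to my understanding of the Argo source, that a node ID is the workflow name followed by the 32-bit FNV-1a hash of the full node name (with the root node keeping the bare workflow name). The candidate naming patterns `<workflow>.<task>` and `<workflow>.<task>.<container>` are also assumptions, and the task list is only a sample; the function names are hypothetical.

```python
def fnv1a_32(data: bytes) -> int:
    """32-bit FNV-1a hash (offset basis 0x811C9DC5, prime 0x01000193)."""
    h = 0x811C9DC5
    for byte in data:
        h ^= byte
        h = (h * 0x01000193) & 0xFFFFFFFF
    return h


def node_id(workflow_name: str, node_name: str) -> str:
    """Assumed ID scheme: root node keeps the workflow name, others get a hash suffix."""
    if node_name == workflow_name:
        return workflow_name
    return f"{workflow_name}-{fnv1a_32(node_name.encode())}"


def find_node_name(workflow_name, target_id, candidate_names):
    """Return the candidate node names whose derived ID equals target_id."""
    return [n for n in candidate_names if node_id(workflow_name, n) == target_id]


wf = "pipeline-test-point-simplefied-g5gbq"
target = "pipeline-test-point-simplefied-g5gbq-2664608158"

# Hypothetical candidates; extend with the remaining task names as needed.
tasks = ["A", "B", "H", "I", "J"]
containers = ["prepare-inputs", "main", "prepare-outputs"]
candidates = [f"{wf}.{t}" for t in tasks]
candidates += [f"{wf}.{t}.{c}" for t in tasks for c in containers]

# Prints the matching names, or an empty list if none of the guesses match.
print(find_node_name(wf, target, candidates))
```

An empty result only means none of the guessed names match; it does not by itself say anything about the controller.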

Version

latest (3.5.6)

Paste a small workflow that reproduces the issue. We must be able to run the workflow; don't enter a workflow that uses private images.

apiVersion: argoproj.io/v1alpha1
kind: WorkflowTemplate
metadata:
  name: pipeline-test-point-simplefied
spec:
  entrypoint: pipeline
  activeDeadlineSeconds: 172800
  arguments:
    parameters:
    - name: runNums
      value: '["006"]'
  templates:
  - name: whalesay # name of the template
    container:
      image: docker/whalesay
      command: [cowsay]
      args: ["hello world"]
      resources: # limit the resources
        limits:
          memory: 32Mi
          cpu: 100m
  - name: whalesay-step-groups
    steps:
    - - name: prepare-inputs
        template: whalesay
    - - name: main
        template: whalesay
    - - name: prepare-outputs
        template: whalesay
  - name: whalesay-dag
    dag:
      tasks:
      - name: prepare-inputs
        template: whalesay
      - name: main
        depends: prepare-inputs
        template: whalesay
      - name: prepare-outputs
        depends: main
        template: whalesay
  - name: whalesay-container-set # name of the template
    containerSet:
      containers:
      - name: prepare-inputs
        image: docker/whalesay
        command: [cowsay]
        args: ["hello world"]
        resources: # limit the resources
          limits:
            memory: 32Mi
            cpu: 100m
      - name: main
        dependencies:
        - prepare-inputs
        image: docker/whalesay
        command: [cowsay]
        args: ["hello world"]
        resources: # limit the resources
          limits:
            memory: 32Mi
            cpu: 100m
      - name: prepare-outputs
        dependencies:
        - main
        image: docker/whalesay
        command: [cowsay]
        args: ["hello world"]
        resources: # limit the resources
          limits:
            memory: 32Mi
            cpu: 100m
  - name: pipeline
    dag:
      tasks:
      - name: A
        template: whalesay-container-set
      - name: B
        depends: A.Succeeded
        template: whalesay-container-set
      - name: C
        depends: B.Succeeded
        when: 'false'
        template: whalesay-container-set
      - name: D
        depends: C.Succeeded
        template: whalesay-container-set
      - name: E
        depends: D.Succeeded
        template: whalesay-container-set
      - name: F
        depends: D.Succeeded
        template: whalesay-container-set
      - name: G
        depends: E.Succeeded && F.Succeeded
        template: whalesay-container-set
      - name: H
        depends: A.Succeeded && G.Omitted
        template: whalesay-container-set
      - name: I
        depends: H.Succeeded
        template: whalesay-container-set
      - name: J
        depends: H.Succeeded
        template: whalesay-container-set
      - name: K
        depends: A.Succeeded && G.Omitted
        template: whalesay-container-set
      - name: L
        depends: B.Succeeded
        template: whalesay-container-set
      - name: M
        depends: B.Succeeded
        template: whalesay-container-set
      - name: N1
        depends: M.Succeeded && G.Omitted
        template: whalesay-container-set
      - name: O
        depends: N1.Succeeded
        template: whalesay-container-set
      - name: P
        depends: O.Succeeded
        template: whalesay-container-set
      - name: Q
        depends: O.Succeeded
        template: whalesay-container-set
        withParam: '{{workflow.parameters.runNums}}'
      - name: R
        depends: O.Succeeded
        template: whalesay-container-set
      - name: T
        depends: R.Succeeded
        template: whalesay-container-set
      - name: S
        depends: O.Succeeded
        template: whalesay-container-set
      - name: U
        depends: O.Succeeded
        template: whalesay-container-set
      - name: V
        depends: O.Succeeded
        template: whalesay-container-set
      - name: W
        depends: M.Succeeded && G.Omitted
        template: whalesay-container-set
      - name: X
        depends: Q.Succeeded
        template: whalesay-container-set
      - name: Y1
        depends: A.Succeeded && G.Omitted
        template: whalesay-container-set
      - name: Z
        depends: Y1.Succeeded && Q.Succeeded && R.Succeeded && V.Succeeded && U.Succeeded && S.Succeeded
        template: whalesay-container-set
      ### SSR
      - name: SSR-A
        depends: A.Succeeded
        template: whalesay-container-set
      - name: SSR-B
        depends: I.Succeeded && J.Succeeded
        template: whalesay-container-set

      - name: SSR-C
        depends: O.Succeeded
        template: whalesay-container-set
      - name: SSR-D
        depends: P.Succeeded
        template: whalesay-container-set
      - name: SSR-E
        depends: P.Succeeded
        template: whalesay-container-set
      - name: SSR-F
        depends: P.Succeeded
        template: whalesay-container-set
      - name: SSR-G
        depends: I.Succeeded && R.Succeeded
        template: whalesay-container-set
      - name: SSR-H
        depends: I.Succeeded && V.Succeeded
        template: whalesay-container-set
      - name: SSR-I
        depends: I.Succeeded && S.Succeeded
        template: whalesay-container-set
      - name: SSR-J
        depends: I.Succeeded && R.Succeeded
        template: whalesay-container-set
      - name: SSR-K
        depends: I.Succeeded && V.Succeeded
        template: whalesay-container-set
      - name: SSR-L
        depends: I.Succeeded && S.Succeeded
        template: whalesay-container-set
      - name: SSR-M
        depends: W.Succeeded
        template: whalesay-container-set
      - name: SSR-N
        depends: W.Succeeded
        template: whalesay-container-set
      - name: SSR-O
        depends: I.Succeeded && Z.Succeeded
        template: whalesay-container-set
      - name: SSR-P
        depends: I.Succeeded && Z.Succeeded
        template: whalesay-container-set
      - name: SSR-Q
        depends: I.Succeeded && Z.Succeeded
        template: whalesay-container-set
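As a check on the spec itself, the following sketch groups tasks by their set of upstream tasks, using a small subset of the `pipeline` DAG above. It is an illustration only: the names `parents` and `siblings_by_parents` are hypothetical, and the regular expression covers just the `Task.Status` form used in this workflow.

```python
import re
from collections import defaultdict

# Subset of the `pipeline` DAG above: task name -> `depends` expression.
DEPENDS = {
    "H": "A.Succeeded && G.Omitted",
    "I": "H.Succeeded",
    "J": "H.Succeeded",
    "K": "A.Succeeded && G.Omitted",
    "SSR-B": "I.Succeeded && J.Succeeded",
}


def parents(expr):
    """Upstream task names referenced as `Task.Status` in a `depends` expression."""
    return frozenset(re.findall(r"([A-Za-z0-9][A-Za-z0-9-]*)\.[A-Za-z]+", expr))


def siblings_by_parents(depends):
    """Group task names by their (identical) set of upstream tasks."""
    groups = defaultdict(list)
    for task, expr in depends.items():
        groups[parents(expr)].append(task)
    return {p: sorted(t) for p, t in groups.items()}


groups = siblings_by_parents(DEPENDS)
for upstream, tasks in groups.items():
    print(sorted(upstream), "->", tasks)
```

In this subset, I and J land in the same group with H as their only upstream task, so according to the spec neither should be drawn as depending on the other.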

Logs from the workflow controller

kubectl logs -n argo deploy/workflow-controller | grep ${workflow}

Attached: controller.log

Logs from your workflow's wait container

kubectl logs -n argo -c wait -l workflows.argoproj.io/workflow=${workflow},workflow.argoproj.io/phase!=Succeeded

Attached: error.log