Complex DAG of container set nodes always failed
GlobeFishNG opened this issue · comments
Yongcong Ma commented
Pre-requisites
- I have double-checked my configuration
- I have tested with the :latest image tag (i.e. quay.io/argoproj/workflow-controller:latest) and can confirm the issue still exists on :latest. If not, I have explained why, in detail, in my description below.
- I have searched existing issues and could not find a match for this bug
- I'd like to contribute the fix myself (see contributing guide)
What happened/what did you expect to happen?
A complex workflow has the following structure.
- The outermost layer of the workflow is a dag.
- The dag dependencies are quite complicated. (40+ nodes and several nodes have more than 5 dependencies)
- Each dag node is a container set.
- Each container set has a simple DAG. (prepare-inputs -> main -> prepare-outputs)
The workflow below failed every time. When I changed the inner template from a container set to a DAG or step groups, it succeeded.
Expectation: Container sets work as well as DAGs/step groups when used as nodes nested in a complex workflow.
What happened:
- The workflow failed with the error below.
&{0x3a297a0 map[namespace:pipeline workflow:pipeline-test-point-simplefied-g5gbq] 2024-04-30 03:46:14.425189512 +0000 UTC m=+158960.631405283 panic <nil> was unable to obtain node for pipeline-test-point-simplefied-g5gbq-2664608158 <nil> <nil> }
- The DAG shown in the GUI was wrong, with an invalid dependency tree. For example, I and J have the same dependencies, but they were connected as if J depended on I.
- As far as I have tested, the DAGs shown in the Argo GUI were wrong even for some very simple container sets. I suspect the crash is somehow connected to this incorrect DAG rendering.
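To make the expected graph explicit, here is a minimal Python sketch that derives the edges from a subset of the depends expressions in the reproduction workflow. It is only an illustration of what the spec declares, not how the controller or the UI builds the graph; the helper names (direct_dependencies, edges) and the regex are my own assumptions.

```python
import re

# Subset of the `depends` expressions copied from the `pipeline` template below.
DEPENDS = {
    "A": "",
    "G": "E.Succeeded && F.Succeeded",
    "H": "A.Succeeded && G.Omitted",
    "I": "H.Succeeded",
    "J": "H.Succeeded",
    "K": "A.Succeeded && G.Omitted",
}

# Matches "<task>.<result>" terms such as "H.Succeeded" or "G.Omitted".
# Hypothetical helper pattern; it does not cover the full depends grammar.
TERM = re.compile(r"([A-Za-z0-9_-]+)\.[A-Za-z]+")


def direct_dependencies(expr):
    """Return the set of task names referenced in one depends expression."""
    return {match.group(1) for match in TERM.finditer(expr)}


def edges(depends):
    """Return (parent, child) pairs declared by the depends expressions."""
    return {
        (parent, task)
        for task, expr in depends.items()
        for parent in direct_dependencies(expr)
    }


if __name__ == "__main__":
    declared = edges(DEPENDS)
    print("I depends on:", sorted(direct_dependencies(DEPENDS["I"])))
    print("J depends on:", sorted(direct_dependencies(DEPENDS["J"])))
    print("edge I -> J declared:", ("I", "J") in declared)
```

According to the spec, I and J both depend only on H, so an I -> J edge in the rendered graph does not come from the declared dependencies.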
Version
latest (3.5.6)
Paste a small workflow that reproduces the issue. We must be able to run the workflow; don't enter a workflow that uses private images.
apiVersion: argoproj.io/v1alpha1
kind: WorkflowTemplate
metadata:
  name: pipeline-test-point-simplefied
spec:
  entrypoint: pipeline
  activeDeadlineSeconds: 172800
  arguments:
    parameters:
      - name: runNums
        value: '["006"]'
  templates:
    - name: whalesay # name of the template
      container:
        image: docker/whalesay
        command: [cowsay]
        args: ["hello world"]
        resources: # limit the resources
          limits:
            memory: 32Mi
            cpu: 100m
    - name: whalesay-step-groups
      steps:
        - - name: prepare-inputs
            template: whalesay
        - - name: main
            template: whalesay
        - - name: prepare-outputs
            template: whalesay
    - name: whalesay-dag
      dag:
        tasks:
          - name: prepare-inputs
            template: whalesay
          - name: main
            depends: prepare-inputs
            template: whalesay
          - name: prepare-outputs
            depends: main
            template: whalesay
    - name: whalesay-container-set # name of the template
      containerSet:
        containers:
          - name: prepare-inputs
            image: docker/whalesay
            command: [cowsay]
            args: ["hello world"]
            resources: # limit the resources
              limits:
                memory: 32Mi
                cpu: 100m
          - name: main
            dependencies:
              - prepare-inputs
            image: docker/whalesay
            command: [cowsay]
            args: ["hello world"]
            resources: # limit the resources
              limits:
                memory: 32Mi
                cpu: 100m
          - name: prepare-outputs
            dependencies:
              - main
            image: docker/whalesay
            command: [cowsay]
            args: ["hello world"]
            resources: # limit the resources
              limits:
                memory: 32Mi
                cpu: 100m
    - name: pipeline
      dag:
        tasks:
          - name: A
            template: whalesay-container-set
          - name: B
            depends: A.Succeeded
            template: whalesay-container-set
          - name: C
            depends: B.Succeeded
            when: 'false'
            template: whalesay-container-set
          - name: D
            depends: C.Succeeded
            template: whalesay-container-set
          - name: E
            depends: D.Succeeded
            template: whalesay-container-set
          - name: F
            depends: D.Succeeded
            template: whalesay-container-set
          - name: G
            depends: E.Succeeded && F.Succeeded
            template: whalesay-container-set
          - name: H
            depends: A.Succeeded && G.Omitted
            template: whalesay-container-set
          - name: I
            depends: H.Succeeded
            template: whalesay-container-set
          - name: J
            depends: H.Succeeded
            template: whalesay-container-set
          - name: K
            depends: A.Succeeded && G.Omitted
            template: whalesay-container-set
          - name: L
            depends: B.Succeeded
            template: whalesay-container-set
          - name: M
            depends: B.Succeeded
            template: whalesay-container-set
          - name: N1
            depends: M.Succeeded && G.Omitted
            template: whalesay-container-set
          - name: O
            depends: N1.Succeeded
            template: whalesay-container-set
          - name: P
            depends: O.Succeeded
            template: whalesay-container-set
          - name: Q
            depends: O.Succeeded
            template: whalesay-container-set
            withParam: '{{workflow.parameters.runNums}}'
          - name: R
            depends: O.Succeeded
            template: whalesay-container-set
          - name: T
            depends: R.Succeeded
            template: whalesay-container-set
          - name: S
            depends: O.Succeeded
            template: whalesay-container-set
          - name: U
            depends: O.Succeeded
            template: whalesay-container-set
          - name: V
            depends: O.Succeeded
            template: whalesay-container-set
          - name: W
            depends: M.Succeeded && G.Omitted
            template: whalesay-container-set
          - name: X
            depends: Q.Succeeded
            template: whalesay-container-set
          - name: Y1
            depends: A.Succeeded && G.Omitted
            template: whalesay-container-set
          - name: Z
            depends: Y1.Succeeded && Q.Succeeded && R.Succeeded && V.Succeeded && U.Succeeded && S.Succeeded
            template: whalesay-container-set
          ### SSR
          - name: SSR-A
            depends: A.Succeeded
            template: whalesay-container-set
          - name: SSR-B
            depends: I.Succeeded && J.Succeeded
            template: whalesay-container-set
          - name: SSR-C
            depends: O.Succeeded
            template: whalesay-container-set
          - name: SSR-D
            depends: P.Succeeded
            template: whalesay-container-set
          - name: SSR-E
            depends: P.Succeeded
            template: whalesay-container-set
          - name: SSR-F
            depends: P.Succeeded
            template: whalesay-container-set
          - name: SSR-G
            depends: I.Succeeded && R.Succeeded
            template: whalesay-container-set
          - name: SSR-H
            depends: I.Succeeded && V.Succeeded
            template: whalesay-container-set
          - name: SSR-I
            depends: I.Succeeded && S.Succeeded
            template: whalesay-container-set
          - name: SSR-J
            depends: I.Succeeded && R.Succeeded
            template: whalesay-container-set
          - name: SSR-K
            depends: I.Succeeded && V.Succeeded
            template: whalesay-container-set
          - name: SSR-L
            depends: I.Succeeded && S.Succeeded
            template: whalesay-container-set
          - name: SSR-M
            depends: W.Succeeded
            template: whalesay-container-set
          - name: SSR-N
            depends: W.Succeeded
            template: whalesay-container-set
          - name: SSR-O
            depends: I.Succeeded && Z.Succeeded
            template: whalesay-container-set
          - name: SSR-P
            depends: I.Succeeded && Z.Succeeded
            template: whalesay-container-set
          - name: SSR-Q
            depends: I.Succeeded && Z.Succeeded
            template: whalesay-container-set
Logs from the workflow controller
kubectl logs -n argo deploy/workflow-controller | grep ${workflow}
Logs from your workflow's wait container
kubectl logs -n argo -c wait -l workflows.argoproj.io/workflow=${workflow},workflow.argoproj.io/phase!=Succeeded