kubeflow / pipelines

Machine Learning Pipelines for Kubeflow

Home Page: https://www.kubeflow.org/docs/components/pipelines/

Support for non-docker based deployments

saschagrunert opened this issue

Do you think it would be possible to support non-Docker-based clusters as well? I'm currently checking out the examples and see that they want to mount docker.sock into the container. We might achieve the same results by using crictl. WDYT?

AFAIK, you can configure Argo to use other executors (e.g. k8sapi, kubelet or pns) in the configmap: https://github.com/argoproj/argo/blob/ca1d5e671519aaa9f38f5f2564eb70c138fadda7/docs/workflow-controller-configmap.yaml#L78. Then pipelines should just work.
Would you like to try it?

AFAIK, you can configure Argo to use other executors (e.g. k8sapi, kubelet or pns) in the configmap: https://github.com/argoproj/argo/blob/ca1d5e671519aaa9f38f5f2564eb70c138fadda7/docs/workflow-controller-configmap.yaml#L78. Then pipelines should just work.
Would you like to try it?

Thanks for the help. I edited the configmap and also restarted the workflow controller pod (which does not seem to be necessary). The config looks like this now:

> kubectl get configmap workflow-controller-configmap -o yaml
apiVersion: v1
data:
  config: |
    {
    executorImage: argoproj/argoexec:v2.3.0,
    artifactRepository:
        {
            s3: {
                bucket: mlpipeline,
                keyPrefix: artifacts,
                endpoint: minio-service.kubeflow:9000,
                insecure: true,
                accessKeySecret: {
                    name: mlpipeline-minio-artifact,
                    key: accesskey
                },
                secretKeySecret: {
                    name: mlpipeline-minio-artifact,
                    key: secretkey
                }
            },
            containerRuntimeExecutor: k8sapi
        }
    }
kind: ConfigMap
metadata:
  creationTimestamp: "2019-07-22T13:56:32Z"
  labels:
    kustomize.component: argo
  name: workflow-controller-configmap
  namespace: kubeflow
  resourceVersion: "1181725"
  selfLink: /api/v1/namespaces/kubeflow/configmaps/workflow-controller-configmap
  uid: 3144d234-101f-4031-94ce-b1aa258bfafd

I also tried kubelet as the value, but it still tries to mount the Docker socket when running a pipeline:

> kubectl describe pod parallel-pipeline-jdnxw-643249177
...
Events:
  Type     Reason       Age                  From                   Message
  ----     ------       ----                 ----                   -------
  Normal   Scheduled    2m26s                default-scheduler      Successfully assigned kubeflow/parallel-pipeline-jdnxw-643249177 to caasp-node-3
  Warning  FailedMount  23s                  kubelet, caasp-node-3  Unable to mount volumes for pod "parallel-pipeline-jdnxw-643249177_kubeflow(9d937151-c9e3-493a-a7b3-a0870507caa7)": timeout expired waiting for volumes to attach or mount for pod "kubeflow"/"parallel-pipeline-jdnxw-643249177". list of unmounted volumes=[docker-sock]. list of unattached volumes=[podmetadata docker-sock mlpipeline-minio-artifact pipeline-runner-token-dr4dg]
  Warning  FailedMount  18s (x9 over 2m26s)  kubelet, caasp-node-3  MountVolume.SetUp failed for volume "docker-sock" : hostPath type check failed: /var/run/docker.sock is not a socket file

The cluster runs on top of Kubernetes 1.15 with CRI-O 1.15 as the container runtime. Is there anything else I can try?

Your containerRuntimeExecutor is inside artifactRepository. It should be outside.

Your containerRuntimeExecutor is inside artifactRepository. It should be outside.

Ah, thanks for the hint 🤦‍♂️. Now I'm encountering a different set of issues when running the example pipelines:

With pns, every single step issues:

This step is in Error state with this message: failed to save outputs: Failed to determine pid for containerID b6f5119e85788ab25d8979841a5ff064240faeb77180ad28498624a98f0c4059: container may have exited too quickly

With kubelet and k8sapi:

invalid spec: templates.echo.outputs.artifacts.mlpipeline-ui-metadata: kubelet executor does not support outputs from base image layer. must use emptyDir

With pns, every single step issues:

You should probably look at the workflow controller logs and the Wait container logs.

kubelet executor does not support outputs from base image layer. must use emptyDir

This is inconvenient, but can you try to satisfy that requirement? Mount an emptyDir beneath the outputs path using task.add_volume and task.add_volume_mount. See

task
    .add_volume(
        k8s_client.V1Volume(
            name=volume_name,
            secret=k8s_client.V1SecretVolumeSource(
                secret_name=secret_name,
            )
        )
    )
    .add_volume_mount(
        k8s_client.V1VolumeMount(
            name=volume_name,
            mount_path=secret_volume_mount_path,
        )
    )
as a reference (it mounts a secret volume, though).
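
For the emptyDir case specifically, a rough sketch of the same pattern (the helper name, volume name and /output path are just illustrative; the mount path should be wherever the step writes its outputs):

from kfp import dsl
from kubernetes import client as k8s_client


def mount_output_emptydir(task: dsl.ContainerOp,
                          mount_path: str = '/output') -> dsl.ContainerOp:
    # Back the step's output directory with an emptyDir volume instead of a
    # secret, so the outputs are no longer read from the base image layer.
    return task.add_volume(
        k8s_client.V1Volume(
            name='outputs',
            empty_dir=k8s_client.V1EmptyDirVolumeSource())).add_volume_mount(
                k8s_client.V1VolumeMount(name='outputs', mount_path=mount_path))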

With pns, every single step issues:

You should probably look at the workflow controller logs and the Wait container logs.

Okay, if I run the [Sample] Basic - Exit Handler example pipeline with pns, then the workflow-controller pod logs:

time="2019-07-26T06:57:14Z" level=info msg="Processing workflow" namespace=kubeflow workflow=exit-handler-zpxsx
time="2019-07-26T06:57:14Z" level=info msg="Updated phase  -> Running" namespace=kubeflow workflow=exit-handler-zpxsx
time="2019-07-26T06:57:14Z" level=info msg="DAG node exit-handler-zpxsx (exit-handler-zpxsx) initialized Running" namespace=kubeflow workflow=exit-handler-zpxsx
time="2019-07-26T06:57:14Z" level=info msg="All of node exit-handler-zpxsx.echo dependencies [] completed" namespace=kubeflow workflow=exit-handler-zpxsx
time="2019-07-26T06:57:14Z" level=info msg="Created pod: exit-handler-zpxsx.echo (exit-handler-zpxsx-3143064195)" namespace=kubeflow workflow=exit-handler-zpxsx
time="2019-07-26T06:57:14Z" level=info msg="Pod node exit-handler-zpxsx.echo (exit-handler-zpxsx-3143064195) initialized Pending" namespace=kubeflow workflow=exit-handler-zpxsx
time="2019-07-26T06:57:14Z" level=info msg="All of node exit-handler-zpxsx.exit-handler-1 dependencies [] completed" namespace=kubeflow workflow=exit-handler-zpxsx
time="2019-07-26T06:57:14Z" level=info msg="DAG node exit-handler-zpxsx.exit-handler-1 (exit-handler-zpxsx-3298695089) initialized Running" namespace=kubeflow workflow=exit-handler-zpxsx
time="2019-07-26T06:57:14Z" level=info msg="All of node exit-handler-zpxsx.exit-handler-1.gcs-download dependencies [] completed" namespace=kubeflow workflow=exit-handler-zpxsx
time="2019-07-26T06:57:14Z" level=info msg="Created pod: exit-handler-zpxsx.exit-handler-1.gcs-download (exit-handler-zpxsx-3267705207)" namespace=kubeflow workflow=exit-handler-zpxsx
time="2019-07-26T06:57:14Z" level=info msg="Pod node exit-handler-zpxsx.exit-handler-1.gcs-download (exit-handler-zpxsx-3267705207) initialized Pending" namespace=kubeflow workflow=exit-handler-zpxsx
time="2019-07-26T06:57:14Z" level=info msg="Workflow update successful" namespace=kubeflow workflow=exit-handler-zpxsx
time="2019-07-26T06:57:15Z" level=info msg="Processing workflow" namespace=kubeflow workflow=exit-handler-zpxsx
time="2019-07-26T06:57:15Z" level=info msg="Updating node exit-handler-zpxsx.exit-handler-1.gcs-download (exit-handler-zpxsx-3267705207) message: ContainerCreating"
time="2019-07-26T06:57:15Z" level=info msg="Updating node exit-handler-zpxsx.echo (exit-handler-zpxsx-3143064195) message: ContainerCreating"
time="2019-07-26T06:57:15Z" level=info msg="Workflow update successful" namespace=kubeflow workflow=exit-handler-zpxsx
time="2019-07-26T06:57:16Z" level=info msg="Processing workflow" namespace=kubeflow workflow=exit-handler-zpxsx
time="2019-07-26T06:57:19Z" level=info msg="Processing workflow" namespace=kubeflow workflow=exit-handler-zpxsx
time="2019-07-26T06:57:19Z" level=info msg="Updating node exit-handler-zpxsx.echo (exit-handler-zpxsx-3143064195) status Pending -> Running"
time="2019-07-26T06:57:19Z" level=info msg="Workflow update successful" namespace=kubeflow workflow=exit-handler-zpxsx
time="2019-07-26T06:57:20Z" level=info msg="Processing workflow" namespace=kubeflow workflow=exit-handler-zpxsx
time="2019-07-26T06:57:20Z" level=info msg="Updating node exit-handler-zpxsx.exit-handler-1.gcs-download (exit-handler-zpxsx-3267705207) status Pending -> Running"
time="2019-07-26T06:57:20Z" level=info msg="Updating node exit-handler-zpxsx.echo (exit-handler-zpxsx-3143064195) status Running -> Error"
time="2019-07-26T06:57:20Z" level=info msg="Updating node exit-handler-zpxsx.echo (exit-handler-zpxsx-3143064195) message: failed to save outputs: Failed to determine pid for containerID baff7ec33f4fd38da1d1246a721f67f36a723a6ecf83228a27512b0d8273ed19: container may have exited too quickly"
time="2019-07-26T06:57:20Z" level=info msg="Workflow update successful" namespace=kubeflow workflow=exit-handler-zpxsx
time="2019-07-26T06:57:21Z" level=info msg="Processing workflow" namespace=kubeflow workflow=exit-handler-zpxsx
time="2019-07-26T06:57:21Z" level=info msg="Labeled pod kubeflow/exit-handler-zpxsx-3143064195 completed"
time="2019-07-26T06:57:24Z" level=info msg="Processing workflow" namespace=kubeflow workflow=exit-handler-zpxsx
time="2019-07-26T06:57:24Z" level=info msg="Updating node exit-handler-zpxsx.exit-handler-1.gcs-download (exit-handler-zpxsx-3267705207) status Running -> Error"
time="2019-07-26T06:57:24Z" level=info msg="Updating node exit-handler-zpxsx.exit-handler-1.gcs-download (exit-handler-zpxsx-3267705207) message: failed to save outputs: Failed to determine pid for containerID f17ef07eedbcc0f0666051a1d64a4ae7293f6bb34513019e0d830b837f51673a: container may have exited too quickly"
time="2019-07-26T06:57:24Z" level=info msg="node exit-handler-zpxsx.exit-handler-1 (exit-handler-zpxsx-3298695089) phase Running -> Error" namespace=kubeflow workflow=exit-handler-zpxsx
time="2019-07-26T06:57:24Z" level=info msg="node exit-handler-zpxsx.exit-handler-1 (exit-handler-zpxsx-3298695089) finished: 2019-07-26 06:57:24.436064763 +0000 UTC" namespace=kubeflow workflow=exit-handler-zpxsx
time="2019-07-26T06:57:24Z" level=info msg="Checking daemoned children of exit-handler-zpxsx-3298695089" namespace=kubeflow workflow=exit-handler-zpxsx
time="2019-07-26T06:57:24Z" level=info msg="node exit-handler-zpxsx (exit-handler-zpxsx) phase Running -> Error" namespace=kubeflow workflow=exit-handler-zpxsx
time="2019-07-26T06:57:24Z" level=info msg="node exit-handler-zpxsx (exit-handler-zpxsx) finished: 2019-07-26 06:57:24.436217138 +0000 UTC" namespace=kubeflow workflow=exit-handler-zpxsx
time="2019-07-26T06:57:24Z" level=info msg="Checking daemoned children of exit-handler-zpxsx" namespace=kubeflow workflow=exit-handler-zpxsx
time="2019-07-26T06:57:24Z" level=info msg="Running OnExit handler: echo" namespace=kubeflow workflow=exit-handler-zpxsx
time="2019-07-26T06:57:24Z" level=info msg="Created pod: exit-handler-zpxsx.onExit (exit-handler-zpxsx-3148652955)" namespace=kubeflow workflow=exit-handler-zpxsx
time="2019-07-26T06:57:24Z" level=info msg="Pod node exit-handler-zpxsx.onExit (exit-handler-zpxsx-3148652955) initialized Pending" namespace=kubeflow workflow=exit-handler-zpxsx
time="2019-07-26T06:57:24Z" level=info msg="Workflow update successful" namespace=kubeflow workflow=exit-handler-zpxsx
time="2019-07-26T06:57:25Z" level=info msg="Processing workflow" namespace=kubeflow workflow=exit-handler-zpxsx
time="2019-07-26T06:57:25Z" level=info msg="Labeled pod kubeflow/exit-handler-zpxsx-3267705207 completed"
time="2019-07-26T06:57:25Z" level=info msg="Updating node exit-handler-zpxsx.onExit (exit-handler-zpxsx-3148652955) message: ContainerCreating"
time="2019-07-26T06:57:25Z" level=info msg="Running OnExit handler: echo" namespace=kubeflow workflow=exit-handler-zpxsx
time="2019-07-26T06:57:25Z" level=info msg="Workflow update successful" namespace=kubeflow workflow=exit-handler-zpxsx
time="2019-07-26T06:57:26Z" level=info msg="Processing workflow" namespace=kubeflow workflow=exit-handler-zpxsx
time="2019-07-26T06:57:26Z" level=info msg="Running OnExit handler: echo" namespace=kubeflow workflow=exit-handler-zpxsx
time="2019-07-26T06:57:26Z" level=info msg="Labeled pod kubeflow/exit-handler-zpxsx-3267705207 completed"
time="2019-07-26T06:57:29Z" level=info msg="Processing workflow" namespace=kubeflow workflow=exit-handler-zpxsx
time="2019-07-26T06:57:29Z" level=info msg="Updating node exit-handler-zpxsx.onExit (exit-handler-zpxsx-3148652955) status Pending -> Running"
time="2019-07-26T06:57:29Z" level=info msg="Running OnExit handler: echo" namespace=kubeflow workflow=exit-handler-zpxsx
time="2019-07-26T06:57:29Z" level=info msg="Workflow update successful" namespace=kubeflow workflow=exit-handler-zpxsx
time="2019-07-26T06:57:30Z" level=info msg="Processing workflow" namespace=kubeflow workflow=exit-handler-zpxsx
time="2019-07-26T06:57:30Z" level=info msg="Updating node exit-handler-zpxsx.onExit (exit-handler-zpxsx-3148652955) status Running -> Error"
time="2019-07-26T06:57:30Z" level=info msg="Updating node exit-handler-zpxsx.onExit (exit-handler-zpxsx-3148652955) message: failed to save outputs: Failed to determine pid for containerID 15a2919676d6228aaf2ffa24446d88dbb06bcc6fefb0337c6ec4da716c09ef56: container may have exited too quickly"
time="2019-07-26T06:57:30Z" level=info msg="Running OnExit handler: echo" namespace=kubeflow workflow=exit-handler-zpxsx
time="2019-07-26T06:57:30Z" level=info msg="Updated phase Running -> Error" namespace=kubeflow workflow=exit-handler-zpxsx
time="2019-07-26T06:57:30Z" level=info msg="Marking workflow completed" namespace=kubeflow workflow=exit-handler-zpxsx
time="2019-07-26T06:57:30Z" level=info msg="Checking daemoned children of " namespace=kubeflow workflow=exit-handler-zpxsx
time="2019-07-26T06:57:30Z" level=info msg="Workflow update successful" namespace=kubeflow workflow=exit-handler-zpxsx
time="2019-07-26T06:57:31Z" level=info msg="Labeled pod kubeflow/exit-handler-zpxsx-3148652955 completed"

Whereas the exit handler logs contain:

Pod 1

wait:

time="2019-07-26T06:57:17Z" level=info msg="Creating PNS executor (namespace: kubeflow, pod: exit-handler-zpxsx-3143064195, pid: 9, hasOutputs: true)"
time="2019-07-26T06:57:17Z" level=info msg="Executor (version: v2.3.0, build_date: 2019-05-20T22:10:54Z) initialized (pod: kubeflow/exit-handler-zpxsx-3143064195) with template:\n{\"name\":\"echo\",\"inputs\":{},\"outputs\":{\"artifacts\":[{\"name\":\"mlpipeline-ui-metadata\",\"path\":\"/mlpipeline-ui-metadata.json\",\"optional\":true},{\"name\":\"mlpipeline-metrics\",\"path\":\"/mlpipeline-metrics.json\",\"optional\":true}]},\"metadata\":{},\"container\":{\"name\":\"\",\"image\":\"library/bash:4.4.23\",\"command\":[\"sh\",\"-c\"],\"args\":[\"echo \\\"$0\\\"\",\"exit!\"],\"resources\":{}},\"archiveLocation\":{\"s3\":{\"endpoint\":\"minio-service.kubeflow:9000\",\"bucket\":\"mlpipeline\",\"insecure\":true,\"accessKeySecret\":{\"name\":\"mlpipeline-minio-artifact\",\"key\":\"accesskey\"},\"secretKeySecret\":{\"name\":\"mlpipeline-minio-artifact\",\"key\":\"secretkey\"},\"key\":\"artifacts/exit-handler-zpxsx/exit-handler-zpxsx-3143064195\"}}}"
time="2019-07-26T06:57:17Z" level=info msg="Waiting on main container"
time="2019-07-26T06:57:17Z" level=warning msg="Polling root processes (1m0s)"
time="2019-07-26T06:57:17Z" level=info msg="pid 33: &{root 4096 2147484141 {632376424 63699721037 0x22af420} {1048751 14949805 1 16877 0 0 0 0 4096 4096 8 {1564124237 800376805} {1564124237 632376424} {1564124237 832376878} [0 0 0]}}"
time="2019-07-26T06:57:17Z" level=info msg="Secured filehandle on /proc/33/root"
time="2019-07-26T06:57:17Z" level=info msg="containerID crio-baff7ec33f4fd38da1d1246a721f67f36a723a6ecf83228a27512b0d8273ed19 mapped to pid 33"
time="2019-07-26T06:57:17Z" level=info msg="pid 33: &{root 4096 2147484141 {632376424 63699721037 0x22af420} {1048751 14949805 1 16877 0 0 0 0 4096 4096 8 {1564124237 800376805} {1564124237 632376424} {1564124237 832376878} [0 0 0]}}"
time="2019-07-26T06:57:17Z" level=info msg="pid 33: &{root 4096 2147484141 {632376424 63699721037 0x22af420} {1048751 14949805 1 16877 0 0 0 0 4096 4096 8 {1564124237 800376805} {1564124237 632376424} {1564124237 832376878} [0 0 0]}}"
time="2019-07-26T06:57:19Z" level=info msg="main container started with container ID: baff7ec33f4fd38da1d1246a721f67f36a723a6ecf83228a27512b0d8273ed19"
time="2019-07-26T06:57:19Z" level=info msg="Starting annotations monitor"
time="2019-07-26T06:57:19Z" level=info msg="Starting deadline monitor"
time="2019-07-26T06:57:19Z" level=error msg="executor error: Failed to determine pid for containerID baff7ec33f4fd38da1d1246a721f67f36a723a6ecf83228a27512b0d8273ed19: container may have exited too quickly\ngithub.com/argoproj/argo/errors.New\n\t/go/src/github.com/argoproj/argo/errors/errors.go:49\ngithub.com/argoproj/argo/errors.Errorf\n\t/go/src/github.com/argoproj/argo/errors/errors.go:55\ngithub.com/argoproj/argo/errors.InternalErrorf\n\t/go/src/github.com/argoproj/argo/errors/errors.go:65\ngithub.com/argoproj/argo/workflow/executor/pns.(*PNSExecutor).getContainerPID\n\t/go/src/github.com/argoproj/argo/workflow/executor/pns/pns.go:292\ngithub.com/argoproj/argo/workflow/executor/pns.(*PNSExecutor).Wait\n\t/go/src/github.com/argoproj/argo/workflow/executor/pns/pns.go:166\ngithub.com/argoproj/argo/workflow/executor.(*WorkflowExecutor).Wait\n\t/go/src/github.com/argoproj/argo/workflow/executor/executor.go:867\ngithub.com/argoproj/argo/cmd/argoexec/commands.waitContainer\n\t/go/src/github.com/argoproj/argo/cmd/argoexec/commands/wait.go:32\ngithub.com/argoproj/argo/cmd/argoexec/commands.NewWaitCommand.func1\n\t/go/src/github.com/argoproj/argo/cmd/argoexec/commands/wait.go:16\ngithub.com/spf13/cobra.(*Command).execute\n\t/go/src/github.com/spf13/cobra/command.go:766\ngithub.com/spf13/cobra.(*Command).ExecuteC\n\t/go/src/github.com/spf13/cobra/command.go:852\ngithub.com/spf13/cobra.(*Command).Execute\n\t/go/src/github.com/spf13/cobra/command.go:800\nmain.main\n\t/go/src/github.com/argoproj/argo/cmd/argoexec/main.go:17\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:201\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1333"
time="2019-07-26T06:57:19Z" level=info msg="No sidecars"
time="2019-07-26T06:57:19Z" level=info msg="No output parameters"
time="2019-07-26T06:57:19Z" level=info msg="Saving output artifacts"
time="2019-07-26T06:57:19Z" level=info msg="Annotations monitor stopped"
time="2019-07-26T06:57:19Z" level=info msg="Staging artifact: mlpipeline-ui-metadata"
time="2019-07-26T06:57:19Z" level=info msg="Copying /mlpipeline-ui-metadata.json from container base image layer to /argo/outputs/artifacts/mlpipeline-ui-metadata.tgz"
time="2019-07-26T06:57:19Z" level=error msg="executor error: could not chroot into main for artifact collection: container may have exited too quickly\ngithub.com/argoproj/argo/errors.New\n\t/go/src/github.com/argoproj/argo/errors/errors.go:49\ngithub.com/argoproj/argo/errors.Errorf\n\t/go/src/github.com/argoproj/argo/errors/errors.go:55\ngithub.com/argoproj/argo/errors.InternalErrorf\n\t/go/src/github.com/argoproj/argo/errors/errors.go:65\ngithub.com/argoproj/argo/workflow/executor/pns.(*PNSExecutor).enterChroot\n\t/go/src/github.com/argoproj/argo/workflow/executor/pns/pns.go:94\ngithub.com/argoproj/argo/workflow/executor/pns.(*PNSExecutor).CopyFile\n\t/go/src/github.com/argoproj/argo/workflow/executor/pns/pns.go:136\ngithub.com/argoproj/argo/workflow/executor.(*WorkflowExecutor).stageArchiveFile\n\t/go/src/github.com/argoproj/argo/workflow/executor/executor.go:344\ngithub.com/argoproj/argo/workflow/executor.(*WorkflowExecutor).saveArtifact\n\t/go/src/github.com/argoproj/argo/workflow/executor/executor.go:245\ngithub.com/argoproj/argo/workflow/executor.(*WorkflowExecutor).SaveArtifacts\n\t/go/src/github.com/argoproj/argo/workflow/executor/executor.go:231\ngithub.com/argoproj/argo/cmd/argoexec/commands.waitContainer\n\t/go/src/github.com/argoproj/argo/cmd/argoexec/commands/wait.go:54\ngithub.com/argoproj/argo/cmd/argoexec/commands.NewWaitCommand.func1\n\t/go/src/github.com/argoproj/argo/cmd/argoexec/commands/wait.go:16\ngithub.com/spf13/cobra.(*Command).execute\n\t/go/src/github.com/spf13/cobra/command.go:766\ngithub.com/spf13/cobra.(*Command).ExecuteC\n\t/go/src/github.com/spf13/cobra/command.go:852\ngithub.com/spf13/cobra.(*Command).Execute\n\t/go/src/github.com/spf13/cobra/command.go:800\nmain.main\n\t/go/src/github.com/argoproj/argo/cmd/argoexec/main.go:17\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:201\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1333"
time="2019-07-26T06:57:19Z" level=info msg="Alloc=3736 TotalAlloc=11943 Sys=70590 NumGC=5 Goroutines=9"
time="2019-07-26T06:57:19Z" level=fatal msg="could not chroot into main for artifact collection: container may have exited too quickly\ngithub.com/argoproj/argo/errors.New\n\t/go/src/github.com/argoproj/argo/errors/errors.go:49\ngithub.com/argoproj/argo/errors.Errorf\n\t/go/src/github.com/argoproj/argo/errors/errors.go:55\ngithub.com/argoproj/argo/errors.InternalErrorf\n\t/go/src/github.com/argoproj/argo/errors/errors.go:65\ngithub.com/argoproj/argo/workflow/executor/pns.(*PNSExecutor).enterChroot\n\t/go/src/github.com/argoproj/argo/workflow/executor/pns/pns.go:94\ngithub.com/argoproj/argo/workflow/executor/pns.(*PNSExecutor).CopyFile\n\t/go/src/github.com/argoproj/argo/workflow/executor/pns/pns.go:136\ngithub.com/argoproj/argo/workflow/executor.(*WorkflowExecutor).stageArchiveFile\n\t/go/src/github.com/argoproj/argo/workflow/executor/executor.go:344\ngithub.com/argoproj/argo/workflow/executor.(*WorkflowExecutor).saveArtifact\n\t/go/src/github.com/argoproj/argo/workflow/executor/executor.go:245\ngithub.com/argoproj/argo/workflow/executor.(*WorkflowExecutor).SaveArtifacts\n\t/go/src/github.com/argoproj/argo/workflow/executor/executor.go:231\ngithub.com/argoproj/argo/cmd/argoexec/commands.waitContainer\n\t/go/src/github.com/argoproj/argo/cmd/argoexec/commands/wait.go:54\ngithub.com/argoproj/argo/cmd/argoexec/commands.NewWaitCommand.func1\n\t/go/src/github.com/argoproj/argo/cmd/argoexec/commands/wait.go:16\ngithub.com/spf13/cobra.(*Command).execute\n\t/go/src/github.com/spf13/cobra/command.go:766\ngithub.com/spf13/cobra.(*Command).ExecuteC\n\t/go/src/github.com/spf13/cobra/command.go:852\ngithub.com/spf13/cobra.(*Command).Execute\n\t/go/src/github.com/spf13/cobra/command.go:800\nmain.main\n\t/go/src/github.com/argoproj/argo/cmd/argoexec/main.go:17\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:201\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1333"

main:

exit!

Pod 2

wait:

time="2019-07-26T06:57:28Z" level=info msg="Creating PNS executor (namespace: kubeflow, pod: exit-handler-zpxsx-3148652955, pid: 8, hasOutputs: true)"
time="2019-07-26T06:57:28Z" level=info msg="Executor (version: v2.3.0, build_date: 2019-05-20T22:10:54Z) initialized (pod: kubeflow/exit-handler-zpxsx-3148652955) with template:\n{\"name\":\"echo\",\"inputs\":{},\"outputs\":{\"artifacts\":[{\"name\":\"mlpipeline-ui-metadata\",\"path\":\"/mlpipeline-ui-metadata.json\",\"optional\":true},{\"name\":\"mlpipeline-metrics\",\"path\":\"/mlpipeline-metrics.json\",\"optional\":true}]},\"metadata\":{},\"container\":{\"name\":\"\",\"image\":\"library/bash:4.4.23\",\"command\":[\"sh\",\"-c\"],\"args\":[\"echo \\\"$0\\\"\",\"exit!\"],\"resources\":{}},\"archiveLocation\":{\"s3\":{\"endpoint\":\"minio-service.kubeflow:9000\",\"bucket\":\"mlpipeline\",\"insecure\":true,\"accessKeySecret\":{\"name\":\"mlpipeline-minio-artifact\",\"key\":\"accesskey\"},\"secretKeySecret\":{\"name\":\"mlpipeline-minio-artifact\",\"key\":\"secretkey\"},\"key\":\"artifacts/exit-handler-zpxsx/exit-handler-zpxsx-3148652955\"}}}"
time="2019-07-26T06:57:28Z" level=info msg="Waiting on main container"
time="2019-07-26T06:57:28Z" level=warning msg="Polling root processes (1m0s)"
time="2019-07-26T06:57:29Z" level=info msg="pid 32: &{root 274 2147484141 {232317677 63699057718 0x22af420} {65027 96 21 16877 0 0 0 0 274 4096 0 {1563892256 947849310} {1563460918 232317677} {1563460918 232317677} [0 0 0]}}"
time="2019-07-26T06:57:29Z" level=info msg="Secured filehandle on /proc/32/root"
time="2019-07-26T06:57:29Z" level=info msg="containerID crio-15a2919676d6228aaf2ffa24446d88dbb06bcc6fefb0337c6ec4da716c09ef56 mapped to pid 32"
time="2019-07-26T06:57:29Z" level=info msg="pid 32: &{root 4096 2147484141 {480401061 63699721048 0x22af420} {1048741 14949805 1 16877 0 0 0 0 4096 4096 8 {1564124249 60402378} {1564124248 480401061} {1564124249 92402451} [0 0 0]}}"
time="2019-07-26T06:57:29Z" level=info msg="Secured filehandle on /proc/32/root"
time="2019-07-26T06:57:29Z" level=info msg="pid 32: &{root 4096 2147484141 {480401061 63699721048 0x22af420} {1048741 14949805 1 16877 0 0 0 0 4096 4096 8 {1564124249 60402378} {1564124248 480401061} {1564124249 92402451} [0 0 0]}}"
time="2019-07-26T06:57:29Z" level=info msg="pid 32: &{root 4096 2147484141 {480401061 63699721048 0x22af420} {1048741 14949805 1 16877 0 0 0 0 4096 4096 8 {1564124249 60402378} {1564124248 480401061} {1564124249 92402451} [0 0 0]}}"
time="2019-07-26T06:57:29Z" level=info msg="main container started with container ID: 15a2919676d6228aaf2ffa24446d88dbb06bcc6fefb0337c6ec4da716c09ef56"
time="2019-07-26T06:57:29Z" level=info msg="Starting annotations monitor"
time="2019-07-26T06:57:29Z" level=info msg="Annotations monitor stopped"
time="2019-07-26T06:57:29Z" level=error msg="executor error: Failed to determine pid for containerID 15a2919676d6228aaf2ffa24446d88dbb06bcc6fefb0337c6ec4da716c09ef56: container may have exited too quickly\ngithub.com/argoproj/argo/errors.New\n\t/go/src/github.com/argoproj/argo/errors/errors.go:49\ngithub.com/argoproj/argo/errors.Errorf\n\t/go/src/github.com/argoproj/argo/errors/errors.go:55\ngithub.com/argoproj/argo/errors.InternalErrorf\n\t/go/src/github.com/argoproj/argo/errors/errors.go:65\ngithub.com/argoproj/argo/workflow/executor/pns.(*PNSExecutor).getContainerPID\n\t/go/src/github.com/argoproj/argo/workflow/executor/pns/pns.go:292\ngithub.com/argoproj/argo/workflow/executor/pns.(*PNSExecutor).Wait\n\t/go/src/github.com/argoproj/argo/workflow/executor/pns/pns.go:166\ngithub.com/argoproj/argo/workflow/executor.(*WorkflowExecutor).Wait\n\t/go/src/github.com/argoproj/argo/workflow/executor/executor.go:867\ngithub.com/argoproj/argo/cmd/argoexec/commands.waitContainer\n\t/go/src/github.com/argoproj/argo/cmd/argoexec/commands/wait.go:32\ngithub.com/argoproj/argo/cmd/argoexec/commands.NewWaitCommand.func1\n\t/go/src/github.com/argoproj/argo/cmd/argoexec/commands/wait.go:16\ngithub.com/spf13/cobra.(*Command).execute\n\t/go/src/github.com/spf13/cobra/command.go:766\ngithub.com/spf13/cobra.(*Command).ExecuteC\n\t/go/src/github.com/spf13/cobra/command.go:852\ngithub.com/spf13/cobra.(*Command).Execute\n\t/go/src/github.com/spf13/cobra/command.go:800\nmain.main\n\t/go/src/github.com/argoproj/argo/cmd/argoexec/main.go:17\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:201\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1333"
time="2019-07-26T06:57:29Z" level=info msg="No sidecars"
time="2019-07-26T06:57:29Z" level=info msg="No output parameters"
time="2019-07-26T06:57:29Z" level=info msg="Starting deadline monitor"
time="2019-07-26T06:57:29Z" level=info msg="Deadline monitor stopped"
time="2019-07-26T06:57:29Z" level=info msg="Saving output artifacts"
time="2019-07-26T06:57:29Z" level=info msg="Staging artifact: mlpipeline-ui-metadata"
time="2019-07-26T06:57:29Z" level=info msg="Copying /mlpipeline-ui-metadata.json from container base image layer to /argo/outputs/artifacts/mlpipeline-ui-metadata.tgz"
time="2019-07-26T06:57:29Z" level=error msg="executor error: could not chroot into main for artifact collection: container may have exited too quickly\ngithub.com/argoproj/argo/errors.New\n\t/go/src/github.com/argoproj/argo/errors/errors.go:49\ngithub.com/argoproj/argo/errors.Errorf\n\t/go/src/github.com/argoproj/argo/errors/errors.go:55\ngithub.com/argoproj/argo/errors.InternalErrorf\n\t/go/src/github.com/argoproj/argo/errors/errors.go:65\ngithub.com/argoproj/argo/workflow/executor/pns.(*PNSExecutor).enterChroot\n\t/go/src/github.com/argoproj/argo/workflow/executor/pns/pns.go:94\ngithub.com/argoproj/argo/workflow/executor/pns.(*PNSExecutor).CopyFile\n\t/go/src/github.com/argoproj/argo/workflow/executor/pns/pns.go:136\ngithub.com/argoproj/argo/workflow/executor.(*WorkflowExecutor).stageArchiveFile\n\t/go/src/github.com/argoproj/argo/workflow/executor/executor.go:344\ngithub.com/argoproj/argo/workflow/executor.(*WorkflowExecutor).saveArtifact\n\t/go/src/github.com/argoproj/argo/workflow/executor/executor.go:245\ngithub.com/argoproj/argo/workflow/executor.(*WorkflowExecutor).SaveArtifacts\n\t/go/src/github.com/argoproj/argo/workflow/executor/executor.go:231\ngithub.com/argoproj/argo/cmd/argoexec/commands.waitContainer\n\t/go/src/github.com/argoproj/argo/cmd/argoexec/commands/wait.go:54\ngithub.com/argoproj/argo/cmd/argoexec/commands.NewWaitCommand.func1\n\t/go/src/github.com/argoproj/argo/cmd/argoexec/commands/wait.go:16\ngithub.com/spf13/cobra.(*Command).execute\n\t/go/src/github.com/spf13/cobra/command.go:766\ngithub.com/spf13/cobra.(*Command).ExecuteC\n\t/go/src/github.com/spf13/cobra/command.go:852\ngithub.com/spf13/cobra.(*Command).Execute\n\t/go/src/github.com/spf13/cobra/command.go:800\nmain.main\n\t/go/src/github.com/argoproj/argo/cmd/argoexec/main.go:17\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:201\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1333"
time="2019-07-26T06:57:29Z" level=info msg="Alloc=3943 TotalAlloc=11808 Sys=70334 NumGC=5 Goroutines=8"
time="2019-07-26T06:57:29Z" level=fatal msg="could not chroot into main for artifact collection: container may have exited too quickly\ngithub.com/argoproj/argo/errors.New\n\t/go/src/github.com/argoproj/argo/errors/errors.go:49\ngithub.com/argoproj/argo/errors.Errorf\n\t/go/src/github.com/argoproj/argo/errors/errors.go:55\ngithub.com/argoproj/argo/errors.InternalErrorf\n\t/go/src/github.com/argoproj/argo/errors/errors.go:65\ngithub.com/argoproj/argo/workflow/executor/pns.(*PNSExecutor).enterChroot\n\t/go/src/github.com/argoproj/argo/workflow/executor/pns/pns.go:94\ngithub.com/argoproj/argo/workflow/executor/pns.(*PNSExecutor).CopyFile\n\t/go/src/github.com/argoproj/argo/workflow/executor/pns/pns.go:136\ngithub.com/argoproj/argo/workflow/executor.(*WorkflowExecutor).stageArchiveFile\n\t/go/src/github.com/argoproj/argo/workflow/executor/executor.go:344\ngithub.com/argoproj/argo/workflow/executor.(*WorkflowExecutor).saveArtifact\n\t/go/src/github.com/argoproj/argo/workflow/executor/executor.go:245\ngithub.com/argoproj/argo/workflow/executor.(*WorkflowExecutor).SaveArtifacts\n\t/go/src/github.com/argoproj/argo/workflow/executor/executor.go:231\ngithub.com/argoproj/argo/cmd/argoexec/commands.waitContainer\n\t/go/src/github.com/argoproj/argo/cmd/argoexec/commands/wait.go:54\ngithub.com/argoproj/argo/cmd/argoexec/commands.NewWaitCommand.func1\n\t/go/src/github.com/argoproj/argo/cmd/argoexec/commands/wait.go:16\ngithub.com/spf13/cobra.(*Command).execute\n\t/go/src/github.com/spf13/cobra/command.go:766\ngithub.com/spf13/cobra.(*Command).ExecuteC\n\t/go/src/github.com/spf13/cobra/command.go:852\ngithub.com/spf13/cobra.(*Command).Execute\n\t/go/src/github.com/spf13/cobra/command.go:800\nmain.main\n\t/go/src/github.com/argoproj/argo/cmd/argoexec/main.go:17\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:201\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1333"

main:

exit!

Pod 3

wait:

time="2019-07-26T06:57:19Z" level=info msg="Creating PNS executor (namespace: kubeflow, pod: exit-handler-zpxsx-3267705207, pid: 8, hasOutputs: true)"
time="2019-07-26T06:57:19Z" level=info msg="Executor (version: v2.3.0, build_date: 2019-05-20T22:10:54Z) initialized (pod: kubeflow/exit-handler-zpxsx-3267705207) with template:\n{\"name\":\"gcs-download\",\"inputs\":{\"parameters\":[{\"name\":\"url\",\"value\":\"gs://ml-pipeline-playground/shakespeare1.txt\"}]},\"outputs\":{\"parameters\":[{\"name\":\"gcs-download-data\",\"valueFrom\":{\"path\":\"/tmp/results.txt\"}}],\"artifacts\":[{\"name\":\"mlpipeline-ui-metadata\",\"path\":\"/mlpipeline-ui-metadata.json\",\"optional\":true},{\"name\":\"mlpipeline-metrics\",\"path\":\"/mlpipeline-metrics.json\",\"optional\":true}]},\"metadata\":{},\"container\":{\"name\":\"\",\"image\":\"google/cloud-sdk:216.0.0\",\"command\":[\"sh\",\"-c\"],\"args\":[\"gsutil cat $0 | tee $1\",\"gs://ml-pipeline-playground/shakespeare1.txt\",\"/tmp/results.txt\"],\"resources\":{}},\"archiveLocation\":{\"s3\":{\"endpoint\":\"minio-service.kubeflow:9000\",\"bucket\":\"mlpipeline\",\"insecure\":true,\"accessKeySecret\":{\"name\":\"mlpipeline-minio-artifact\",\"key\":\"accesskey\"},\"secretKeySecret\":{\"name\":\"mlpipeline-minio-artifact\",\"key\":\"secretkey\"},\"key\":\"artifacts/exit-handler-zpxsx/exit-handler-zpxsx-3267705207\"}}}"
time="2019-07-26T06:57:19Z" level=info msg="Waiting on main container"
time="2019-07-26T06:57:19Z" level=warning msg="Polling root processes (1m0s)"
time="2019-07-26T06:57:19Z" level=info msg="pid 30: &{root 274 2147484141 {232317677 63699057718 0x22af420} {65027 96 21 16877 0 0 0 0 274 4096 0 {1563892256 947849310} {1563460918 232317677} {1563460918 232317677} [0 0 0]}}"
time="2019-07-26T06:57:19Z" level=info msg="Secured filehandle on /proc/30/root"
time="2019-07-26T06:57:19Z" level=info msg="containerID crio-f17ef07eedbcc0f0666051a1d64a4ae7293f6bb34513019e0d830b837f51673a mapped to pid 30"
time="2019-07-26T06:57:19Z" level=info msg="pid 30: &{root 4096 2147484141 {308380230 63699721039 0x22af420} {1048745 10363959 1 16877 0 0 0 0 4096 4096 8 {1564124239 468380594} {1564124239 308380230} {1564124239 504380676} [0 0 0]}}"
time="2019-07-26T06:57:19Z" level=info msg="Secured filehandle on /proc/30/root"
time="2019-07-26T06:57:19Z" level=info msg="pid 30: &{root 4096 2147484141 {308380230 63699721039 0x22af420} {1048745 10363959 1 16877 0 0 0 0 4096 4096 8 {1564124239 468380594} {1564124239 308380230} {1564124239 504380676} [0 0 0]}}"
time="2019-07-26T06:57:19Z" level=info msg="pid 30: &{root 4096 2147484141 {308380230 63699721039 0x22af420} {1048745 10363959 1 16877 0 0 0 0 4096 4096 8 {1564124239 468380594} {1564124239 308380230} {1564124239 616380930} [0 0 0]}}"
time="2019-07-26T06:57:19Z" level=info msg="pid 30: &{root 4096 2147484141 {308380230 63699721039 0x22af420} {1048745 10363959 1 16877 0 0 0 0 4096 4096 8 {1564124239 468380594} {1564124239 308380230} {1564124239 616380930} [0 0 0]}}"
time="2019-07-26T06:57:19Z" level=info msg="pid 30: &{root 4096 2147484141 {308380230 63699721039 0x22af420} {1048745 10363959 1 16877 0 0 0 0 4096 4096 8 {1564124239 468380594} {1564124239 308380230} {1564124239 616380930} [0 0 0]}}"
time="2019-07-26T06:57:19Z" level=info msg="pid 30: &{root 4096 2147484141 {308380230 63699721039 0x22af420} {1048745 10363959 1 16877 0 0 0 0 4096 4096 8 {1564124239 468380594} {1564124239 308380230} {1564124239 616380930} [0 0 0]}}"
time="2019-07-26T06:57:19Z" level=info msg="pid 30: &{root 4096 2147484141 {308380230 63699721039 0x22af420} {1048745 10363959 1 16877 0 0 0 0 4096 4096 8 {1564124239 468380594} {1564124239 308380230} {1564124239 616380930} [0 0 0]}}"
time="2019-07-26T06:57:19Z" level=info msg="pid 30: &{root 4096 2147484141 {308380230 63699721039 0x22af420} {1048745 10363959 1 16877 0 0 0 0 4096 4096 8 {1564124239 468380594} {1564124239 308380230} {1564124239 892381557} [0 0 0]}}"
time="2019-07-26T06:57:19Z" level=info msg="pid 30: &{root 4096 2147484141 {308380230 63699721039 0x22af420} {1048745 10363959 1 16877 0 0 0 0 4096 4096 8 {1564124239 468380594} {1564124239 308380230} {1564124239 892381557} [0 0 0]}}"
time="2019-07-26T06:57:20Z" level=info msg="pid 30: &{root 4096 2147484141 {308380230 63699721039 0x22af420} {1048745 10363959 1 16877 0 0 0 0 4096 4096 8 {1564124239 468380594} {1564124239 308380230} {1564124239 892381557} [0 0 0]}}"
time="2019-07-26T06:57:20Z" level=info msg="pid 30: &{root 4096 2147484141 {308380230 63699721039 0x22af420} {1048745 10363959 1 16877 0 0 0 0 4096 4096 8 {1564124239 468380594} {1564124239 308380230} {1564124239 892381557} [0 0 0]}}"
time="2019-07-26T06:57:20Z" level=info msg="pid 30: &{root 4096 2147484141 {308380230 63699721039 0x22af420} {1048745 10363959 1 16877 0 0 0 0 4096 4096 8 {1564124239 468380594} {1564124239 308380230} {1564124239 892381557} [0 0 0]}}"
time="2019-07-26T06:57:20Z" level=info msg="main container started with container ID: f17ef07eedbcc0f0666051a1d64a4ae7293f6bb34513019e0d830b837f51673a"
time="2019-07-26T06:57:20Z" level=info msg="Starting annotations monitor"
time="2019-07-26T06:57:20Z" level=info msg="pid 30: &{root 4096 2147484141 {308380230 63699721039 0x22af420} {1048745 10363959 1 16877 0 0 0 0 4096 4096 8 {1564124239 468380594} {1564124239 308380230} {1564124239 892381557} [0 0 0]}}"
time="2019-07-26T06:57:20Z" level=error msg="executor error: Failed to determine pid for containerID f17ef07eedbcc0f0666051a1d64a4ae7293f6bb34513019e0d830b837f51673a: container may have exited too quickly\ngithub.com/argoproj/argo/errors.New\n\t/go/src/github.com/argoproj/argo/errors/errors.go:49\ngithub.com/argoproj/argo/errors.Errorf\n\t/go/src/github.com/argoproj/argo/errors/errors.go:55\ngithub.com/argoproj/argo/errors.InternalErrorf\n\t/go/src/github.com/argoproj/argo/errors/errors.go:65\ngithub.com/argoproj/argo/workflow/executor/pns.(*PNSExecutor).getContainerPID\n\t/go/src/github.com/argoproj/argo/workflow/executor/pns/pns.go:292\ngithub.com/argoproj/argo/workflow/executor/pns.(*PNSExecutor).Wait\n\t/go/src/github.com/argoproj/argo/workflow/executor/pns/pns.go:166\ngithub.com/argoproj/argo/workflow/executor.(*WorkflowExecutor).Wait\n\t/go/src/github.com/argoproj/argo/workflow/executor/executor.go:867\ngithub.com/argoproj/argo/cmd/argoexec/commands.waitContainer\n\t/go/src/github.com/argoproj/argo/cmd/argoexec/commands/wait.go:32\ngithub.com/argoproj/argo/cmd/argoexec/commands.NewWaitCommand.func1\n\t/go/src/github.com/argoproj/argo/cmd/argoexec/commands/wait.go:16\ngithub.com/spf13/cobra.(*Command).execute\n\t/go/src/github.com/spf13/cobra/command.go:766\ngithub.com/spf13/cobra.(*Command).ExecuteC\n\t/go/src/github.com/spf13/cobra/command.go:852\ngithub.com/spf13/cobra.(*Command).Execute\n\t/go/src/github.com/spf13/cobra/command.go:800\nmain.main\n\t/go/src/github.com/argoproj/argo/cmd/argoexec/main.go:17\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:201\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1333"
time="2019-07-26T06:57:20Z" level=info msg="No sidecars"
time="2019-07-26T06:57:20Z" level=info msg="Saving output parameters"
time="2019-07-26T06:57:20Z" level=info msg="Saving path output parameter: gcs-download-data"
time="2019-07-26T06:57:20Z" level=info msg="Copying /tmp/results.txt from base image layer"
time="2019-07-26T06:57:20Z" level=error msg="executor error: could not chroot into main for artifact collection: container may have exited too quickly\ngithub.com/argoproj/argo/errors.New\n\t/go/src/github.com/argoproj/argo/errors/errors.go:49\ngithub.com/argoproj/argo/errors.Errorf\n\t/go/src/github.com/argoproj/argo/errors/errors.go:55\ngithub.com/argoproj/argo/errors.InternalErrorf\n\t/go/src/github.com/argoproj/argo/errors/errors.go:65\ngithub.com/argoproj/argo/workflow/executor/pns.(*PNSExecutor).enterChroot\n\t/go/src/github.com/argoproj/argo/workflow/executor/pns/pns.go:94\ngithub.com/argoproj/argo/workflow/executor/pns.(*PNSExecutor).GetFileContents\n\t/go/src/github.com/argoproj/argo/workflow/executor/pns/pns.go:79\ngithub.com/argoproj/argo/workflow/executor.(*WorkflowExecutor).SaveParameters\n\t/go/src/github.com/argoproj/argo/workflow/executor/executor.go:412\ngithub.com/argoproj/argo/cmd/argoexec/commands.waitContainer\n\t/go/src/github.com/argoproj/argo/cmd/argoexec/commands/wait.go:48\ngithub.com/argoproj/argo/cmd/argoexec/commands.NewWaitCommand.func1\n\t/go/src/github.com/argoproj/argo/cmd/argoexec/commands/wait.go:16\ngithub.com/spf13/cobra.(*Command).execute\n\t/go/src/github.com/spf13/cobra/command.go:766\ngithub.com/spf13/cobra.(*Command).ExecuteC\n\t/go/src/github.com/spf13/cobra/command.go:852\ngithub.com/spf13/cobra.(*Command).Execute\n\t/go/src/github.com/spf13/cobra/command.go:800\nmain.main\n\t/go/src/github.com/argoproj/argo/cmd/argoexec/main.go:17\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:201\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1333"
time="2019-07-26T06:57:20Z" level=info msg="Alloc=3575 TotalAlloc=11754 Sys=70846 NumGC=5 Goroutines=10"
time="2019-07-26T06:57:20Z" level=info msg="Annotations monitor stopped"
time="2019-07-26T06:57:20Z" level=info msg="Starting deadline monitor"
time="2019-07-26T06:57:20Z" level=info msg="Deadline monitor stopped"
time="2019-07-26T06:57:20Z" level=info msg="pid 30: &{root 4096 2147484141 {308380230 63699721039 0x22af420} {1048745 10363959 1 16877 0 0 0 0 4096 4096 8 {1564124239 468380594} {1564124239 308380230} {1564124239 892381557} [0 0 0]}}"
time="2019-07-26T06:57:20Z" level=fatal msg="could not chroot into main for artifact collection: container may have exited too quickly\ngithub.com/argoproj/argo/errors.New\n\t/go/src/github.com/argoproj/argo/errors/errors.go:49\ngithub.com/argoproj/argo/errors.Errorf\n\t/go/src/github.com/argoproj/argo/errors/errors.go:55\ngithub.com/argoproj/argo/errors.InternalErrorf\n\t/go/src/github.com/argoproj/argo/errors/errors.go:65\ngithub.com/argoproj/argo/workflow/executor/pns.(*PNSExecutor).enterChroot\n\t/go/src/github.com/argoproj/argo/workflow/executor/pns/pns.go:94\ngithub.com/argoproj/argo/workflow/executor/pns.(*PNSExecutor).GetFileContents\n\t/go/src/github.com/argoproj/argo/workflow/executor/pns/pns.go:79\ngithub.com/argoproj/argo/workflow/executor.(*WorkflowExecutor).SaveParameters\n\t/go/src/github.com/argoproj/argo/workflow/executor/executor.go:412\ngithub.com/argoproj/argo/cmd/argoexec/commands.waitContainer\n\t/go/src/github.com/argoproj/argo/cmd/argoexec/commands/wait.go:48\ngithub.com/argoproj/argo/cmd/argoexec/commands.NewWaitCommand.func1\n\t/go/src/github.com/argoproj/argo/cmd/argoexec/commands/wait.go:16\ngithub.com/spf13/cobra.(*Command).execute\n\t/go/src/github.com/spf13/cobra/command.go:766\ngithub.com/spf13/cobra.(*Command).ExecuteC\n\t/go/src/github.com/spf13/cobra/command.go:852\ngithub.com/spf13/cobra.(*Command).Execute\n\t/go/src/github.com/spf13/cobra/command.go:800\nmain.main\n\t/go/src/github.com/argoproj/argo/cmd/argoexec/main.go:17\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:201\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1333"

main:

With which he yoketh your rebellious necks Razeth your cities and subverts your towns And in a moment makes them desolate

Unfortunately I can't find anything helpful in there, do you? 🤔

kubelet executor does not support outputs from base image layer. must use emptyDir

This is inconvenient, but can you try to satisfy that requirement? Mount an emptyDir beneath the outputs path using task.add_volume and task.add_volume_mount. See

task
    .add_volume(
        k8s_client.V1Volume(
            name=volume_name,
            secret=k8s_client.V1SecretVolumeSource(
                secret_name=secret_name,
            )
        )
    )
    .add_volume_mount(
        k8s_client.V1VolumeMount(
            name=volume_name,
            mount_path=secret_volume_mount_path,
        )
    )

as a reference (it mounts a secret volume, though).

Hm, I tried to create my own pipeline but the big question is where to mount that empty dir? For now I have something like this, which causes the same issue as mentioned:

#!/usr/bin/env python3

import kfp
from kfp import dsl


def echo_op(text):
    return dsl.ContainerOp(name='echo',
                           image='library/bash:4.4.23',
                           command=['sh', '-c'],
                           arguments=['echo "$0"', text])


@dsl.pipeline(name='My pipeline', description='')
def pipeline():
    from kubernetes import client as k8s_client
    echo_task = echo_op('Hello world').add_volume(
        k8s_client.V1Volume(
            name='volume',
            empty_dir=k8s_client.V1EmptyDirVolumeSource())).add_volume_mount(
                k8s_client.V1VolumeMount(name='volume', mount_path='/output'))


if __name__ == '__main__':
    kfp.compiler.Compiler().compile(pipeline, 'pipeline.tar.gz')

where to mount that empty dir?

It should have been mounted to the folder where you're storing the outputs you produce. But in the last example you're not producing any, so there should have been no issues.

Ah. I forgot about the auto-added artifacts (#1422).

Can you try the following two things:

  1. First of all, try some Argo examples (e.g. https://github.com/argoproj/argo/blob/master/examples/artifact-passing.yaml ) to narrow things down and directly check the lower-level compatibility with the various execution modes.

  2. In your last example, add the following to the ContainerOp construction:

output_artifact_paths={
  'mlpipeline-ui-metadata': '/output/mlpipeline-ui-metadata.json',
  'mlpipeline-metrics': '/output/mlpipeline-metrics.json',
}

Here we override the paths for the auto-added output artifacts so that they're stored under the /output directory where you've mounted the emptyDir volume.
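
Putting the two pieces together with the echo pipeline from earlier in the thread, the whole thing could look roughly like this (a sketch assembled from the snippets above, not a verified recipe; the volume name and /output path are arbitrary):

#!/usr/bin/env python3

import kfp
from kfp import dsl
from kubernetes import client as k8s_client


def echo_op(text):
    # Redirect the auto-added artifacts into /output, which is backed by the
    # emptyDir volume mounted below.
    return dsl.ContainerOp(
        name='echo',
        image='library/bash:4.4.23',
        command=['sh', '-c'],
        arguments=['echo "$0"', text],
        output_artifact_paths={
            'mlpipeline-ui-metadata': '/output/mlpipeline-ui-metadata.json',
            'mlpipeline-metrics': '/output/mlpipeline-metrics.json',
        })


@dsl.pipeline(name='My pipeline', description='')
def pipeline():
    echo_task = echo_op('Hello world').add_volume(
        k8s_client.V1Volume(
            name='outputs',
            empty_dir=k8s_client.V1EmptyDirVolumeSource())).add_volume_mount(
                k8s_client.V1VolumeMount(name='outputs', mount_path='/output'))


if __name__ == '__main__':
    kfp.compiler.Compiler().compile(pipeline, 'pipeline.tar.gz')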

Can you try the following two things:

  1. First of all, try some Argo examples (e.g. https://github.com/argoproj/argo/blob/master/examples/artifact-passing.yaml ) to narrow things down and directly check the lower-level compatibility with the various execution modes.

So I applied the example via argo apply -f, and this is the output of the pod's logs:

main:

> kubectl logs -f artifact-passing-tmv4v-2138355403 main
 _____________
< hello world >
 -------------
    \
     \
      \
                    ##        .
              ## ## ##       ==
           ## ## ## ##      ===
       /""""""""""""""""___/ ===
  ~~~ {~~ ~~~~ ~~~ ~~~~ ~~ ~ /  ===- ~~~
       \______ o          __/
        \    \        __/
          \____\______/

wait:

> kubectl logs -f artifact-passing-tmv4v-2138355403 wait
time="2019-07-29T06:59:33Z" level=info msg="Creating a K8sAPI executor"
time="2019-07-29T06:59:34Z" level=info msg="Executor (version: v2.3.0, build_date: 2019-05-20T22:10:54Z) initialized (pod: kubeflow/artifact-passing-tmv4v-2138355403) with template:\n{\"name\":\"whalesay\",\"inputs\":{},\"outputs\":{\"artifacts\":[{\"name\":\"hello-art\",\"path\":\"/tmp/hello_world.txt\"}]},\"metadata\":{},\"container\":{\"name\":\"\",\"image\":\"docker/whalesay:latest\",\"command\":[\"sh\",\"-c\"],\"args\":[\"sleep 1; cowsay hello world | tee /tmp/hello_world.txt\"],\"resources\":{}},\"archiveLocation\":{\"s3\":{\"endpoint\":\"minio-service.kubeflow:9000\",\"bucket\":\"mlpipeline\",\"insecure\":true,\"accessKeySecret\":{\"name\":\"mlpipeline-minio-artifact\",\"key\":\"accesskey\"},\"secretKeySecret\":{\"name\":\"mlpipeline-minio-artifact\",\"key\":\"secretkey\"},\"key\":\"artifacts/artifact-passing-tmv4v/artifact-passing-tmv4v-2138355403\"}}}"
time="2019-07-29T06:59:34Z" level=info msg="Waiting on main container"
time="2019-07-29T06:59:34Z" level=error msg="executor error: Failed to establish pod watch: unknown (get pods)\ngithub.com/argoproj/argo/errors.Wrap\n\t/go/src/github.com/argoproj/argo/errors/errors.go:88\ngithub.com/argoproj/argo/errors.InternalWrapErrorf\n\t/go/src/github.com/argoproj/argo/errors/errors.go:78\ngithub.com/argoproj/argo/workflow/executor.(*WorkflowExecutor).waitMainContainerStart\n\t/go/src/github.com/argoproj/argo/workflow/executor/executor.go:885\ngithub.com/argoproj/argo/workflow/executor.(*WorkflowExecutor).Wait\n\t/go/src/github.com/argoproj/argo/workflow/executor/executor.go:856\ngithub.com/argoproj/argo/cmd/argoexec/commands.waitContainer\n\t/go/src/github.com/argoproj/argo/cmd/argoexec/commands/wait.go:32\ngithub.com/argoproj/argo/cmd/argoexec/commands.NewWaitCommand.func1\n\t/go/src/github.com/argoproj/argo/cmd/argoexec/commands/wait.go:16\ngithub.com/spf13/cobra.(*Command).execute\n\t/go/src/github.com/spf13/cobra/command.go:766\ngithub.com/spf13/cobra.(*Command).ExecuteC\n\t/go/src/github.com/spf13/cobra/command.go:852\ngithub.com/spf13/cobra.(*Command).Execute\n\t/go/src/github.com/spf13/cobra/command.go:800\nmain.main\n\t/go/src/github.com/argoproj/argo/cmd/argoexec/main.go:17\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:201\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1333"
time="2019-07-29T06:59:34Z" level=info msg="No sidecars"
time="2019-07-29T06:59:34Z" level=info msg="No output parameters"
time="2019-07-29T06:59:34Z" level=info msg="Saving output artifacts"
time="2019-07-29T06:59:34Z" level=warning msg="Failed to get pod 'artifact-passing-tmv4v-2138355403': pods \"artifact-passing-tmv4v-2138355403\" is forbidden: User \"system:serviceaccount:kubeflow:default\" cannot get resource \"pods\" in API group \"\" in the namespace \"kubeflow\""
time="2019-07-29T06:59:34Z" level=error msg="executor error: pods \"artifact-passing-tmv4v-2138355403\" is forbidden: User \"system:serviceaccount:kubeflow:default\" cannot get resource \"pods\" in API group \"\" in the namespace \"kubeflow\"\ngithub.com/argoproj/argo/errors.Wrap\n\t/go/src/github.com/argoproj/argo/errors/errors.go:88\ngithub.com/argoproj/argo/errors.InternalWrapError\n\t/go/src/github.com/argoproj/argo/errors/errors.go:71\ngithub.com/argoproj/argo/workflow/executor.(*WorkflowExecutor).getPod\n\t/go/src/github.com/argoproj/argo/workflow/executor/executor.go:620\ngithub.com/argoproj/argo/workflow/executor.(*WorkflowExecutor).GetMainContainerStatus\n\t/go/src/github.com/argoproj/argo/workflow/executor/executor.go:702\ngithub.com/argoproj/argo/workflow/executor.(*WorkflowExecutor).GetMainContainerID\n\t/go/src/github.com/argoproj/argo/workflow/executor/executor.go:719\ngithub.com/argoproj/argo/workflow/executor.(*WorkflowExecutor).SaveArtifacts\n\t/go/src/github.com/argoproj/argo/workflow/executor/executor.go:220\ngithub.com/argoproj/argo/cmd/argoexec/commands.waitContainer\n\t/go/src/github.com/argoproj/argo/cmd/argoexec/commands/wait.go:54\ngithub.com/argoproj/argo/cmd/argoexec/commands.NewWaitCommand.func1\n\t/go/src/github.com/argoproj/argo/cmd/argoexec/commands/wait.go:16\ngithub.com/spf13/cobra.(*Command).execute\n\t/go/src/github.com/spf13/cobra/command.go:766\ngithub.com/spf13/cobra.(*Command).ExecuteC\n\t/go/src/github.com/spf13/cobra/command.go:852\ngithub.com/spf13/cobra.(*Command).Execute\n\t/go/src/github.com/spf13/cobra/command.go:800\nmain.main\n\t/go/src/github.com/argoproj/argo/cmd/argoexec/main.go:17\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:201\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1333"
time="2019-07-29T06:59:34Z" level=info msg="Alloc=3164 TotalAlloc=9686 Sys=70846 NumGC=4 Goroutines=5"
time="2019-07-29T06:59:34Z" level=fatal msg="pods \"artifact-passing-tmv4v-2138355403\" is forbidden: User \"system:serviceaccount:kubeflow:default\" cannot get resource \"pods\" in API group \"\" in the namespace \"kubeflow\"\ngithub.com/argoproj/argo/errors.Wrap\n\t/go/src/github.com/argoproj/argo/errors/errors.go:88\ngithub.com/argoproj/argo/errors.InternalWrapError\n\t/go/src/github.com/argoproj/argo/errors/errors.go:71\ngithub.com/argoproj/argo/workflow/executor.(*WorkflowExecutor).getPod\n\t/go/src/github.com/argoproj/argo/workflow/executor/executor.go:620\ngithub.com/argoproj/argo/workflow/executor.(*WorkflowExecutor).GetMainContainerStatus\n\t/go/src/github.com/argoproj/argo/workflow/executor/executor.go:702\ngithub.com/argoproj/argo/workflow/executor.(*WorkflowExecutor).GetMainContainerID\n\t/go/src/github.com/argoproj/argo/workflow/executor/executor.go:719\ngithub.com/argoproj/argo/workflow/executor.(*WorkflowExecutor).SaveArtifacts\n\t/go/src/github.com/argoproj/argo/workflow/executor/executor.go:220\ngithub.com/argoproj/argo/cmd/argoexec/commands.waitContainer\n\t/go/src/github.com/argoproj/argo/cmd/argoexec/commands/wait.go:54\ngithub.com/argoproj/argo/cmd/argoexec/commands.NewWaitCommand.func1\n\t/go/src/github.com/argoproj/argo/cmd/argoexec/commands/wait.go:16\ngithub.com/spf13/cobra.(*Command).execute\n\t/go/src/github.com/spf13/cobra/command.go:766\ngithub.com/spf13/cobra.(*Command).ExecuteC\n\t/go/src/github.com/spf13/cobra/command.go:852\ngithub.com/spf13/cobra.(*Command).Execute\n\t/go/src/github.com/spf13/cobra/command.go:800\nmain.main\n\t/go/src/github.com/argoproj/argo/cmd/argoexec/main.go:17\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:201\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1333"

  2. In your last example, add the following to the ContainerOp construction:
output_artifact_paths={
  'mlpipeline-ui-metadata': '/output/mlpipeline-ui-metadata.json',
  'mlpipeline-metrics': '/output/mlpipeline-metrics.json',
}

Here we override the paths for the auto-added output artifacts so that they're stored under the /output directory where you've mounted the emptyDir volume.

Alright, this seems to work now, the pipeline succeeds.

Alright, this seems to work now, the pipeline succeeds.

I've looked at Argo source code. Maybe you do not even need the emptyDir mount.
Just changing the paths might be enough.

Tracking the original issue
/reopen

Alright, this seems to work now, the pipeline succeeds.

I've looked at Argo source code. Maybe you do not even need the emptyDir mount.
Just changing the paths might be enough.

Hm, no, then I get this error message:

invalid spec: templates.analyze-data.outputs.artifacts.mlpipeline-ui-metadata: k8sapi executor does not support outputs from base image layer. must use emptyDir

Hm, no, then I get this error message:

Hmm. Maybe it would work if the paths are in an existing base-image directory like /tmp (though I wonder whether /tmp is part of the base image) or /home, rather than /output.

Hm, no, then I get this error message:

Hmm. Maybe it would work if the paths are in an existing base-image directory like /tmp (though I wonder whether /tmp is part of the base image) or /home, rather than /output.

Yes, I tried /tmp as well, but had no luck there either.

I also ran into this issue, and the above fix worked for me for the regular, ContainerOp-style pipelines. When I tried creating a pipeline with func_to_container_op, I also had to add in this bit, as func_to_container_op wanted to store output under /tmp/outputs:

op.add_volume(k8s_client.V1Volume(name='outputs', empty_dir=k8s_client.V1EmptyDirVolumeSource()))
op.container.add_volume_mount(k8s_client.V1VolumeMount(name='outputs', mount_path='/tmp/outputs'))
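
For reference, an end-to-end sketch of that workaround (the toy component and pipeline name are made up for illustration; /tmp/outputs is the directory mentioned above, and the kfp and kubernetes clients are the same ones used throughout the thread):

import kfp
from kfp import dsl
from kfp.components import func_to_container_op
from kubernetes import client as k8s_client


def add_numbers(a: int, b: int) -> int:
    # Toy component; func_to_container_op stores its outputs under /tmp/outputs.
    return a + b


@dsl.pipeline(name='Dockerless func_to_container_op example', description='')
def pipeline():
    op = func_to_container_op(add_numbers)(1, 2)
    # Back /tmp/outputs with an emptyDir so the outputs are not collected from
    # the base image layer.
    op.add_volume(k8s_client.V1Volume(
        name='outputs', empty_dir=k8s_client.V1EmptyDirVolumeSource()))
    op.container.add_volume_mount(k8s_client.V1VolumeMount(
        name='outputs', mount_path='/tmp/outputs'))


if __name__ == '__main__':
    kfp.compiler.Compiler().compile(pipeline, 'pipeline.tar.gz')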

I also ran into this issue, and the above fix worked for me for the regular, ContainerOp-style pipelines. When I tried creating a pipeline with func_to_container_op, I also had to add in this bit, as func_to_container_op wanted to store output under /tmp/outputs:

op.add_volume(k8s_client.V1Volume(name='outputs', empty_dir=k8s_client.V1EmptyDirVolumeSource()))
op.container.add_volume_mount(k8s_client.V1VolumeMount(name='outputs', mount_path='/tmp/outputs'))

Hey, I'm now running into similar issues. How can I make this work with func_to_container_op? Something like this does not work for me:

from os import path
from typing import Dict

from kfp.components import func_to_container_op
from kubernetes import client as k8s

OUT_DIR = '/tmp/outputs'
METADATA_FILE = 'mlpipeline-ui-metadata.json'
METRICS_FILE = 'mlpipeline-metrics.json'
METADATA_FILE_PATH = path.join(OUT_DIR, METADATA_FILE)
METRICS_FILE_PATH = path.join(OUT_DIR, METRICS_FILE)
BASE_IMAGE = 'my-image:latest'

def default_artifact_path() -> Dict[str, str]:
    return {
        path.splitext(METADATA_FILE)[0]: METADATA_FILE_PATH,
        path.splitext(METRICS_FILE)[0]: METRICS_FILE_PATH,
    }

def storage_op(func, *args):
    op = func_to_container_op(func, base_image=BASE_IMAGE)(*args)
    op.output_artifact_paths = default_artifact_path()  # I'm not able to overwrite the artifact path here
    op.add_volume(k8s.V1Volume(name='outputs',
                               empty_dir=k8s.V1EmptyDirVolumeSource()))\
      .add_volume_mount(k8s.V1VolumeMount(name='outputs', mount_path=OUT_DIR))
    return op

Good news: The 'mlpipeline-*' artifacts are no longer automatically added to every single pipeline task. (There are still some components that explicitly produce those.)

Side news: All outputs now produce artifacts.

We need to investigate how to make Argo copy the artifacts when using PNS. They should be supporting this; otherwise it's a bug. I need to check the exact criteria for the "emptyDir" error.

BTW, what would be the easiest way to set up a temporary Docker-less Linux environment?

Good news: The 'mlpipeline-*' artifacts are no longer automatically added to every single pipeline task. (There are still some components that explicitly produce those.)

Side news: All outputs now produce artifacts.

We need to investigate how to make Argo copy the artifacts when using PNS. They should be supporting this; otherwise it's a bug. I need to check the exact criteria for the "emptyDir" error.

BTW, what would be the easiest way to set up a temporary Docker-less Linux environment?

Sounds good, thanks for the update. I guess an easy way would be to use kubeadm with a natively supported distribution like Ubuntu 18.04. Then you could use the Project Atomic PPA to install CRI-O and bootstrap the node, selecting crio.sock as the runtime endpoint.

@Ark-kun: As far as setting up a Docker-less environment, I ran into this issue while using microk8s, which uses containerd.

Yes, I'm having the same issue. I've only had success with pipelines by going back to microk8s 1.13 and putting a manual workaround in.

@Ark-kun the latest stable microk8s (1.15 and higher) is Docker-less and uses containerd. So snap install microk8s --classic should get you a Docker-less Kubernetes cluster on an Ubuntu host.

Just a reminder: the first step for a Docker-less environment is to tell Argo to use a non-Docker executor (e.g. k8sapi, kubelet or pns) in the configmap: https://github.com/argoproj/argo/blob/ca1d5e671519aaa9f38f5f2564eb70c138fadda7/docs/workflow-controller-configmap.yaml#L78.

containerRuntimeExecutor: pns,
or
containerRuntimeExecutor: k8sapi,
or
containerRuntimeExecutor: kubelet,

Also, to make debugging faster, you could try experimenting with the basic Argo artifact-passing pipeline (https://github.com/argoproj/argo/blob/master/examples/artifact-passing.yaml) in terms of paths and the need to attach volumes. I think there should be some directory where manually mounting volumes is not needed.

kubectl get configmap workflow-controller-configmap -o yaml
apiVersion: v1
data:
  config: |
    {
    executorImage: argoproj/argoexec:v2.3.0,
    containerRuntimeExecutor: k8sapi,
    artifactRepository:
        {
            s3: {
                bucket: mlpipeline,
                keyPrefix: artifacts,
                endpoint: minio-service.kubeflow:9000,
                insecure: true,
                accessKeySecret: {
                    name: mlpipeline-minio-artifact,
                    key: accesskey
                },
                secretKeySecret: {
                    name: mlpipeline-minio-artifact,
                    key: secretkey
                }
            }
        }
    }
kind: ConfigMap
metadata:
  creationTimestamp: "2019-07-22T13:56:32Z"
  labels:
    kustomize.component: argo
  name: workflow-controller-configmap
  namespace: kubeflow
  resourceVersion: "1181725"
  selfLink: /api/v1/namespaces/kubeflow/configmaps/workflow-controller-configmap
  uid: 3144d234-101f-4031-94ce-b1aa258bfafd

I updated my Argo runtime to try both k8sapi (as suggested above) and kubelet; both failed with the same error below when running the sample pipelines.

invalid spec: templates.echo.outputs.artifacts.mlpipeline-ui-metadata: k8sapi executor does not support outputs from base image layer. must use emptyDir

Could you try experimenting on the basic Argo artifact passing pipeline: https://github.com/argoproj/argo/blob/master/examples/artifact-passing.yaml

Just try running it on KFP or Argo.

Uploaded the example artifact-passing.yaml as a Kubeflow pipeline and ran an experiment. It failed with a similar error as above:

invalid spec: templates.whalesay.outputs.artifacts.hello-art: kubelet executor does not support outputs from base image layer. must use emptyDir

/cc @kschroed
Now let's try to change the output directory to a directory that does not exist in the base image (e.g. foobar):

apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: artifact-passing-
spec:
  entrypoint: artifact-example
  templates:
  - name: artifact-example
    steps:
    - - name: generate-artifact
        template: whalesay
    - - name: consume-artifact
        template: print-message
        arguments:
          artifacts:
          - name: message
            from: "{{steps.generate-artifact.outputs.artifacts.hello-art}}"

  - name: whalesay
    container:
      image: docker/whalesay:latest
      command: [sh, -c]
      args: ["sleep 1; mkdir -p /foobar; cowsay hello world | tee /foobar/hello_world.txt"]
    outputs:
      artifacts:
      - name: hello-art
        path: /foobar/hello_world.txt

  - name: print-message
    inputs:
      artifacts:
      - name: message
        path: /foobar/message
    container:
      image: alpine:latest
      command: [sh, -c]
      args: ["cat /foobar/message"]

Same error using above.

OK. I'll file a bug with Argo. It should support collecting artifacts with every executor. They explicitly have 3 non-Docker executors.

I faced the same issue using pns, with Kubeflow 0.7 and the CRI-O engine.
Is there any progress on fixing it?

Hey @Ark-kun, may I ask you again what the current status is? I'd be happy to contribute but there was some time in-between. Can we create a list of action items we have to address to get native CRI-O support in kubeflow pipelines? Since PNS has graduated to stable with Kubernetes 1.17 this could be a good alternative.

I would also be willing to assist with this if we could get some action items or the like. Trying to catch up with it as best I can today. I'm going to try to build a 1.15 cluster and see if I still see the same issues, as I think there has been at least one release since then.

edit: Changing the containerRuntimeExecutor to 'pns' resolved the issue for me, as mentioned in #1654 (comment)

I am still having the issues with Kubeflow built by kfctl 1.0rc1 and this Pipelines Sample https://github.com/kubeflow/pipelines/blob/master/samples/contrib/e2e-mnist/mnist-pipeline.ipynb.

Cluster is Kubernetes 1.15 with containerd engine.

Pods fail with Error:

Warning  FailedMount  2s (x5 over 9m10s)        kubelet, worker-1 Unable to mount volumes for pod "end-to-end-pipeline-gbgz8-1815055382_kubeflow(eba022d5-2bb6-4d40-b760-65b5ab6e1e69)": timeout expired waiting for volumes to attach or mount for pod "kubeflow"/"end-to-end-pipeline-gbgz8-1815055382". list of unmounted volumes=[docker-sock]. list of unattached volumes=[podmetadata docker-sock pipeline-runner-token-zh8g4]
Warning  FailedMount  <invalid> (x14 over 11m)  kubelet, worker-1, MountVolume.SetUp failed for volume "docker-sock" : hostPath type check failed: /var/run/docker.sock is not a socket file

We opened an issue with Argo last year about the PNS executor:
argoproj/argo-workflows#1256

@y4roslav what's the issue you are facing with the CRI-O executor?

@animeshsingh @Ark-kun I have an issue with the PNS executor on OpenShift with the CRI-O engine.
Reported to argo team: argoproj/argo-workflows#2095
Any run fails with:

This step is in Error state with this message: failed to save outputs: Failed to determine pid for containerID 65e682ea2e3e8102b395f608f15f21b1ab56a021cfe5f1d741c4ed20e463c50a: container may have exited too quickly

even though the component code ends with success.

The execution time of the operation is higher than 15s (limit mentioned at: argoproj/argo-workflows#1256 (comment)).

Failed "wait" container details:

Containers:
  wait:
    Container ID:  cri-o://eb72305d3f9ce0e3d6fd6c7a8fb0509f10e9e933c3022f6e49b14ade67743e9b
    Image:         argoproj/argoexec:v2.4.3
    Image ID:      docker.io/argoproj/argoexec@sha256:d7ab12ccc0c479cb856fa5aa6ab38c8368743f978bcbc4547bd8a67a83eb65f7
    Port:          <none>
    Host Port:     <none>
    Command:
      argoexec
      wait
    State:          Terminated
      Reason:       Error
      Message:      Failed to determine pid for containerID 65e682ea2e3e8102b395f608f15f21b1ab56a021cfe5f1d741c4ed20e463c50a: container may have exited too quickly
      Exit Code:    1
      Started:      Tue, 28 Jan 2020 16:06:32 +0100
      Finished:     Tue, 28 Jan 2020 16:06:38 +0100
    Ready:          False
    Restart Count:  0
    Environment:
      ARGO_POD_NAME:                    kfp-on-wml-training-mnxhk-143671368 (v1:metadata.name)
      ARGO_CONTAINER_RUNTIME_EXECUTOR:  pns

"main" container state:

  main:
    Container ID:  cri-o://65e682ea2e3e8102b395f608f15f21b1ab56a021cfe5f1d741c4ed20e463c50a
    Image:         docker.io/rafalbigaj/ai-kf-config:latest
    Image ID:      docker.io/rafalbigaj/ai-kf-config@sha256:19cf29f2cc0b37a87bb00c14f2ee35bc87d9cb196f50dad735b4c74236b090fb
    Port:          <none>
    Host Port:     <none>
    Command:
      python3
    Args:
      ...
    State:          Terminated
      Reason:       Completed
      Exit Code:    0

Argo workflow-controller logs:

time="2020-01-28T15:06:22Z" level=info msg="Processing workflow" namespace=kubeflow workflow=kfp-on-wml-training-mnxhk
time="2020-01-28T15:06:22Z" level=info msg="All of node kfp-on-wml-training-mnxhk.create-secret-kubernetes-cluster dependencies [] completed" namespace=kubeflow workflow=kfp-on-wml-training-mnxhk
time="2020-01-28T15:06:23Z" level=info msg="Created pod: kfp-on-wml-training-mnxhk.create-secret-kubernetes-cluster (kfp-on-wml-training-mnxhk-143671368)" namespace=kubeflow workflow=kfp-on-wml-training-mnxhk
time="2020-01-28T15:06:23Z" level=info msg="Pod node kfp-on-wml-training-mnxhk.create-secret-kubernetes-cluster (kfp-on-wml-training-mnxhk-143671368) initialized Pending" namespace=kubeflow workflow=kfp-on-wml-training-mnxhk
time="2020-01-28T15:06:23Z" level=info msg="Workflow update successful" namespace=kubeflow workflow=kfp-on-wml-training-mnxhk
time="2020-01-28T15:06:24Z" level=info msg="Processing workflow" namespace=kubeflow workflow=kfp-on-wml-training-mnxhk
time="2020-01-28T15:06:24Z" level=info msg="Updating node kfp-on-wml-training-mnxhk.create-secret-kubernetes-cluster (kfp-on-wml-training-mnxhk-143671368) message: ContainerCreating"
time="2020-01-28T15:06:24Z" level=info msg="Workflow update successful" namespace=kubeflow workflow=kfp-on-wml-training-mnxhk
time="2020-01-28T15:06:25Z" level=info msg="Processing workflow" namespace=kubeflow workflow=kfp-on-wml-training-mnxhk
time="2020-01-28T15:06:31Z" level=info msg="Processing workflow" namespace=kubeflow workflow=kfp-on-wml-training-mnxhk
time="2020-01-28T15:06:38Z" level=info msg="Processing workflow" namespace=kubeflow workflow=kfp-on-wml-training-mnxhk
time="2020-01-28T15:06:38Z" level=info msg="Updating node kfp-on-wml-training-mnxhk.create-secret-kubernetes-cluster (kfp-on-wml-training-mnxhk-143671368) status Pending -> Running"
time="2020-01-28T15:06:38Z" level=info msg="Workflow update successful" namespace=kubeflow workflow=kfp-on-wml-training-mnxhk
time="2020-01-28T15:06:39Z" level=info msg="Processing workflow" namespace=kubeflow workflow=kfp-on-wml-training-mnxhk
time="2020-01-28T15:06:56Z" level=info msg="Processing workflow" namespace=kubeflow workflow=kfp-on-wml-training-mnxhk
time="2020-01-28T15:06:56Z" level=info msg="Updating node kfp-on-wml-training-mnxhk.create-secret-kubernetes-cluster (kfp-on-wml-training-mnxhk-143671368) status Running -> Error"
time="2020-01-28T15:06:56Z" level=info msg="Updating node kfp-on-wml-training-mnxhk.create-secret-kubernetes-cluster (kfp-on-wml-training-mnxhk-143671368) message: failed to save outputs: Failed to determine pid for containerID 65e682ea2e3e8102b395f608f15f21b1ab56a021cfe5f1d741c4ed20e463c50a: container may have exited too quickly"
time="2020-01-28T15:06:56Z" level=info msg="node kfp-on-wml-training-mnxhk (kfp-on-wml-training-mnxhk) phase Running -> Error" namespace=kubeflow workflow=kfp-on-wml-training-mnxhk
time="2020-01-28T15:06:56Z" level=info msg="node kfp-on-wml-training-mnxhk (kfp-on-wml-training-mnxhk) finished: 2020-01-28 15:06:56.622434172 +0000 UTC" namespace=kubeflow workflow=kfp-on-wml-training-mnxhk
time="2020-01-28T15:06:56Z" level=info msg="Checking daemoned children of kfp-on-wml-training-mnxhk" namespace=kubeflow workflow=kfp-on-wml-training-mnxhk
time="2020-01-28T15:06:56Z" level=info msg="Updated phase Running -> Error" namespace=kubeflow workflow=kfp-on-wml-training-mnxhk
time="2020-01-28T15:06:56Z" level=info msg="Marking workflow completed" namespace=kubeflow workflow=kfp-on-wml-training-mnxhk
time="2020-01-28T15:06:56Z" level=info msg="Checking daemoned children of " namespace=kubeflow workflow=kfp-on-wml-training-mnxhk
time="2020-01-28T15:06:56Z" level=info msg="Workflow update successful" namespace=kubeflow workflow=kfp-on-wml-training-mnxhk
time="2020-01-28T15:06:57Z" level=info msg="Labeled pod kubeflow/kfp-on-wml-training-mnxhk-143671368 completed"

Logs from "wait" container:

time="2020-01-28T15:06:32Z" level=info msg="Waiting on main container"
time="2020-01-28T15:06:32Z" level=warning msg="Polling root processes (1m0s)"
time="2020-01-28T15:06:37Z" level=info msg="pid 34: &{root 253 2147484141 {518511044 63715334690 0x29189c0} {64515 434110592 12 16877 0 0 0 0 253 4096 0 {1579737916 908000000} {1579737890 518511044} {1579737897 507511044} [0 0 0]}}"
time="2020-01-28T15:06:37Z" level=info msg="Secured filehandle on /proc/34/root"
time="2020-01-28T15:06:37Z" level=info msg="containerID crio-65e682ea2e3e8102b395f608f15f21b1ab56a021cfe5f1d741c4ed20e463c50a.scope mapped to pid 34"
time="2020-01-28T15:06:38Z" level=info msg="pid 34: &{root 28 2147484141 {649301379 63715820797 0x29189c0} {2097394 293675539 1 16877 0 0 0 0 28 4096 0 {1580223997 782300938} {1580223997 649301379} {1580223997 992300242} [0 0 0]}}"
time="2020-01-28T15:06:38Z" level=info msg="Secured filehandle on /proc/34/root"
time="2020-01-28T15:06:38Z" level=info msg="pid 34: &{root 28 2147484141 {649301379 63715820797 0x29189c0} {2097394 293675539 1 16877 0 0 0 0 28 4096 0 {1580223997 782300938} {1580223997 649301379} {1580223997 992300242} [0 0 0]}}"
time="2020-01-28T15:06:38Z" level=info msg="pid 34: &{root 28 2147484141 {649301379 63715820797 0x29189c0} {2097394 293675539 1 16877 0 0 0 0 28 4096 0 {1580223997 782300938} {1580223997 649301379} {1580223997 992300242} [0 0 0]}}"
time="2020-01-28T15:06:38Z" level=info msg="pid 34: &{root 39 2147484141 {649301379 63715820797 0x29189c0} {2097394 293675539 1 16877 0 0 0 0 39 4096 0 {1580223997 782300938} {1580223997 649301379} {1580223998 180299618} [0 0 0]}}"
time="2020-01-28T15:06:38Z" level=info msg="pid 34: &{root 39 2147484141 {649301379 63715820797 0x29189c0} {2097394 293675539 1 16877 0 0 0 0 39 4096 0 {1580223997 782300938} {1580223997 649301379} {1580223998 180299618} [0 0 0]}}"
time="2020-01-28T15:06:38Z" level=info msg="pid 34: &{root 39 2147484141 {649301379 63715820797 0x29189c0} {2097394 293675539 1 16877 0 0 0 0 39 4096 0 {1580223997 782300938} {1580223997 649301379} {1580223998 180299618} [0 0 0]}}"
time="2020-01-28T15:06:38Z" level=info msg="pid 34: &{root 39 2147484141 {649301379 63715820797 0x29189c0} {2097394 293675539 1 16877 0 0 0 0 39 4096 0 {1580223997 782300938} {1580223997 649301379} {1580223998 180299618} [0 0 0]}}"
time="2020-01-28T15:06:38Z" level=info msg="pid 34: &{root 39 2147484141 {649301379 63715820797 0x29189c0} {2097394 293675539 1 16877 0 0 0 0 39 4096 0 {1580223997 782300938} {1580223997 649301379} {1580223998 180299618} [0 0 0]}}"
time="2020-01-28T15:06:38Z" level=info msg="main container started with container ID: 65e682ea2e3e8102b395f608f15f21b1ab56a021cfe5f1d741c4ed20e463c50a"
time="2020-01-28T15:06:38Z" level=info msg="Starting annotations monitor"
time="2020-01-28T15:06:38Z" level=info msg="pid 34: &{root 39 2147484141 {649301379 63715820797 0x29189c0} {2097394 293675539 1 16877 0 0 0 0 39 4096 0 {1580223997 782300938} {1580223997 649301379} {1580223998 180299618} [0 0 0]}}"
time="2020-01-28T15:06:38Z" level=info msg="Starting deadline monitor"
time="2020-01-28T15:06:38Z" level=info msg="pid 34: &{root 39 2147484141 {649301379 63715820797 0x29189c0} {2097394 293675539 1 16877 0 0 0 0 39 4096 0 {1580223997 782300938} {1580223997 649301379} {1580223998 180299618} [0 0 0]}}"
time="2020-01-28T15:06:38Z" level=warning msg="Failed to wait for container id '65e682ea2e3e8102b395f608f15f21b1ab56a021cfe5f1d741c4ed20e463c50a': Failed to determine pid for containerID 65e682ea2e3e8102b395f608f15f21b1ab56a021cfe5f1d741c4ed20e463c50a: container may have exited too quickly"

Since argoproj/argo-workflows#2095 seems closed, @rafalbigaj do you still have issues with running Kubeflow on OpenShift and CRI-O?

@saschagrunert the fixed argo executor works well with Kubeflow on OpenShift when I specify version manually. Thanks

Thanks, what do you mean by:

when I specify version manually

Do you still change the argo configuration? Do you use PNS?

I can confirm that PNS is now working fine, for example this pipeline runs well with CRI-O:
https://gist.github.com/saschagrunert/3ab126c781ac0e3e93f52591a8bbaf88

We probably should think about making PNS default :)

@saschagrunert: Are you able to visualize the result in the Pipelines UI, as described here: https://www.kubeflow.org/docs/pipelines/sdk/output-viewer/#tensorboard

I use CRI-O with the k8sapi executor, add an emptyDir (/output) as a volume, and write the result to /output/mlpipeline-ui-metadata.json, but it does not display under Pipeline Artifacts.

I can confirm that PNS is now working fine, for example this pipeline runs well with CRI-O:

Did you upgrade Argo to make it work?

P.S. It would be great if everyone switched from creating instances of ContainerOp directly to using components. See the tutorial for general command-line programs and tutorial for python-based components.
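
For illustration only (the function and base image below are assumptions, not from this issue), a function-based component looks roughly like this, rather than constructing a ContainerOp by hand:

from kfp.components import create_component_from_func

def add(a: float, b: float) -> float:
    '''Adds two numbers.'''
    return a + b

# Produces a reusable component factory; the base image is illustrative.
add_op = create_component_from_func(add, base_image='python:3.7')

# Inside a @dsl.pipeline function you would then write: task = add_op(1, 2)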

but it does not display on Pipeline Artifact.
Do you see the artifact in Inputs/Outputs tab?

@Ark-kun: The Input/Output tab is empty. In the Artifacts tab, there are 3 fixed types: ROC, TFDB, TABLE.

Which is the last Kubeflow version compatible with OpenShift 3.11? I want to try running Kubeflow Pipelines successfully, taking all the learnings from this long-running thread!

Any updates? I'm still seeing this issue. I'm trying out a KFP pipeline example on Kubeflow installed on top of OpenShift. Since the executor here is k8sapi, I get the following error: "invalid spec: templates.mypvc.outputs.parameters.mypvc-manifest: k8sapi executor does not support outputs from base image layer. must use emptyDir". Compiling the example (https://github.com/kubeflow/pipelines/blob/master/samples/contrib/volume_ops/volumeop_sequential.py) using dsl-compile, uploading the pipeline, and creating an experiment works fine. When I try to run it, I get the above error.

Can you try the pns executor? I was able to get it to run like this:
https://github.com/kubernetes-analysis/kubernetes-analysis/blob/master/src/pipeline.py

When using CRI-O, you have to update the Argo deployment, because there was a bug with container ID retrieval when using the systemd cgroup manager. (Will be fixed in kubeflow/manifests#1145.)

@panbalag so many people are able to run it with PNS executor on OpenShift, as mentioned above. Have you tried that? cc @Tomcli

Hi @Ark-kun / @animeshsingh, I recently installed Kubeflow (kfctl_k8s_istio.v1.0.2.yaml) on a PKS cluster and, when trying to run a pipeline, I get events like:

Events:
  Type     Reason       Age                 From                                           Message
  ----     ------       ----                ----                                           -------
  Normal   Scheduled    117s                default-scheduler                              Successfully assigned kubeflow/exit-handler-bqlf4-2321901635 to c591098b-4310-48a8-9e52-ab8ce1a14a65
  Warning  FailedMount  53s (x8 over 117s)  kubelet, c591098b-4310-48a8-9e52-ab8ce1a14a65  MountVolume.SetUp failed for volume "docker-sock" : hostPath type check failed: /var/run/docker.sock is not a socket file

Then, as per the suggestion, I changed containerRuntimeExecutor to pns in the workflow-controller-configmap configmap, but applying it fails with:

Operation cannot be fulfilled on configmaps "workflow-controller-configmap": the object has been modified; please apply your changes to the latest version and try again

Could you please suggest how to address this issue?

The above issue was fixed by using the force option after changing containerRuntimeExecutor to pns:
 kubectl apply -f workflow-controller-configmap-pns -n kubeflow --force

I just tried Kubeflow Pipelines on Kubernetes 1.17 (Azure) and 1.18 (minikube) clusters with Docker as the container engine. I am using the 1.0.0 release: https://github.com/kubeflow/pipelines/releases/tag/1.0.0.

I am using the official example https://github.com/argoproj/argo/blob/master/examples/artifact-passing.yaml, which runs fine out of the box with the Argo docker executor.

Then I changed the executor to pns:

apiVersion: v1
data:
  config: |
    {
    namespace: kubeflow,
    containerRuntimeExecutor: pns,
    executorImage: gcr.io/ml-pipeline/argoexec:v2.7.5-license-compliance,
    ...

Every pipeline that passes outputs (including the official example) is now failing.
The problem seems to be that the main container exits properly and the wait container cannot chroot into it anymore:

"executor error: could not chroot into main for artifact collection: container may have exited too quickly"

The docker executor works around this by abusing docker.sock to copy the outputs from the terminated main container, which is obviously infeasible in production.

The funny thing is that you can manually mount an emptydir under /tmp/outputs and add the proper output path (e.g. tmp/outputs/numbers/data) to op.output_artifact_paths.

def add_emptydir(op):
    from kubernetes import client as k8s_client
    op.add_volume(k8s_client.V1Volume(name='outputs', empty_dir=k8s_client.V1EmptyDirVolumeSource()))
    op.container.add_volume_mount(k8s_client.V1VolumeMount(name='outputs', mount_path='tmp/outputs'))
    op.output_artifact_paths={
        'mlpipeline-ui-metadata': 'tmp/outputs/mlpipeline-ui-metadata.json',
        'mlpipeline-metrics': 'tmp/outputs/mlpipeline-metrics.json',
        'extract-as-artifact': 'tmp/outputs/numbers/data',
    }
    return op
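
(As an aside, a hedged sketch, not part of the original report: a helper like add_emptydir can also be applied to every task at once via the KFP v1 op-transformer hook; the pipeline name below is illustrative.)

import kfp.dsl as dsl

@dsl.pipeline(name='emptydir-for-all-ops')
def pipeline_with_workaround():
    # Register the helper so it is applied to each op created in this pipeline.
    dsl.get_pipeline_conf().add_op_transformer(add_emptydir)
    # ... define the pipeline's ops here ...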

Then the output file (tmp/outputs/numbers/data) is successfully extracted via the mirrored mounts functionality, but extracting the same file with chroot fails.

I also experimented with op.file_outputs without success.
I also experimented with emptydir and the k8sapi executor without success.
I tried newer argo workflow and exec images (2.8.3 and 2.9.3 in deployment/workflow-controller) without success.

So I am wondering why PNS is working for others.

Besides the official examples, I am also using some very simple pipelines

from kfp.components import func_to_container_op, OutputPath

@func_to_container_op
def write_numbers_1(numbers_path: OutputPath(str), start: int = 0, count: int = 10):
    '''Write numbers to file'''
    import time, datetime
    time.sleep(30)  # should not be necessary with newer versions of argo
    print('numbers_path:', numbers_path)
    with open(numbers_path, 'w') as writer:
        for i in range(start, count):
            writer.write(str(i) + '\n')
    print('finished', datetime.datetime.now())

which work perfectly fine with the docker executor and fail miserably with pns.
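
For context, a hedged sketch (illustrative pipeline name and output file) of how such a lightweight component is typically wired into a pipeline and compiled:

import kfp
import kfp.dsl as dsl

@dsl.pipeline(name='write-numbers-example')
def numbers_pipeline(count: int = 10):
    # The OutputPath parameter is provided by KFP, so only start/count are passed.
    write_numbers_1(start=0, count=count)

if __name__ == '__main__':
    kfp.compiler.Compiler().compile(numbers_pipeline, 'numbers_pipeline.yaml')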

If you want this to be fixed, please thumbs up this Argo issue argoproj/argo-workflows#2679 and maybe create new issue if you have additional problems.

Please thumbs-up #4645 (comment) and argoproj/argo-workflows#4367; I have got k8sapi working with the lightweight Python components too.

This is of much higher priority now, because new Kubernetes versions won't support Docker.
Upstream Argo issue: argoproj/argo-workflows#4690

Current recommendation is to use pns executor: argoproj/argo-workflows#4690 (comment).
We already have a manifest for this purpose:
https://github.com/kubeflow/pipelines/tree/master/manifests/kustomize/env/platform-agnostic-pns

It's time to consider switching GCP to PNS and verifying its stability with our test infra samples.

Not sure if anyone else has hit this problem: if you save an artifact under /tmp/, like /tmp/outputs/Output/data, it seems that when the PNS executor tries to save the parameters or artifacts, it will chdir to the main filesystem and then chroot, and /tmp/outputs/Output/data cannot be seen from the wait container. I see errors like level=error msg="executor error: open /tmp/outputs/Output/data: no such file or directory". Does anyone have the same problem?

Update: It seems this problem only occurs on Debian-based versions; when I switch the component container to the latest versions, pns works fine. If anyone has more clues, please let me know.

AFAIK, you can configure Argo to use other executors (e.g. k8sapi, kubelet or pns) in the configmap: https://github.com/argoproj/argo/blob/ca1d5e671519aaa9f38f5f2564eb70c138fadda7/docs/workflow-controller-configmap.yaml#L78. Then pipelines should just work.
Would you like to try it?

worked like a charm. Thanks a lot.

This PR already tried to switch to PNS: #4965. But we are waiting for the Argo update to merge it. For the need to update Argo, see #4965 (comment).

/reopen
Due to #5285, we reverted the default back to the Docker executor for the current release.

We need to stabilize the PNS executor in preparation for the next release.

@Bobgy: Reopened this issue.

In response to this:

/reopen
Due to #5285, we reverted the default back to the Docker executor for the current release.

We need to stabilize the PNS executor in preparation for the next release.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

For the next release you should update to Argo 3.1 and use the emissary executor, which works everywhere and does not require root: https://argoproj.github.io/argo-workflows/workflow-executors/. I already tested it successfully with Kubeflow 1.2.

Please upvote #5718 if you want to have a proper solution to this bug.

Documentation: https://www.kubeflow.org/docs/components/pipelines/installation/choose-executor/
Issue: #5718

We are now recommending the emissary executor (Alpha, released in KFP 1.7.0); feedback welcome!

Where to mount that emptyDir?

It should have been mounted to the folder where you're storing the outputs you produce. But in the last example you're not producing any, so there should have been no issues.

Ah. I forgot about the auto-added artifacts (#1422).

Can you try the following two things:

  1. First of all, try some Argo examples (e.g. https://github.com/argoproj/argo/blob/master/examples/artifact-passing.yaml) to directly check the lower-level compatibility with the various execution modes.
  2. In your last example, add the following to the ContainerOp construction:
output_artifact_paths={
  'mlpipeline-ui-metadata': '/output/mlpipeline-ui-metadata.json',
  'mlpipeline-metrics': '/output/mlpipeline-metrics.json',
}

Here we override the paths for the auto-added output artifacts so that they're stored under the /output directory where you've mounted the emptyDir volume.
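
For instance, a hedged sketch of how this can look in the ContainerOp construction (the image, command, and pipeline name are illustrative, not from the original example):

import kfp.dsl as dsl
from kubernetes import client as k8s_client

@dsl.pipeline(name='override-artifact-paths-example')
def example_pipeline():
    op = dsl.ContainerOp(
        name='my-step',
        image='my-image:latest',
        command=['sh', '-c', 'mkdir -p /output && echo "{}" > /output/mlpipeline-metrics.json'],
        # Store the auto-added artifacts under /output, where the emptyDir is mounted.
        output_artifact_paths={
            'mlpipeline-ui-metadata': '/output/mlpipeline-ui-metadata.json',
            'mlpipeline-metrics': '/output/mlpipeline-metrics.json',
        },
    )
    op.add_volume(k8s_client.V1Volume(
        name='outputs', empty_dir=k8s_client.V1EmptyDirVolumeSource()))
    op.container.add_volume_mount(k8s_client.V1VolumeMount(
        name='outputs', mount_path='/output'))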

How would I do 2. for a functional component?

@zacharymostowsky the instructions you read are outdated. #1654 (comment) is our current recommendation.