Parsl / parsl

Parsl - a Python parallel scripting library

Example using kubernetes provider?

jkitchin opened this issue · comments

I am trying to get a Kubernetes provider to work with parsl.

I have a working Kubernetes cluster with kubectl set up. I can set up pods with kubectl and open shells in them. Based on, I have set this up:

import parsl
from parsl import python_app
from parsl.config import Config
from parsl.providers import KubernetesProvider
from parsl.executors import HighThroughputExecutor

config = Config(

# load the Parsl config
parsl.load(config)

@python_app
def exc():
    import socket
    return socket.gethostname()


It does run, and it tries to create a pod, but the pod fails, and the logs indicate:

Traceback (most recent call last):
Tue, Oct 17 2023 2:09:37 pm | File "/opt/conda/bin/", line 687, in <module>
Tue, Oct 17 2023 2:09:37 pm | os.makedirs(os.path.join(args.logdir, "block-{}".format(args.block_id), args.uid), exist_ok=True)
Tue, Oct 17 2023 2:09:37 pm | File "/opt/conda/lib/python3.9/", line 215, in makedirs
Tue, Oct 17 2023 2:09:37 pm | makedirs(head, exist_ok=exist_ok)
Tue, Oct 17 2023 2:09:37 pm | File "/opt/conda/lib/python3.9/", line 215, in makedirs
Tue, Oct 17 2023 2:09:37 pm | makedirs(head, exist_ok=exist_ok)
Tue, Oct 17 2023 2:09:37 pm | File "/opt/conda/lib/python3.9/", line 215, in makedirs
Tue, Oct 17 2023 2:09:37 pm | makedirs(head, exist_ok=exist_ok)
Tue, Oct 17 2023 2:09:37 pm | [Previous line repeated 7 more times]
Tue, Oct 17 2023 2:09:37 pm | File "/opt/conda/lib/python3.9/", line 225, in makedirs
Tue, Oct 17 2023 2:09:37 pm | mkdir(name, mode)
Tue, Oct 17 2023 2:09:37 pm | PermissionError: [Errno 13] Permission denied: '/Users'
Tue, Oct 17 2023 2:09:37 pm | /bin/bash: -c: line 4: syntax error near unexpected token `;'
Tue, Oct 17 2023 2:09:37 pm | /bin/bash: -c: line 4: `;'

I can see there is a permission error related to making a directory /Users. I don't see anywhere obvious to change this.
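The traceback stops at '/Users' rather than at the full log path because os.makedirs walks up the path until it reaches a directory that already exists, then creates the missing ones from the top down. On the pod, nothing under '/Users' exists, so the first mkdir is attempted directly under '/', where the pod user has no write permission. A minimal sketch of that lookup follows; the helper name first_missing_ancestor is made up for illustration, and a temporary directory stands in for the pod's filesystem.

```python
import os
import tempfile


def first_missing_ancestor(path):
    """Return the topmost directory os.makedirs(path) would need to create.

    Returns None when the path already exists.
    """
    path = os.path.abspath(path)
    missing = None
    while not os.path.exists(path):
        missing = path
        parent = os.path.dirname(path)
        if parent == path:  # reached the filesystem root
            break
        path = parent
    return missing


# A temporary directory plays the role of '/' inside the pod.
with tempfile.TemporaryDirectory() as root:
    logdir = os.path.join(root, "Users", "someone", "runinfo", "000")
    # The first directory to be created is <root>/Users, not the full path.
    print(first_missing_ancestor(logdir) == os.path.join(root, "Users"))
```

If that first mkdir is refused, makedirs raises PermissionError naming that topmost directory, which matches the '/Users' in the log above.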

Are there any examples of using Kubernetes with parsl somewhere? (I looked, but did not find anything).


Digging into the YAML for the pod, I see the worker arguments below, and something does not look right. The logdir is a local directory on my machine, but the pod is created on a remote Kubernetes cluster, so that logdir won't exist there.

-a Johns-iMac-4.local,, -p 0 -c 2 -m None --poll 10 --task_port=54918 --result_port=54542 --logdir=/Users/jkitchin/example/runinfo/034/PM_HTEX_multinode --block_id=2 --hb_period=30  --hb_threshold=120 --cpu-affinity none --available-accelerators  --start-method spawn
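To check which log directory a pod was actually given, the worker arguments can be split and searched for the flag. The sketch below is only an illustration: flag_value is a hypothetical helper, and the argument string is an abbreviated copy of the one from the pod YAML.

```python
import shlex

# Abbreviated copy of the worker arguments found in the pod YAML.
WORKER_ARGS = (
    "-p 0 -c 2 -m None --poll 10 --task_port=54918 --result_port=54542 "
    "--logdir=/Users/jkitchin/example/runinfo/034/PM_HTEX_multinode --block_id=2"
)


def flag_value(command, flag):
    """Return the value given for `flag`, as '--flag=value' or '--flag value'."""
    prefix = flag + "="
    tokens = shlex.split(command)
    for i, token in enumerate(tokens):
        if token.startswith(prefix):
            return token[len(prefix):]
        if token == flag and i + 1 < len(tokens):
            return tokens[i + 1]
    return None


logdir = flag_value(WORKER_ARGS, "--logdir")
print(logdir)
# A path under /Users is a macOS home directory on the submitting machine,
# so it is unlikely to exist (or be creatable) inside a Linux container.
print(logdir.startswith("/Users/"))
```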

I guess this means something is not set up right in here.

update 2:
It does work if I run it in a pod on the Kubernetes cluster, although it seems to create 4 pods, and they don't close when the job is done. It seems like they should.

Is there a way to make it work remotely?

I have made a smidge of progress getting this to work.

Some prerequisites that weren't obvious:

  1. The Python and parsl versions have to be the same on the local and remote machines.
  2. You have to set a worker_logdir_root in the executor for the remote path.
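One way to check the first prerequisite is to print the same small fingerprint on both sides and compare. This is a sketch under assumptions: environment_fingerprint and mismatches are hypothetical helper names, and comparing Python at major.minor granularity is a guess at what "the same" needs to mean here.

```python
import sys
from importlib import metadata


def environment_fingerprint(packages=("parsl",)):
    """Return the Python major.minor version and the versions of `packages`."""
    info = {"python": "{}.{}".format(*sys.version_info[:2])}
    for name in packages:
        try:
            info[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            info[name] = None  # package is not installed in this environment
    return info


def mismatches(local, remote):
    """Return the sorted keys whose values differ between two fingerprints."""
    keys = set(local) | set(remote)
    return sorted(k for k in keys if local.get(k) != remote.get(k))


print(environment_fingerprint())
```

Run it once on the submitting machine and once in a shell opened in a pod built from the worker image, then pass the two dictionaries to mismatches; an empty list means the two environments agree on these points.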

Here is a minimal working example for me.

import parsl
from parsl import python_app
from parsl.config import Config
from parsl.providers import KubernetesProvider
from parsl.executors import HighThroughputExecutor

import logging

config = Config(
                # this does not work
                #                persistent_volumes=[('shared-scratch', '/home/jovyan/shared-scratch/')]

# load the Parsl config
parsl.load(config)

@python_app
def exc():
    import socket
    return socket.gethostname()

print('done', exc().result())

I can't get persistent volumes to work. I see messages indicating that the name can't be found (30 persistentvolumeclaim "shared-scratch" not found.), even though this is a volume that I mount in other pods.

Also, for some reason, this makes 4 pods. When nothing goes wrong, one of them terminates (the one that returns from the app), but the other 3 are left running. Every so often, the ones left running seem to error and restart.
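The sketch below collects the points from this thread into one configuration. It is not the original setup: the image name, label, and log path are placeholders, and the keyword arguments should be checked against the installed Parsl version. Two things in it are guesses at the pod behaviour. First, KubernetesProvider appears to default to init_blocks=4 in releases from around this time, which would account for four pods, so the block counts are pinned explicitly. Second, the DataFlowKernel is cleaned up explicitly at the end, on the assumption that leftover pods are blocks that were never scaled in. The Parsl imports sit inside the functions so the settings can be inspected on a machine without Parsl installed.

```python
# Placeholder values: none of these come from the original setup.
PROVIDER_KWARGS = {
    "image": "my-registry/parsl-worker:py39",  # hypothetical image with matching Python and parsl
    "namespace": "default",
    "nodes_per_block": 1,
    "init_blocks": 1,  # start one pod instead of relying on the provider default
    "min_blocks": 0,
    "max_blocks": 1,
}
EXECUTOR_KWARGS = {
    "label": "kube-htex",
    # Must be a directory the pod user can write to, not a path on the submitting machine.
    "worker_logdir_root": "/tmp/parsl-logs",
}


def build_config():
    """Build a Parsl Config from the settings above (requires parsl and kubernetes)."""
    from parsl.config import Config
    from parsl.executors import HighThroughputExecutor
    from parsl.providers import KubernetesProvider

    provider = KubernetesProvider(**PROVIDER_KWARGS)
    executor = HighThroughputExecutor(provider=provider, **EXECUTOR_KWARGS)
    return Config(executors=[executor])


def run_and_cleanup(app):
    """Load the config, run one app, and always clean up the DataFlowKernel."""
    import parsl

    parsl.load(build_config())
    try:
        return app().result()
    finally:
        # Shuts down the executors; intended to release the blocks (pods) as well.
        parsl.dfk().cleanup()
```

With the exc app from the example above, run_and_cleanup(exc) would return the hostname of the pod that ran it. Whether this clears the lingering pods depends on the Parsl version, so treat it as something to try rather than a confirmed fix.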