dask / dask-docker

Docker images for dask

Home Page: https://hub.docker.com/u/daskdev

pyarrow seems to be missing from the images

martinedgefocus opened this issue · comments

This seems fairly fundamental and necessary?
Per the discussion at https://dask.discourse.group/t/missing-pyarrow/796/2, it was suggested that I report this here.

Thanks for raising an issue @martinedgefocus. As folks use Dask for a wide variety of use cases, we try to keep the packages included by default in these images minimal, but allow users to specify additional dependencies they need installed by setting environment variables (see https://docs.dask.org/en/stable/deploying-docker.html#extensibility). For example, if you would like pyarrow installed you could set EXTRA_CONDA_PACKAGES="pyarrow".
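As a rough illustration of the suggestion above, here is a minimal sketch that assembles a `docker run` command with the extra-package variables set. The helper name `docker_run_command`, the image name `daskdev/dask`, and the `dask-scheduler` entry command are assumptions made for the example, not something stated in this thread; `EXTRA_PIP_PACKAGES` is the pip counterpart described in the linked docs.

```python
import shlex


def docker_run_command(image, extra_conda=(), extra_pip=(), command=("dask-scheduler",)):
    """Build a `docker run` argv that passes extra-package variables to the container.

    Hypothetical helper for illustration only. The image is expected to read
    these variables at container start and install the listed packages.
    """
    argv = ["docker", "run", "--rm"]
    if extra_conda:
        # Space-delimited list of conda packages to install at startup
        argv += ["-e", "EXTRA_CONDA_PACKAGES=" + " ".join(extra_conda)]
    if extra_pip:
        # Space-delimited list of pip packages to install at startup
        argv += ["-e", "EXTRA_PIP_PACKAGES=" + " ".join(extra_pip)]
    argv.append(image)
    argv.extend(command)
    return argv


# Assumed image name and entry command; adjust to the image and tag you actually use.
cmd = docker_run_command("daskdev/dask", extra_conda=["pyarrow"])
print(shlex.join(cmd))
```

Because the packages are installed when the container starts, startup will likely take longer than with the plain image; for heavier dependency sets, building a custom image may be the better trade-off.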

OK, thanks. I'm new to this I'm afraid.
The environment we set up is an AWS EC2Cluster with automatic sizing per the adapt() mechanism.
Is it still feasible to pass the environment variables through like that to add extra packages?
I can see there's an env_vars param, would that be sufficient?

> The environment we set up is an AWS EC2Cluster

Just to confirm, does this mean you're using dask_cloudprovider.aws.EC2Cluster to create your Dask cluster?

> I can see there's an env_vars param, would that be sufficient?

Looking at the dask_cloudprovider.aws.EC2Cluster docstring, it appears that env_vars are environment variables passed to the workers. I don't know if that's quite what you're after, as it will depend on when those environment variables are set. For this use case, I think you want them to be set when Docker starts the container from the image. Maybe docker_args is what you're after (I think @jacobtomlinson will have more insight here).

Yup, that is one of the intended uses of env_vars. Those variables are passed to the docker run command that happens under the hood on the EC2 instances.

from dask_cloudprovider.aws import EC2Cluster
cluster = EC2Cluster(..., env_vars={"EXTRA_CONDA_PACKAGES": "pyarrow"})
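To confirm that the package actually landed on the workers, something along these lines could be used. This is a sketch under assumptions: `has_pyarrow`, `missing_workers`, and `launch_and_check` are hypothetical helper names, the cluster keyword arguments are left to the caller, and with adapt() the set of workers can change between checks.

```python
import importlib.util


def has_pyarrow():
    """Return True if pyarrow is importable in the current Python environment."""
    return importlib.util.find_spec("pyarrow") is not None


def missing_workers(results):
    """Given {worker_address: bool}, list the workers where the check failed."""
    return sorted(addr for addr, ok in results.items() if not ok)


def launch_and_check(**cluster_kwargs):
    """Start an EC2Cluster with pyarrow requested, then check each worker.

    Imports are done lazily so the helpers above work without these
    libraries installed. Running this creates real EC2 instances.
    """
    from dask.distributed import Client
    from dask_cloudprovider.aws import EC2Cluster

    cluster = EC2Cluster(
        env_vars={"EXTRA_CONDA_PACKAGES": "pyarrow"},
        **cluster_kwargs,
    )
    client = Client(cluster)
    client.wait_for_workers(1)  # block until at least one worker has joined
    # Client.run executes the function on every worker and returns
    # a dict keyed by worker address.
    return missing_workers(client.run(has_pyarrow))
```

An empty list from `launch_and_check(...)` would suggest every worker present at that moment could find pyarrow.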

I'm going to close this out but please feel free to follow up here if you have more questions about how to do this.