[ENH] - Document correct way to persist conda packages
laundmo opened this issue · comments
What docker image(s) is this feature applicable to?
base-notebook
What change(s) are you proposing?
Documenting a "blessed" way of persisting installed Python packages, such that they are not removed after the container is recreated.
Ideally this should be a way which doesn't require persisting the packages automatically installed by the container.
How does this affect the user?
As seen in
it's not currently obvious how to do this correctly. Finding this issue is also not easy.
A documentation section for this would help users achieve this quickly without a lot of searching, and allow maintainers to choose one way of persisting Python packages (entire conda env or just the relevant folders? Docker host mount or volume?).
Anything else?
No response
I use a docker-compose.yml file:

version: '3.7'
services:
  my-service:
    image: jupyter/base-notebook:latest
    volumes:
      - myvolume:/opt/conda
volumes:
  myvolume:
@laundmo if you want to persist packages (mamba/conda/pip/apt), then you should create an inherited image and install them only once while building your Dockerfile.
An example for mamba and pip is available here: https://jupyter-docker-stacks.readthedocs.io/en/latest/using/recipes.html#using-mamba-install-recommended-or-pip-install-in-a-child-docker-image
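For readers who already use docker-compose, the inherited-image approach can be sketched in the compose file itself. This is only an illustration, not the maintainers' recommended layout: the package name `flake8` is a placeholder, the port mapping is an assumption, and `dockerfile_inline` requires a reasonably recent Docker Compose release (with older versions, the same instructions would go in a separate Dockerfile referenced by `build`).

```yaml
# docker-compose.yml (sketch; service name and package are placeholders)
services:
  my-service:
    build:
      context: .
      # Inline Dockerfile: packages are installed once, at build time,
      # so they are part of the image and survive container recreation.
      dockerfile_inline: |
        FROM jupyter/base-notebook:latest
        RUN mamba install --yes 'flake8' && \
            mamba clean --all -f -y && \
            fix-permissions "/opt/conda" && \
            fix-permissions "/home/jovyan"
    ports:
      - "8888:8888"
```

Running `docker compose up --build` would then rebuild the image whenever the package list changes, and the file itself records how the environment was created.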
If you want to persist user files like Python scripts, notebooks, text files and so on (created by you), then you should mount a volume. This volume might be just a folder on your host machine, or a Docker volume (as in the example above).
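A minimal compose sketch of persisting user files only, leaving the conda environment inside the image. The host path `./work`, the volume name `notebooks`, and the port mapping are hypothetical; the container path assumes the default `jovyan` user's home directory in these images.

```yaml
# docker-compose.yml (sketch; paths and names are illustrative)
services:
  my-service:
    image: jupyter/base-notebook:latest
    ports:
      - "8888:8888"
    volumes:
      # Option A: bind-mount a folder from the host machine
      - ./work:/home/jovyan/work
      # Option B: use a named Docker volume instead of the host folder
      # - notebooks:/home/jovyan/work

# Only needed for Option B
# volumes:
#   notebooks:
```

Either option keeps notebooks and scripts across container recreation, while the packages continue to come from the image.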
> and allow maintainers to choose one way of persisting python packages (entire conda env or just relevant folders? docker host mount or volume?)
I don't think one way will work for everyone though.
As I said, there are 2 different types of things we want to persist, and they should be treated differently.
The first type (packages) is relatively straightforward (use an inherited image).
Persisting user files is more complicated.
- Some users run our containers on local machines and they manually execute docker run and pass -v to mount a host's folder.
- Others do the same but in docker-compose.
- Some people prefer using docker volumes, which is also fine.
- At the same time, people use our images as singleuser images for JupyterHub and then they need to set a config there.
The first three methods are common across the whole Docker ecosystem; the fourth one is documented in JupyterHub or JupyterHub's spawners, which is why we don't document it here.
Hope this helps.
I'm not asking how to do it. I know I can make an inherited image if there should be more packages from the get-go, and I know you can mount in various ways.
That's why this is a documentation issue, not "asking for help".
I wasn't able to find any of these ways documented anywhere. There isn't even a mention of the folder to mount (/opt/conda).
I think there should be documentation at least pointing the way, specifically documentation which you can find by searching for things like "jupyter docker stacks persistent packages" or similar.
I didn't immediately PR because I'm not sure whether there's specific details I'm not aware of which make advice like "mount /opt/conda" flawed
@laundmo I will try to document how users can persist their data in a separate FAQ section in a few days.
> I didn't immediately PR because I'm not sure whether there's specific details I'm not aware of which make advice like "mount /opt/conda" flawed
I wouldn't recommend mounting /opt/conda.
There are several things that might go wrong, for example:
- Sometimes, you might break your conda environment (by removing some files in /opt/conda for example). When you use an inherited image, you just restart it and it restores the environment of the image.
- It's not easy to see how this environment was created - it might have been created by you a year ago and you just can't remember some commands you used to make it work for you.
- Updating an image might not work well - a new image might have some issues with your mounted /opt/conda. A new image might for example assume that some file/config exists (which will always be true for a new image), while in your mounted image there will be no such file. And then you won't be able to update easily.
- It's difficult to back up/distribute such an image. With an inherited image you can push the image to some registry, or you can distribute your dockerfile if you want to.
So, it's gonna work most of the time (especially, if you don't touch it), until you have to modify/update it.
That's why I don't think mounting "/opt/conda" is a good idea.