jupyter / docker-stacks

Ready-to-run Docker images containing Jupyter applications

Home Page: https://jupyter-docker-stacks.readthedocs.io

[ENH] - Document correct way to persist conda packages

laundmo opened this issue

What docker image(s) is this feature applicable to?

base-notebook

What change(s) are you proposing?

Documenting a "blessed" way of persisting installed Python packages, so that they are not removed when the container is recreated.

Ideally, this would not require persisting the packages that the image installs automatically.

How does this affect the user?

As seen in

It's not currently obvious how to do this correctly. Finding this issue is also not easy.

A documentation section for this would help users achieve this quickly without a lot of searching, and allow maintainers to choose one way of persisting Python packages (entire conda env or just relevant folders? docker host mount or volume?)

Anything else?

No response

I use a docker-compose.yml file

version: '3.7'
services:
  my-service:
    image: jupyter/base-notebook:latest
    volumes:
      - myvolume:/opt/conda
volumes:
  myvolume:

@laundmo if you want to persist packages (mamba/conda/pip/apt), then you should create an inherited image and install them only once, while building the image from your Dockerfile.
An example for mamba and pip is available here: https://jupyter-docker-stacks.readthedocs.io/en/latest/using/recipes.html#using-mamba-install-recommended-or-pip-install-in-a-child-docker-image
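
As a rough sketch of the inherited-image approach in Compose form, the file below builds a child image once and reuses it every time the container is recreated. It assumes a Docker Compose release that supports `dockerfile_inline`. The service name `notebook` and the package `flake8` are placeholders, and the `/opt/conda` and `/home/jovyan` paths assume the image defaults.

```yaml
services:
  notebook:                      # hypothetical service name
    build:
      context: .
      # Assumption: your Compose version supports dockerfile_inline.
      # Otherwise, put the same lines into a Dockerfile next to this file.
      dockerfile_inline: |
        # Start from the stock image discussed in this issue
        FROM jupyter/base-notebook:latest
        # Install extra packages at build time ('flake8' is only a placeholder)
        RUN mamba install --yes 'flake8' && \
            mamba clean --all -f -y && \
            fix-permissions /opt/conda && \
            fix-permissions /home/jovyan
    ports:
      - "8888:8888"
```

Running `docker compose up --build` builds the child image. The packages then live in an image layer rather than in a volume, so recreating the container does not remove them.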

If you want to persist user files like Python scripts, notebooks, text files and so on (created by you), then you should mount a volume. This volume might be just a folder on your host machine, or a docker volume (as in the example above).

> and allow maintainers to choose one way of persisting python packages (entire conda env or just relevant folders? docker host mount or volume?)

I don't think one way will work for everyone though.
As I said, there are 2 different types of things we want to persist, and they should be treated differently.
The first type (packages) is relatively straightforward (use an inherited image).

Persisting user files is more complicated.

  1. Some users run our containers on local machines and they manually execute docker run and pass -v to mount a host's folder.
  2. Others do the same but in docker-compose.
  3. Some people prefer using docker volumes, which is also fine.
  4. At the same time, people use our images as singleuser images for JupyterHub and then they need to set a config there.

The first three methods are common across the whole Docker ecosystem. The 4th one is documented in JupyterHub or in JupyterHub's spawners, which is why we don't document it here.
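
For the Compose and volume cases in the list above, a minimal sketch might look like the following. The service name `notebook`, the host folder `./notebooks`, and the volume name `notebook-data` are hypothetical, and the mount target `/home/jovyan/work` assumes the image's default user and home directory.

```yaml
services:
  notebook:                      # hypothetical service name
    image: jupyter/base-notebook:latest
    ports:
      - "8888:8888"
    volumes:
      # Option A: bind-mount a host folder (hypothetical path ./notebooks)
      - ./notebooks:/home/jovyan/work
      # Option B: use a named Docker volume instead of the host folder
      # - notebook-data:/home/jovyan/work

volumes:
  notebook-data:                 # only needed for option B
```

With plain `docker run`, the host-folder variant corresponds to passing something like `-v "${PWD}/notebooks":/home/jovyan/work`. Only user files are mounted here; the conda environment stays inside the image.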

Hope this helps.

I'm not asking how to do it. I know I can make an inherited image if there should be more packages from the get-go, and I know you can mount in various ways.

That's why this is a documentation issue, not "asking for help".

I wasn't able to find any of these ways documented anywhere. There isn't even a mention of the folder to mount (/opt/conda).

I think there should be documentation at least pointing the way, specifically documentation which you can find by searching for things like "jupyter docker stacks persistent packages" or similar.

I didn't immediately PR because I'm not sure whether there are specific details I'm not aware of which make advice like "mount /opt/conda" flawed.

@laundmo I will try to document how users can persist their data in a separate FAQ section in a few days.

> I didn't immediately PR because I'm not sure whether there are specific details I'm not aware of which make advice like "mount /opt/conda" flawed

I wouldn't recommend mounting /opt/conda.
There are several things that might go wrong, for example:

  1. Sometimes, you might break your conda environment (by removing some files in /opt/conda for example). When you use an inherited image, you just restart it and it restores the environment of the image.
  2. It's not easy to see how this environment was created - it might have been created by you a year ago and you just can't remember some commands you used to make it work for you.
  3. Updating an image might not work well - a new image might have some issues with your mounted /opt/conda. A new image might, for example, assume that some file/config exists (which will always be true for a fresh image), while your mounted /opt/conda will not contain that file. And then you won't be able to update easily.
  4. It's difficult to back up/distribute such an environment. With an inherited image you can push the image to some registry, or you can distribute your Dockerfile if you want to.

So, it's gonna work most of the time (especially, if you don't touch it), until you have to modify/update it.
That's why I don't think mounting "/opt/conda" is a good idea.