Policy on adding new images
mathbunnyru opened this issue · comments
It would be nice to come up with a policy for when we add new images and when we don't.
#1936
Another question: do we allow ourselves to remove images (i.e., stop building new versions of an image)?
An incomplete history
I was thinking that we could extract some historical decisions for reference going forward, and arrived at this as a starting point.
- #1 Added `minimal-notebook`, `scipy-notebook`, and `r-notebook`. At this point, `all-spark-notebook`, `datascience-notebook`, and `julia-notebook` were also considered.
- #3 Added `pyspark-notebook`
- #5 Added `all-spark-notebook`
- #42 Added `minimal-kernel`
- #209 Added `base-notebook` by refactoring `minimal-notebook`
- #213 Removed `minimal-kernel`
- #266 Added `tensorflow-notebook`
- #356
- #444
- #486
- #533
- #693
- #974
- #1196
- #1204
- #1234
- #1327
- #1569
- #1745
- #1825 Added `docker-stacks-foundation` by refactoring `base-notebook`
- #1926 Added `julia-notebook`
- #1936
- #1961
Thanks @consideRatio. This is really helpful.
So, I have a list of a few things we should consider when adding new images (I'm also expressing my own opinion here; people might disagree with it, and that's ok):

- Popularity. We should not add a separate notebook containing just `YOUR_FAVOURITE_RARE_PACKAGE_HERE` merely because it's someone's favourite package. If the package is really popular, that's a point in its favour. The software added should be modern and well-supported.
- Added value and potential user base growth.
- Consistency with current images. I don't think we should add something completely different, for example, an image with a C++ Jupyter kernel.
- Maintenance and especially build time. Since we're using Docker, what's inside the container doesn't affect our build workflows (most of the time). But some images might be heavy to build, especially since we upload/download images as archives.
- Any special hardware/equipment. I don't think we should be adding `ppc64le` images, for example. On the other hand, I'm ok with adding a GPU image stack, if someone is ready to implement it without sacrificing build time (it's possible, but needs some work) and if the license allows us to do so.
- We should accept that we can't live in a model where "our images fit 95% of needs". I think the correct model is: choose the image closest to your needs, then install a few packages on top of it.
- I think we should stick to the philosophy of using the latest software available. It has worked quite well for us.
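The "choose the closest image and install on top" model can be sketched as a short Dockerfile. This is only an illustration, not a recommendation: the base image tag and the `sympy` package here are hypothetical examples.

```dockerfile
# Hypothetical sketch: start from the stack image closest to your needs
# and layer a few extra packages on top of it.
FROM jupyter/scipy-notebook:latest

# "sympy" is just an example of an extra package a user might need.
RUN mamba install --yes sympy && \
    mamba clean --all -f -y
```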
The message above is my opinion (which might easily change) and not a statement 🙂
Please, feel free to share your ideas.
With regard to point 5: adding GPU support is non-trivial because the version of CUDA you bundle is an extra confounding factor. For example, PyTorch has separate package indexes for CUDA 11.7 and 11.8. Do you bundle cuDNN or not? When the version of CUDA bundled into the Docker image is newer than what's supported by the Nvidia drivers on the host system, it will crash, etc.
Thanks, @twalcari, that's a good point.
I think we need a way to build the whole image tree in parallel, and then to tag the whole tree correctly.
For example, we could build the image tree with plain Ubuntu (as now), and also build the tree for CUDA 11.7 (adding a tag prefix like `cuda11.7-` to all these images) and for CUDA 11.8.
And have options for cuDNN as well.
Obviously, we can build all of this in parallel.
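As a rough illustration of the idea (the image names, root variants, and tag prefixes below are hypothetical, not the project's actual tagging scheme), the build matrix could be generated along these lines:

```python
# Hypothetical sketch of a build matrix for parallel image trees.
# Image names and root variants are illustrative assumptions only.

IMAGES = ["docker-stacks-foundation", "base-notebook", "minimal-notebook"]

# One entry per root variant: the tag prefix applied to every image in that tree.
ROOTS = {
    "ubuntu": "",            # the current plain-Ubuntu tree, no prefix
    "cuda11.7": "cuda11.7-",
    "cuda11.8": "cuda11.8-",
}

def build_matrix(images, roots):
    """Return (root, image, tag) triples; each root's tree is independent
    of the others, so the trees can be built in parallel."""
    return [
        (root, image, prefix + image)
        for root, prefix in roots.items()
        for image in images
    ]
```

Each root variant then yields its own complete, consistently prefixed tree of tags.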
So, if someone wants to add templating based on the root image, even without adding GPU support, it seems like a nice feature to have.
We will also have to take a look at the licensing of CUDA-based images; I think one of the reasons they were not added was that we couldn't redistribute the images because of the CUDA license (I'm not sure about this one).