Policy on adding new images
mathbunnyru opened this issue · comments
It would be nice to come up with a policy for when we add new images and when we don't.
#1936
Another question: do we allow ourselves to remove images (i.e., stop building new versions of an image)?
An incomplete history
I was thinking that we could extract some historical decisions for reference going forward, and arrived at this as a starting point.
- #1 Added `minimal-notebook`, `scipy-notebook`, and `r-notebook`. At this point, `all-spark-notebook`, `datascience-notebook`, and `julia-notebook` were also considered.
- #3 Added `pyspark-notebook`
- #5 Added `all-spark-notebook`
- #42 Added `minimal-kernel`
- #209 Added `base-notebook` by refactoring `minimal-notebook`
- #213 Removed `minimal-kernel`
- #266 Added `tensorflow-notebook`
- #356
- #444
- #486
- #533
- #693
- #974
- #1196
- #1204
- #1234
- #1327
- #1569
- #1745
- #1825 Added `docker-stacks-foundation` by refactoring `base-notebook`
- #1926 Added `julia-notebook`
- #1936
- #1961
Thanks @consideRatio. This is really helpful.
So, I have a list of a few things we should consider when adding new images (I'm also expressing my own opinion here; people might disagree with it, and that's ok):

- Popularity. We should not add a separate notebook containing just `YOUR_FAVOURITE_RARE_PACKAGE_HERE` merely because it's someone's favourite package. If the package is really popular, that's a point in its favour. The software added should be modern and well-supported.
- Added value and potential user base growth.
- Consistency with current images. I don't think we should add something completely different, for example, an image with a C++ Jupyter kernel.
- Maintenance and especially build time. Since we're using Docker, what's inside the container doesn't affect our build workflows (most of the time). But some images might be heavy to build, especially since we upload/download images as archives.
- Any special hardware/equipment. I don't think we should be adding `ppc64le` images, for example. On the other hand, I'm ok with adding a GPU image stack, if someone is ready to implement it without sacrificing build time (it's possible, but needs some work) and if the license allows us to do so.
- We should accept that we can't live in a model where "our images fit 95% of needs". I think the correct model is: choose the image closest to your needs, then install a few packages on top of it.
- I think we should stick to the philosophy of using the latest software available. It has worked quite well for us.
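The "choose the closest image and install on top" model can be sketched as a short Dockerfile. This is only an illustration, not a recommendation: the base image tag and the `sympy` package here are hypothetical examples.

```dockerfile
# Hypothetical sketch: start from the stack image closest to your needs
# and layer a few extra packages on top of it.
FROM jupyter/scipy-notebook:latest

# "sympy" is just an example of an extra package a user might need.
RUN mamba install --yes sympy && \
    mamba clean --all -f -y
```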
The message above is my opinion (which might easily change) and not a statement 🙂
Please, feel free to share your ideas.
With regard to point 5: adding GPU support is non-trivial because the version of CUDA you bundle is an extra confounding factor. For example, PyTorch has separate package indexes for CUDA 11.7 and 11.8. Do you bundle cuDNN or not? When the version of CUDA bundled into the Docker image is newer than what's supported by the Nvidia drivers on the host system, it will crash, etc.
Thanks, @twalcari, that's a good point.
I think we need a way to build the whole image tree in parallel, and then to tag the whole tree correctly.
For example, we could build the image tree with plain Ubuntu (as now), and also build the tree for CUDA 11.7 (adding a tag prefix like `cuda11.7-` to all these images) and for CUDA 11.8.
And have options for cuDNN as well.
Obviously, we can build all of this in parallel.
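As a rough illustration of the idea (the image names, root variants, and tag prefixes below are hypothetical, not the project's actual tagging scheme), the build matrix could be generated along these lines:

```python
# Hypothetical sketch of a build matrix for parallel image trees.
# Image names and root variants are illustrative assumptions only.

IMAGES = ["docker-stacks-foundation", "base-notebook", "minimal-notebook"]

# One entry per root variant: the tag prefix applied to every image in that tree.
ROOTS = {
    "ubuntu": "",            # the current plain-Ubuntu tree, no prefix
    "cuda11.7": "cuda11.7-",
    "cuda11.8": "cuda11.8-",
}

def build_matrix(images, roots):
    """Return (root, image, tag) triples; each root's tree is independent
    of the others, so the trees can be built in parallel."""
    return [
        (root, image, prefix + image)
        for root, prefix in roots.items()
        for image in images
    ]
```

Each root variant then yields its own complete, consistently prefixed tree of tags.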
So, if someone wants to add templating based on the root image, even without adding GPU support, it seems like a nice feature to have.
We will also have to take a look at the licensing of CUDA-based images; I think one of the reasons they were not added was that we couldn't redistribute the images because of the CUDA license (I'm not sure about this one).