remicres / otbtf

Deep learning with otb (mirror of https://forgemia.inra.fr/remi.cresson/otbtf)

Lighter docker images

remicres opened this issue

See how Vincent does it here

You could use my method, but there might be a simpler way to do it with the original Dockerfile... My script is not "production-ready" (at least regarding the gpu tag) because it is somewhat brutal: I wasn't able to properly import your container's env into the squashed image, I had to hard-code it (see here), because Docker does some low-level env management and there is no way to hack it using the container's shell variables!

What it does:

  • create a new build from the original image, with a "RUN rm -rf" step to delete any tmp and src files...
  • create a container from it, in order to make a raw export to a gzipped image (this is where we lose the env!)
  • create a final light image with hard-coded env, user and workdir

It's quite bad: it requires several builds and some headaches with the env variables.
Moreover, we obtain a single-layer image, so the download is still very long because Docker can't make use of parallel pulls.
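
Roughly, the flow looks like this (a minimal sketch, not my exact script; image names, paths and env values are placeholders):

```bash
# 1. Rebuild on top of the original image with a Dockerfile that just deletes tmp / src files
docker build -t otbtf:cleaned -f Dockerfile.clean .

# 2. Export a container's filesystem as a raw tarball (this is where ENV, USER and WORKDIR get lost)
docker create --name squash otbtf:cleaned
docker export squash | gzip > otbtf.tar.gz
docker rm squash

# 3. Re-import as a single-layer image, re-applying the metadata by hand
zcat otbtf.tar.gz | docker import \
  --change 'ENV PATH=/opt/otbtf/bin:$PATH' \
  --change 'WORKDIR /home/otbuser' \
  - otbtf:squashed
```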

Since you already use many multi-command steps (&&), merging more layers is not a good option because it would be harder to debug...

A good solution could be a multi-stage build, which is in fact a totally different approach (quite the opposite).
Instead of removing every useless file from the original container, we copy only the useful files into the final image.
The idea is to create a first build as an intermediate (heavy) image that won't be pushed to the repository (if I'm not misunderstanding). It will remain above 20GB in /var/lib/docker, but only for whoever is building.
The environment variables could also be a problem here, but one more easily solved because there is just one Dockerfile (versus 1 + 2 for my script).
Also, we can imagine a build argument which specifies whether you want to keep git / src files in the final image (in order to build an extra-light img or some heavier *-dev img).
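
A minimal sketch of what I mean (stage names, paths and base images are made up, not taken from your Dockerfile):

```dockerfile
# Heavy intermediate stage: everything is built here, but this image is never pushed
FROM ubuntu:18.04 AS builder
# ... install build deps, compile TF and OTB, install everything into /opt/otbtf ...

# Final light image: copy only the useful files from the builder stage
FROM ubuntu:18.04
COPY --from=builder /opt/otbtf /opt/otbtf
ENV PATH=/opt/otbtf/bin:$PATH
```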

Are you working on it right now? I think I'll try the idea with minimal edits to your Dockerfile. If it works out I will open a merge request.

While reading the files, I was thinking about ways to reduce the number of lines, especially regarding the first commands: you could use an external txt file with every package name (like a requirements.txt for python), then do "cat deps.txt | xargs apt-get install -y". Same thing for the build env: we can imagine some sh file with every variable (then "source env.sh && make tf").
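
Something like this, for instance (the paths and the make target are only a suggestion):

```dockerfile
# apt packages listed one per line in a text file, like a requirements.txt
COPY deps.txt /tmp/deps.txt
RUN apt-get update \
 && cat /tmp/deps.txt | xargs apt-get install -y --no-install-recommends \
 && rm -rf /var/lib/apt/lists/*

# build variables kept in a sh file instead of a long list of ENV / export lines
# ('.' instead of 'source' because RUN uses /bin/sh)
COPY env.sh /tmp/env.sh
RUN . /tmp/env.sh && make tf
```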

Maybe you already have those files elsewhere in the repo?
Or do you prefer to keep all the deps and vars inside the Dockerfile?

Hi @vidlb thanks for sharing these ideas!
I am not working on that for now. I like the multi-stage build idea. I believe that the OTB dockers are built like that.
But I must admit that I'm not very good with docker 😉. And I don't know how efficiently it is possible to factorize the common stages shared between the different dockers: for instance, how to share all the stages/instructions related to the OTB build and dependencies installation, which are exactly the same in otbtf2.0:gpu and otbtf2.0:cpu? (it's a bit of a nightmare for maintenance) Maybe the solution you came up with (the cat deps.txt | xargs apt-get install -y thing) is the way to go. I don't have a well-defined list of the dependencies... everything is in the dockerfiles for now!
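
Something like a single Dockerfile taking the base image as a build argument, maybe? (I'm just guessing at the syntax here, and BASE_IMG is a made-up name):

```dockerfile
# The base image becomes a build argument, so cpu and gpu share all the following instructions
ARG BASE_IMG=ubuntu:18.04
FROM ${BASE_IMG}
# ... same dependencies installation and OTB/TF build steps for both flavors ...
```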

Feel free to contribute, there is a lot of room for improvement indeed!

I would be happy if you provide something in a merge request 👍

Hi,

Yes, there's a way to keep common layers in cache, but in order to do that we need to build OTB first (at least the OTB_DEPENDS target).

So it will be a total refactoring. In that case there are a lot of other possible improvements:

  • one small Dockerfile, where you can control different build types and push multiple tags from the same base just by changing --build-arg; the TF and OTB branches can also become build args (no more dozens of Dockerfiles, just one version along your current branch) - see the build commands sketched after this list
  • you can keep the variables related to the TF build in external sh files, then easily upgrade versions or debug your build without any modification to the Dockerfile - you may keep the same file across minor version bumps
  • same thing for the apt dependencies in a txt file, using the xargs trick
  • as I've done for Moringa, you could also control whether you need to build the GUI (skipping it will significantly reduce image size!)
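
For example, the build commands could look like this (tag names, arg names and versions are only an illustration of the idea):

```bash
# One Dockerfile, several tags: only the build args change
docker build . -t otbtf:cpu --build-arg OTB=release-7.2 --build-arg TF=v2.1.0
docker build . -t otbtf:gpu --build-arg OTB=release-7.2 --build-arg TF=v2.1.0 \
    --build-arg BASE_IMG=nvidia/cuda:10.1-cudnn7-devel-ubuntu18.04
docker build . -t otbtf:gpu-dev --build-arg BASE_IMG=nvidia/cuda:10.1-cudnn7-devel-ubuntu18.04 \
    --build-arg KEEP_SRC=true
```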

In order to dev and test builds with Docker the easy way, we should keep the master Dockerfile at the root and, most importantly, use COPY instead of cloning your repo at each build, so you could dev and build with your local files instead of only your published commits.
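
Concretely, something like this in the Dockerfile (the URL and destination path are just for illustration):

```dockerfile
# Before: sources pulled from the remote at every build, only published commits can be used
# RUN git clone --depth=1 https://github.com/remicres/otbtf.git /src/otbtf

# After: the local working copy is used, so uncommitted changes can be built and tested
COPY . /src/otbtf
```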

This would also allow you to run automatic tests for your release candidates, pipelines, etc. (but my knowledge stops here =).

I will try something out this month.
Should I build with OTB 7.2?
Still TF 2.1? Ubuntu 18? CUDA 10.1? Or newer?

Hi @vidlb ,
Yes, you can take OTB 7.2 and the latest stable TF 2.x release!
For the GPU image, you can use the Ubuntu docker image from NVIDIA (I don't know which Ubuntu version it is), and take the same original Ubuntu docker image for the CPU flavor!

Closed by #41