Extremely slow build due to creating many layers in Dockerfile

Question

nyanpasu64 opened this issue 2 years ago · comments

Your Dockerfile currently contains 13 steps. Because every step/instruction is saved as a layer (link I think), every step adds an overhead of 1 second spent snapshotting the filesystem (link to timestamped logs). And worse yet, these layers are discarded when the GitHub Actions build completes, and when I push an updated Zola site to GitHub, it recomputes the entire Docker image from scratch!
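To see how many steps a given Dockerfile has, a rough count is enough. The sketch below is a heuristic and not how Docker itself parses the file: `count_steps` is a hypothetical helper that counts every non-blank, non-comment line as one instruction unless it continues a previous line ending in a backslash. The estimate simply reuses the roughly 1 second per step observed in the logs above.

```shell
#!/bin/sh
set -eu

# count_steps FILE: print the approximate number of Dockerfile instructions.
# Heuristic only: blank lines and comments are ignored, and a line is treated
# as part of the previous instruction when that one ended with a backslash.
count_steps() {
    awk '
        /^[ \t]*(#|$)/ { next }                  # skip blank lines and comments
        { if (!cont) n++ }                       # first line of a new instruction
        { cont = ($0 ~ /\\[ \t]*$/) }            # trailing backslash: next line continues
        END { print n + 0 }
    ' "$1"
}

# Only report when a Dockerfile is present in the current directory.
if [ -f Dockerfile ]; then
    steps="$(count_steps Dockerfile)"
    echo "$steps steps, so roughly ${steps}s of per-layer overhead"
fi
```

Run it from the directory containing the Dockerfile. The number is only as good as the 1 second per step estimate, which may differ between runners.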

One solution is to remove as many steps as possible from the Dockerfile. I took it one step further and removed my custom Dockerfile altogether, instead relying on a custom entrypoint pointing within my repository clone. Since I had vendored this action in my repo, I decided to mimic Zola's GitLab Pages docs (link), and changed the action.yaml to say:

runs:
  using: 'docker'
  image: 'alpine:latest'
  entrypoint: '.github/actions/zola-deploy-action/install.sh'

Note that the entrypoint I used is non-portable, and I don't know how to adapt it to work in an action usable by other repositories.

install.sh installs the necessary packages (bash coreutils git zola), locates $SCRIPTPATH, then runs exec "$SCRIPTPATH/entrypoint.sh". One advantage is that Alpine's apk package manager is faster (2-3 seconds) than apt-get (7 seconds), but it could also be that I don't install wget and that reduces the amount of work apk needs to perform.
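For readers who want to try the same approach, here is a minimal sketch of what such an install.sh could look like. It is reconstructed from the description above and is not the exact script from my repo: the `script_dir` and `main` helper names and the `--no-cache` flag are my own choices here, and which Alpine repository provides the zola package depends on the Alpine release you use. It uses /bin/sh because bash is not present in the base Alpine image until apk installs it.

```shell
#!/bin/sh
set -eu

# Print the absolute directory that contains the given script path.
script_dir() {
    CDPATH= cd "$(dirname "$1")" && pwd
}

main() {
    # Packages named in the description above. Assumption: zola is available
    # from the repositories enabled in the image.
    apk add --no-cache bash coreutils git zola

    SCRIPTPATH="$(script_dir "$0")"
    # Hand over to the action's real entrypoint, which sits next to this file.
    exec "$SCRIPTPATH/entrypoint.sh" "$@"
}

# Run only when invoked as install.sh, so the helpers can be sourced safely.
case "$0" in
    */install.sh|install.sh) main "$@" ;;
esac
```

Resolving the directory from `$0` keeps the script independent of the working directory GitHub Actions starts the container in.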

Other changes

Additionally I modified entrypoint.sh to remove git submodule update --init --recursive and the surrounding code, and instead modified my own workflow to checkout submodules:

      - uses: actions/checkout@v3.0.0
        with:
          submodules: recursive

I'm not sure if this is faster or not.

Shaleen Jain · Answer 1 · Sun Jun 19 2022 15:01:35 GMT+0800 (China Standard Time)

Hi,

I'm not sure what you are proposing here. You do not provide any definite measurements or benchmarks supporting a better or faster alternative, just the fact that there are alternative ways of doing the same thing.

How exactly GitHub Actions builds, reuses, or caches a Docker image depends on its internal implementation, for which the GitHub team provided no docs or recommendations last time I checked.

Since zola is still a relatively new tool with frequent point releases, I'm prioritising correctness and convenience over performance.

nyanpasu64 · Answer 2 · Sun Jun 19 2022 15:14:04 GMT+0800 (China Standard Time)

To flesh out my initial post:

https://github.com/nyanpasu64/zola-site/runs/6945828511?check_suite_focus=true#step:3:8 spends around 23 seconds in Docker alone, with 7 of them executing apt-get (and not constituting Docker overhead).
I have a faster Actions CI setup in my repo at https://github.com/nyanpasu64/zola-site/tree/github-alpine/.github/actions/zola-deploy-action (this branch may be removed in the future). https://github.com/nyanpasu64/zola-site/runs/6946215717?check_suite_focus=true spends no time creating a custom Docker image, and 2 seconds in apk.

> Since zola is still a relatively new tool with frequent point releases, I'm prioritising correctness and convenience over performance.

You could keep the Debian base image (or switch to a slightly heavier image with Git and wget preinstalled), and move the locale code (if necessary), the apt-get install (if you pick a light Debian image), and the wget download of Zola into a plain shell script. This should make the Actions runs much faster, without reducing control over the exact Zola version used.
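As an untested sketch of that idea, the script below pins a Zola version and downloads the official release tarball at runtime. The version number, the `install-zola.sh` file name, and the helper names are placeholders I picked for illustration. The URL pattern follows the naming used on the getzola/zola releases page, and I am assuming the tarball contains just the `zola` binary at its top level.

```shell
#!/bin/sh
set -eu

# Placeholder pin: change this to the Zola release you want.
ZOLA_VERSION="${ZOLA_VERSION:-0.15.3}"

# Print the download URL of the official x86_64 Linux (glibc) build.
zola_release_url() {
    printf 'https://github.com/getzola/zola/releases/download/v%s/zola-v%s-x86_64-unknown-linux-gnu.tar.gz\n' "$1" "$1"
}

install_zola() {
    # Needed on a slim Debian image; skip if the base image already has these.
    apt-get update
    apt-get install -y --no-install-recommends ca-certificates git wget
    # Stream the tarball straight into /usr/local/bin without a temp file.
    wget -q -O - "$(zola_release_url "$1")" | tar xzf - -C /usr/local/bin
}

# Run only when invoked under the (hypothetical) name install-zola.sh.
case "$0" in
    */install-zola.sh|install-zola.sh) install_zola "$ZOLA_VERSION" ;;
esac
```

The version stays under your control through a single variable, which is the same guarantee the Dockerfile gives today, just moved from image build time to container start.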

Alternatively (I did not test this), you could publish a prebuilt Docker image containing Zola and use it from the action, so GitHub can quickly download it and hopefully avoid the time overhead (I don't know if downloading an image still has 1 second of overhead per layer).

Shaleen Jain · Answer 3 · Sun Jun 19 2022 16:10:40 GMT+0800 (China Standard Time)

Are you able to get zola running on an Alpine base image? If so, I'll happily accept a PR switching the Dockerfile from Debian to Alpine.

In the interest of transparency and supply chain integrity, I'm refraining from pulling a prebuilt image from Docker Hub (especially now with Docker Hub rate limiting anonymous downloads).

nyanpasu64 · Answer 4 · Sun Jun 19 2022 16:35:46 GMT+0800 (China Standard Time)

Are you more interested in installing Zola from the official repo's releases (like this action now), or Alpine's edge repository (like https://www.getzola.org/documentation/deployment/gitlab-pages/#setting-up-the-gitlab-ci-cd-runner)?

EDIT: Apparently the official Zola builds won't run on Alpine because it lacks libstdc++ (ldd output). So the remaining choices are Alpine Zola, Debian stable-slim with wget (or curl), a heavier Debian image with more programs built-in, or picking a different distro as a base image.

Note that I'm not an expert in Docker maintainability. I do know you can probably cut down build time and overhead by combining adjacent ENV and RUN commands (and COPY if you had more than one), and personally I'd remove MAINTAINER and LABEL for ephemeral images. I decided to move as much setup as possible into a runtime shell script rather than the Docker image itself, but I don't know if that's standard practice or not (it would definitely slow things down if Dockerfile-produced images were cached, but they appear to not be).