ScribeMD / docker-cache

Cache Docker Images Whether Built or Pulled


Extremely inefficient

Nefcanto opened this issue

Hi

I wanted to use this docker-cache action, and this is my code:

  - name: Cache Docker images
    uses: ScribeMD/docker-cache@0.3.7
    with:
      key: docker-${{ runner.os }}

  - name: Pull images
    run: |
      docker pull holism/panel
      docker pull node:lts-bookworm-slim

But as is clear from the screenshot below, your action increased my workflow time from 1:30 up to 2:00 minutes:

[Screenshot from 2024-02-04 12-57-44 showing the workflow run times]

The problem is all about docker save and docker load: both are extremely slow. On GitHub Actions, it's faster to download images again than to docker load them from cache.
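For reference, the save/load pattern at issue is roughly equivalent to the following manual workflow. This is only a sketch of the technique, not the action's actual source; the cache key and tarball path are illustrative:

  # Sketch of the docker save / docker load caching pattern.
  - name: Restore image tarball from cache
    id: image-cache
    uses: actions/cache@v4
    with:
      path: ~/docker-images.tar
      key: docker-${{ runner.os }}

  - name: Load images on cache hit
    if: steps.image-cache.outputs.cache-hit == 'true'
    run: docker load --input ~/docker-images.tar  # the slow step

  - name: Pull images
    run: |
      docker pull holism/panel
      docker pull node:lts-bookworm-slim

  - name: Save images on cache miss
    if: steps.image-cache.outputs.cache-hit != 'true'
    run: docker save --output ~/docker-images.tar holism/panel node:lts-bookworm-slim  # also slow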

Can you please work on /var/lib/docker/image to skip the docker save and docker load steps?

Because when I use this code:

  - name: Change Docker image directory ownership
    run: sudo chown -R $USER:$USER /var/lib/docker/image

  - name: Cache Docker
    uses: actions/cache@v4
    with:
      path: |
        /var/lib/docker/image
      key: docker-layers-${{ runner.os }}-pull-image

  - name: Pull images
    run: |
      docker pull holism/panel
      docker pull node:lts-bookworm-slim

I get this error:

Post job cleanup.
Warning: EACCES: permission denied, lstat '/var/lib/docker/image'
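A plausible cause: the chown runs before the pulls, and the root-owned Docker daemon recreates root-owned files under /var/lib/docker/image, which the actions/cache post step then cannot read. Since regular steps run before post-job steps, one untested workaround is to fix ownership again in a final step:

  # Untested sketch: re-fix ownership as the last regular step so the
  # actions/cache post step can read the tree when it saves the cache.
  - name: Fix ownership before the cache post step
    if: always()  # run even if an earlier step failed
    run: |
      # Stopping the daemon first may also be needed if files are still changing:
      # sudo systemctl stop docker
      sudo chown -R "$USER":"$USER" /var/lib/docker/image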

If you work on this part (simply putting the /var/lib/docker/image directory back in its place), then we can skip docker save and docker load, which drastically reduces the time. For now, though, your action actually worsens the workflow time.

Thank you

I do not understand your proposal. You seem to be reporting that your proposed faster alternative implementation doesn't work on account of a permission error, but we would be delighted to improve the performance for our users if that could be addressed. I attempted numerous implementations like the one you proposed, which similarly didn't work. That is how I arrived at docker save + docker load.

The performance characteristics depend on the available network bandwidth, disk I/O, CPU, Docker image size, etc., so it's always wise to measure for your specific use case. Some people want to cache Docker images simply to avoid Docker Hub's rate limits or reduce reliance on third-party networks. If you control where your images are hosted, I suggest hosting them on the GitHub Container Registry (GHCR) and not caching them at all.

Please bear in mind that open-source software typically comes without any warranties of fitness for any particular purpose, so your expectations may be met with hostility in other open-source projects, particularly those with more open issues.

@Kurt-von-Laven, it's not your fault that docker save and docker load are slow. The main reason for caching is not to bypass rate limits; it's universally accepted that the main reason for caching is to improve performance.

Now that you know how to create prebuilt GitHub Actions, I suggest that you at least create an option to choose between save/load and restoring /var/lib/docker/image. The first option is not suitable for performance; it is, however, suitable for bypassing rate limits and similar concerns.

The main reason for caching is not to bypass rate limits; it's universally accepted that the main reason for caching is to improve performance.

In general this is certainly true, but this action doesn't cache arbitrary data, only Docker images, where the use cases I mentioned are very real alternative motivations for caching. I would again ask you to consider your tone, though, since any given project may or may not have been written with your specific use case and needs in mind.

I suggest that you at least create an option to choose between save/load and restoring /var/lib/docker/image. The first option is not suitable for performance.

Again, I don't understand what you are proposing. You have shared a code sample that you state does not work, and you have asked me to implement something I believe to be fundamentally unworkable based on my past experience trying to do the same thing you suggest. We are open to performance improvements if any can be found.

@Kurt-von-Laven, thank you for your patience and kind words.

Let me explain in more detail.

I'm not an expert in Docker. As far as I know, Docker creates and manages its images in layers and caches those layers in a default directory located at /var/lib/docker/image. It's similar to NPM: when Docker wants to pull an image, it checks that directory for cached layers, and if they are there, it uses them to form the image locally. That's the main reason behind the Already exists labels that we see when we execute the docker pull command. So, the correct path towards improving performance is to follow the NPM caching strategy and cache that directory.
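For reference, with Docker's default overlay2 storage driver, the image metadata and the layer contents live in separate directories, which can be inspected directly on a runner (an illustrative, untested snippet assuming the default configuration):

  - name: Inspect Docker's on-disk layout (illustrative)
    run: |
      # Image metadata: image database, layer database, pull digests.
      sudo ls /var/lib/docker/image/overlay2
      # The layer contents themselves, stored by the overlay2 driver.
      sudo du -sh /var/lib/docker/overlay2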

Based on this approach, we have two options at hand:

  1. Either we configure Docker to use another directory for its cached images, so that we don't face the permission problems (see the sketch after this list)
  2. Or we somehow configure GitHub Actions to grant permission to that default directory

If we manage to do either of these, we can significantly improve the performance of docker pull.
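A minimal sketch of option 1, assuming Docker's data-root can be relocated via /etc/docker/daemon.json on a GitHub-hosted runner. This is untested; the directory and cache key are illustrative, and files under the new data-root are still created by the root-owned daemon, so an ownership fix before the cache post step may still be needed:

  - name: Relocate Docker's data directory (option 1 sketch)
    run: |
      sudo systemctl stop docker
      sudo mkdir -p /home/runner/docker-data
      # Overwrites any existing daemon.json; merge keys instead if the runner ships one.
      echo '{ "data-root": "/home/runner/docker-data" }' | sudo tee /etc/docker/daemon.json
      sudo systemctl start docker

  - name: Cache the relocated image metadata and layers
    uses: actions/cache@v4
    with:
      path: |
        /home/runner/docker-data/image
        /home/runner/docker-data/overlay2
      key: docker-data-${{ runner.os }}-pull-image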

Thank you for the more detailed explanation. This action supports both Linux and Windows, but hopefully most of the differences between the two would boil down to the use of different file paths.

I suspect that option 1 will be slower than anticipated on GitHub-hosted runners on account of the multiple GB of cached Docker images that I expect would need to be copied to a different directory. This might break users with running containers, which I suspect includes anyone who transitively depends on a Docker container action. It may even run some users out of disk space. An aggressive implementation could kick off the copy in a pre-step (sketched below) in order to parallelize it, but this would also introduce a significant amount of complexity and possible failure modes.

Existing users will be further impacted by such a change if they have large Docker images that they don't cache with this action (e.g., because they build rather than pull them), but I am open to considering it as a major version bump if it is demonstrated to be typically faster in practice and compatible with Docker container actions. What we would ideally want is a way to add an extra Docker data directory, which would be a feature request for Docker.
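For what it's worth, JavaScript actions can declare a pre entry point in their metadata, which is what kicking off the copy early would presumably use. This is a sketch of the action.yml shape only; the file names are hypothetical, and it is not this action's actual metadata:

  # Sketch of an action.yml with a pre entry point (JavaScript actions only).
  runs:
    using: node20
    pre: dist/copy-data-root.js   # hypothetical: start relocating the data root early
    main: dist/main.js
    post: dist/post.js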

I have tried option 2 in the past and felt it was a dead end, but I remain open to being proven wrong about this. My understanding is that GitHub Actions is tightly integrated with Docker and doesn't want users messing with it very much. I don't believe Docker was omitted from GitHub's official caching examples by accident. If it weren't, there would never have been a reason to create this action in the first place.

I suspect that even if either of these implementations works now, it will be brittle to future changes GitHub makes to prevent this level of tampering with the Docker configuration that their own features rely on. rootless-docker restarts the Docker daemon in rootless mode to prevent permission errors similar to the one you encountered, and it was broken repeatedly without warning by undocumented changes to GitHub Actions' handling of Docker, as you can see from the stream of fixes in July 2022. If something similar happened here, we could revert to docker save + docker load in the worst case, so I feel it would be worth the risk if there is a way to improve performance.

I am closing this issue for now, but I am happy to reopen it if an approach to improving performance is found.