containerd / accelerated-container-image

A production-ready remote container image format (overlaybd) and snapshotter based on block-device.

Fast pull the full images in parallel without lazy loading

shuaichang opened this issue

What is the version of your Accelerated Container Image

No response

What would you like to be added?

OverlayBD is great at accelerating container image pulling. We've enjoyed the benefit and appreciate all the support from the community!

Why is this needed for Accelerated Container Image?

Problems

On-demand data transfer and trace-based prefetch are great tools. However, we see a gap between them that could be filled: fast prefetch of all blobs.

The following are the reasons:

  1. For some applications, lazy loading changes application behavior. One example is a K8s workload with startup/liveness/readiness probes: before lazy pulling it starts up with no issue, but after onboarding to lazy pulling it fails to start because the existing probe period is no longer long enough. This makes it hard for some applications to adopt OverlayBD without changing their config.
  2. It is not easy to observe the overall latency of the image pull, because that time is now attributed to application startup. Lazy pulling also introduces a new failure mode: previously, we would not run the application unless the image pull had succeeded. With lazy pulling, failures can surface as runtime IO hangs or other errors that are hard to debug (this is especially hard when different teams own the application and the container/image runtime infra).
  3. Downloading full blobs can also be fast; only decompression is slow, and decompression of OverlayBD images is very fast. With high concurrency, we were able to saturate the VM bandwidth and download a multi-GB OverlayBD image in several seconds.

I am aware that trace-based prefetch would make this issue much better, but it can be costly to add trace recording to a CI/CD build system in large-scale infra with many dependencies.

Therefore, I feel that if OverlayBD had a feature that sits between lazy loading and trace-based prefetch (let's just call it Prefetch), it would be a perfect solution that does not require much of a learning curve or much courage to adopt (Problem 2 is a pretty big mindset shift that can slow down adoption).

Options

We propose some options here; please feel free to add more.

  • Option 1 (what we are trying now): an external_image_puller pulls blobs from the registry in parallel, which can be quite fast once the VM network is saturated. Afterwards, we put the blobs into the registry_cache directory.
    • Pros:
      • relatively easy to implement, no OverlayBD side changes required
      • Flexible for users (us) to tune performance
    • Cons:
      • Is it thread safe, given that both overlaybd-tcmu and the external_image_puller might write to registry_cache? Will overlaybd-tcmu be able to detect new blob caches added by the external_image_puller?
      • Not an OverlayBD feature, cannot be reused by the community
      • Is there a good way to validate the integrity of the image?
  • Option 2: OverlayBD supports prefetch with parallelism. (OverlayBD already supports rpull --download-blobs for prefetching the full image. However, it is pretty slow because 1) it performs an unnecessary apply, which is part of the containerd image pull library code, and 2) the blobs are pulled sequentially.)
    • To make it fast, rpull would also need to download blobs in chunks in parallel and return success only once the image is fully downloaded.
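
To make the "download blobs in chunks in parallel" idea concrete, here is a minimal sketch of what either option would need to do for one blob. It is not OverlayBD or containerd code: `split_ranges`, `download_blob`, and the `fetch_range` callback are hypothetical names used only for illustration. In a real puller, `fetch_range(start, end)` would typically issue an HTTP GET with a `Range: bytes=start-end` header against the registry blob URL. The sketch also checks the sha256 digest after reassembly, which is one possible answer to the integrity question under Option 1. For simplicity it assembles the blob in memory; for multi-GB blobs a real implementation would write each chunk at its offset in a file instead.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Tuple


def split_ranges(size: int, chunk_size: int) -> List[Tuple[int, int]]:
    """Return inclusive (start, end) byte ranges that cover [0, size)."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [(start, min(start + chunk_size, size) - 1)
            for start in range(0, size, chunk_size)]


def download_blob(fetch_range: Callable[[int, int], bytes],
                  size: int,
                  expected_digest: str,
                  chunk_size: int = 4 << 20,
                  workers: int = 8) -> bytes:
    """Fetch all chunks concurrently, reassemble them, and verify sha256.

    fetch_range is a hypothetical callback that returns the bytes in the
    inclusive range [start, end] of the blob.
    """
    ranges = split_ranges(size, chunk_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() yields results in input order even though the fetches
        # themselves run concurrently, so chunks come back already sorted.
        chunks = list(pool.map(lambda r: fetch_range(*r), ranges))
    blob = b"".join(chunks)
    if len(blob) != size:
        raise IOError(f"short read: got {len(blob)} of {size} bytes")
    digest = "sha256:" + hashlib.sha256(blob).hexdigest()
    if digest != expected_digest:
        raise ValueError(f"digest mismatch: got {digest}")
    # Success is reported only after the whole blob is present and verified.
    return blob


if __name__ == "__main__":
    # Stand-in for a registry: serve ranges from an in-memory blob.
    data = bytes(range(256)) * 4096
    want = "sha256:" + hashlib.sha256(data).hexdigest()
    got = download_blob(lambda s, e: data[s:e + 1], len(data), want,
                        chunk_size=64 * 1024)
    print("downloaded", len(got), "bytes, digest verified")
```

Chunk size and worker count are the tuning knobs mentioned under "Flexible for users"; the defaults above are arbitrary placeholders, not measured values.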

Please feel free to contribute ideas as well. Again, we appreciate all the great work from the OverlayBD community. By contributing real-world use cases and requirements, we hope we can also help drive OverlayBD adoption.

Thanks!

Are you willing to submit PRs to contribute to this feature?

  • Yes, I am willing to implement it.

@shuaichang
I think in general, there are two implementation paths:

  1. In containerd: use rpull --download-blobs to pull overlaybd images, with two improvements. One is to download a single blob in parallel chunks to speed up the download. The other is to remove the untar/decompression step from content store to snapshot.
  2. In overlaybd: use the download cache type, with full-speed background download (no delay) or full-speed prefetch. I also think external_image_puller is feasible for the download cache type: external_image_puller downloads blobs and writes them into the corresponding snapshot directories.
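
On the thread-safety concern raised for Option 1, one general technique for an external puller that writes into a directory another process reads is to write each blob to a temporary file in the same directory and rename it into place, so a reader never observes a half-written file. The sketch below shows only that technique. The function name `publish_blob` and the file naming are hypothetical, the real cache and snapshot directory layout is not described in this thread, and whether overlaybd-tcmu would pick up files published this way is an open question that this sketch does not answer.

```python
import os
import tempfile


def publish_blob(target_dir: str, name: str, data: bytes) -> str:
    """Write data to target_dir/name without exposing a partial file.

    The directory layout and file name are hypothetical placeholders.
    """
    os.makedirs(target_dir, exist_ok=True)
    # The temp file lives in the same directory so the final rename stays
    # on one filesystem, where it is atomic on POSIX systems.
    fd, tmp_path = tempfile.mkstemp(dir=target_dir, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())  # make sure the bytes are on disk first
        final_path = os.path.join(target_dir, name)
        os.replace(tmp_path, final_path)
        return final_path
    except BaseException:
        # Clean up the temp file if anything failed before the rename.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise


if __name__ == "__main__":
    demo_dir = tempfile.mkdtemp()
    path = publish_blob(demo_dir, "example-blob", b"blob contents")
    print("published", path)
```

Combined with a digest check before publishing, this would mean a file that appears under its final name is always complete and verified.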