quay / clair

Vulnerability Static Analysis for Containers

Home Page: https://quay.github.io/clair/

manifest generation is costly when registry does not support Range requests

freedge opened this issue · comments

Description of Problem / Feature Request

When generating a manifest, clairctl sends an HTTP request with the header Range: bytes=0-0 to the registry for each layer.
This is inefficient when the server hosting the blob does not support Range requests.
As far as I can tell from the code, it is also unnecessary - https://github.com/quay/clair/blob/v4.7.3/cmd/clairctl/manifest.go#L144-L145
The response (even its status code) is never checked; the request only makes it convenient to retrieve the headers that were sent from res. It appears to be sent out of pure convenience.
The same logic is implemented in stackrox.

In my environment I can reproduce it with just:

$ time ~/clairctl-linux-amd64  manifest ${MYIMAGE} > /dev/null

real    0m6.887s
user    0m0.111s
sys     0m0.063s

Expected Outcome

Manifest generation is fast, including against servers that do not support Range requests.

Actual Outcome

When the server hosting the blobs does not support Range requests, the query is inefficient and serves no purpose, as it ends up fetching the whole blob.

Environment

  • Clair version/image: clair main branch
  • Clair client name/version: clairctl version v4.7.3 (claircore v1.5.25)
  • Host OS: centos stream9
  • Kernel (e.g. uname -a): Linux stream 5.14.0-402.frigo.el9.x86_64 #1 SMP PREEMPT_DYNAMIC Sun Dec 24 09:47:43 CET 2023 x86_64 x86_64 x86_64 GNU/Linux
  • Kubernetes version (use kubectl version): na
  • Network/Firewall setup: na

If you look a few lines below, the body is not read at all. If the server doesn't support range requests, then the best we can do is not read any of the body.

The request is needed to follow any redirection chain that the server sends to a client. Perhaps we need to hook the redirection logic to make sure that the range header is used every time.

The assertion that the query is "pure convenience" is wrong.

Thanks. Argh, looking at the code and its behavior on my server, I was hoping it was possible to just not send the query at all, and I was wrong.
Well, then I don't know if there is much hope for this ticket, as there is no easy fix. Fixing the server should be the way to go.

Do you think it would be possible to warn the user when range requests are not supported? Something like:

diff --git a/cmd/clairctl/manifest.go b/cmd/clairctl/manifest.go
index ce9ceed..966703b 100644
--- a/cmd/clairctl/manifest.go
+++ b/cmd/clairctl/manifest.go
@@ -3,6 +3,7 @@
 import (
        "context"
        "errors"
+       "fmt"
        "net/http"
        "net/url"
        "os"
@@ -147,6 +148,9 @@ func Inspect(ctx context.Context, r string) (*claircore.Manifest, error) {
                        return nil, err
                }
                res.Body.Close()
+               if res.StatusCode != 206 {
+                       zlog.Warn(ctx).Msg(fmt.Sprintf("Server returned `%d' when retrieving layer `%s' and might not support range requests", res.StatusCode, u.String()))
+               }

                res.Request.Header.Del("User-Agent")
                res.Request.Header.Del("Range")

so that the issue is easier to troubleshoot (ideally this should be done on the Stackrox side too). I think we can close this ticket then. Many thanks for taking a look!

Okay, I created a dedicated issue to add a log line.

Feel free to submit a patch like the above; the only change I'd want is to use the structured logging instead of a formatted message string.