manifest generation is costly when registry does not support Range requests
freedge opened this issue
Description of Problem / Feature Request
When generating a manifest, clairctl sends an HTTP request with the header Range: bytes=0-0 to the registry for each layer.
This is inefficient when the server hosting the blobs does not support Range requests.
As far as I could understand the code, it is also not needed - https://github.com/quay/clair/blob/v4.7.3/cmd/clairctl/manifest.go#L144-L145
The response (even its status code) is never checked, but the headers that were sent can be conveniently retrieved from res. The query appears to be sent out of pure convenience.
Same logic is implemented in stackrox.
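For readers who want to see the behaviour in isolation, here is a minimal sketch of such a probe. It is not the clairctl code: probeLayer, demoProbe and the two toy servers are hypothetical names made up for illustration. A server that honours Range should answer 206, while one that ignores the header answers 200 and starts sending the whole blob.

```go
package main

import (
	"context"
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"strings"
	"time"
)

// probeLayer (hypothetical helper) sends a GET with "Range: bytes=0-0",
// closes the body without reading it, and returns the status code together
// with the URL of the request that produced the final response.
func probeLayer(ctx context.Context, c *http.Client, url string) (int, string, error) {
	req, err := http.NewRequestWithContext(ctx, http.MethodGet, url, nil)
	if err != nil {
		return 0, "", err
	}
	req.Header.Set("Range", "bytes=0-0")
	res, err := c.Do(req)
	if err != nil {
		return 0, "", err
	}
	res.Body.Close() // the body is never read
	return res.StatusCode, res.Request.URL.String(), nil
}

// demoProbe runs the probe against one toy server that honours Range and
// one that ignores it, and returns both status codes.
func demoProbe() (int, int) {
	blob := strings.Repeat("x", 1024)
	ranged := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		// ServeContent implements Range handling for us.
		http.ServeContent(w, r, "blob", time.Time{}, strings.NewReader(blob))
	}))
	defer ranged.Close()
	plain := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		io.WriteString(w, blob) // ignores the Range header entirely
	}))
	defer plain.Close()

	ctx := context.Background()
	withRange, _, err := probeLayer(ctx, ranged.Client(), ranged.URL)
	if err != nil {
		panic(err)
	}
	withoutRange, _, err := probeLayer(ctx, plain.Client(), plain.URL)
	if err != nil {
		panic(err)
	}
	return withRange, withoutRange
}

func main() {
	withRange, withoutRange := demoProbe()
	fmt.Println("range-capable server status:", withRange)
	fmt.Println("range-ignoring server status:", withoutRange)
}
```

Closing the body early does not stop a real server from starting to send the full blob, which is presumably where the extra time goes.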
In my environment I can reproduce with just
$ time ~/clairctl-linux-amd64 manifest ${MYIMAGE} > /dev/null
real 0m6.887s
user 0m0.111s
sys 0m0.063s
Expected Outcome
manifest generation is fast and works for servers not supporting Range requests
Actual Outcome
For servers hosting blobs that do not support Range requests, the query is inefficient and useless, as it ends up fetching the whole blob.
Environment
- Clair version/image: clair main branch
- Clair client name/version: clairctl version v4.7.3 (claircore v1.5.25)
- Host OS: centos stream9
- Kernel (e.g. uname -a): Linux stream 5.14.0-402.frigo.el9.x86_64 #1 SMP PREEMPT_DYNAMIC Sun Dec 24 09:47:43 CET 2023 x86_64 x86_64 x86_64 GNU/Linux
- Kubernetes version (use kubectl version): na
- Network/Firewall setup: na
If you look a few lines below, the body is not read at all. If the server doesn't support range requests, then the best we can do is not read any of the body.
The request is needed to follow any redirection chain that the server sends to a client. Perhaps we need to hook the redirection logic to make sure that the range header is used every time.
The assertion that the query is "pure convenience" is wrong.
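To illustrate the redirection point, here is a rough sketch of a client that follows a redirect chain and re-applies the Range header on every hop through a CheckRedirect hook. The paths /v2/blob and /storage/blob, the toy server, and demoRedirect are assumptions made up for the example, not anything from clairctl. As far as I know, net/http already copies most headers across redirects, so the hook mainly makes that behaviour explicit.

```go
package main

import (
	"errors"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// keepRange is a CheckRedirect hook that copies the Range header of the
// original request onto every redirected request.
func keepRange(req *http.Request, via []*http.Request) error {
	if len(via) >= 10 {
		return errors.New("stopped after 10 redirects")
	}
	if r := via[0].Header.Get("Range"); r != "" {
		req.Header.Set("Range", r)
	}
	return nil
}

// demoRedirect starts a toy registry whose blob endpoint redirects to a
// storage endpoint. It returns the path the client ended up at and the
// Range header the storage endpoint received.
func demoRedirect() (string, string) {
	seen := make(chan string, 1)
	mux := http.NewServeMux()
	mux.HandleFunc("/v2/blob", func(w http.ResponseWriter, r *http.Request) {
		http.Redirect(w, r, "/storage/blob", http.StatusFound)
	})
	mux.HandleFunc("/storage/blob", func(w http.ResponseWriter, r *http.Request) {
		seen <- r.Header.Get("Range")
		w.WriteHeader(http.StatusOK)
	})
	srv := httptest.NewServer(mux)
	defer srv.Close()

	c := srv.Client()
	c.CheckRedirect = keepRange

	req, err := http.NewRequest(http.MethodGet, srv.URL+"/v2/blob", nil)
	if err != nil {
		panic(err)
	}
	req.Header.Set("Range", "bytes=0-0")
	res, err := c.Do(req)
	if err != nil {
		panic(err)
	}
	res.Body.Close()
	// res.Request is the last request in the chain, i.e. the resolved location.
	return res.Request.URL.Path, <-seen
}

func main() {
	path, rng := demoRedirect()
	fmt.Println("final path:", path)
	fmt.Println("Range seen by storage:", rng)
}
```

The final URL is only known once the chain has been followed, which is why the request cannot simply be dropped.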
Thanks. Argh, looking at the code and its behavior on my server, I was hoping it was possible to just not send the query at all, and I was wrong.
Well, then I don't know if there is much hope for this ticket, as there is no easy fix. Fixing the server should be the way to go.
Do you think it would be possible to warn the user when range requests are not supported? Something like
diff --git a/cmd/clairctl/manifest.go b/cmd/clairctl/manifest.go
index ce9ceed..966703b 100644
--- a/cmd/clairctl/manifest.go
+++ b/cmd/clairctl/manifest.go
@@ -3,6 +3,7 @@
import (
"context"
"errors"
+ "fmt"
"net/http"
"net/url"
"os"
@@ -147,6 +148,9 @@ func Inspect(ctx context.Context, r string) (*claircore.Manifest, error) {
return nil, err
}
res.Body.Close()
+ if res.StatusCode != 206 {
+ zlog.Warn(ctx).Msg(fmt.Sprintf("Server returned `%d' when retrieving layer `%s' and might not support range requests", res.StatusCode, u.String()))
+ }
res.Request.Header.Del("User-Agent")
res.Request.Header.Del("Range")
so that the issue is easier to troubleshoot (ideally this should be done on the Stackrox side too). I think we can close this ticket then, many thanks for taking a look!
Okay, I created a dedicated issue to add a log line.
Feel free to submit a patch like the above; the only change I'd want is to use the structured logging instead of a formatted message string.
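As a rough illustration of the structured-logging variant, the sketch below uses the standard library's log/slog as a stand-in so that it runs on its own. warnIfNoRange, logged and the example URL are hypothetical. In clairctl the same idea would presumably attach the status code and URL as typed fields on the zlog event, with a constant message, instead of building the message with fmt.Sprintf.

```go
package main

import (
	"bytes"
	"log/slog"
	"net/http"
	"os"
)

// warnIfNoRange emits a warning with the status code and layer URL as
// separate fields when the server did not answer 206 Partial Content.
func warnIfNoRange(l *slog.Logger, status int, layerURL string) {
	if status == http.StatusPartialContent {
		return
	}
	l.Warn("server might not support range requests",
		"status", status,
		"url", layerURL,
	)
}

// logged reports whether a warning is written for the given status code.
func logged(status int) bool {
	var buf bytes.Buffer
	l := slog.New(slog.NewTextHandler(&buf, nil))
	warnIfNoRange(l, status, "https://registry.example/v2/blob")
	return buf.Len() > 0
}

func main() {
	l := slog.New(slog.NewTextHandler(os.Stderr, nil))
	// A 200 response to a Range request triggers the warning.
	warnIfNoRange(l, http.StatusOK, "https://registry.example/v2/blob")
	// A 206 response stays silent.
	warnIfNoRange(l, http.StatusPartialContent, "https://registry.example/v2/blob")
}
```

Keeping the message constant and the values in fields makes the log line easier to filter and aggregate.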