project-zot / zot

zot - A scale-out production-ready vendor-neutral OCI-native container image/artifact registry (purely based on OCI Distribution Specification)

Home Page:https://zotregistry.dev


[Bug]: Repeated requests to remote registry when not able to load catalog

ctr49 opened this issue · comments

zot version

v2.0.2-rc1

Describe the bug

There seems to be a problem with error handling when a remote registry does not respond as expected.
Configuring a registry that returns a 400 on registry/v2/_catalog (not OCI compliant?!) results in repeated requests to that registry, regardless of the retry settings.

To reproduce

  1. Configuration
    The Chainguard registry is configured in pass-through (on-demand) mode in the sync extension:
{
    "storage": {
        "rootDirectory": "/var/lib/registry",
        "commit": true
    },
    "http": {
        "address": "0.0.0.0",
        "port": "5000"
    },
    "log": {
        "level": "warn"
    },
    "extensions": {
        "metrics": {},
        "sync": {
            "enable": true,
            "credentialsFile": "./sync-auth-credentials.json",
            "registries": [
                {
                    "urls": [
                        "https://cgr.dev"
                    ],
                    "content": [
                        {
                            "prefix": "**",
                            "destination": "/chainguard-images"
                        }
                    ],
                    "onDemand": true,
                    "maxRetries": 3,
                    "retryDelay": "60m",
                    "pollInterval": "6h",
                    "tlsVerify": true
                }
            ]
        },
        "search": {},
        "scrub": {},
        "lint": {},
        "trust": {},
        "ui": {}
    }
}
  2. Client tool used
    Pre-compiled zot-linux-arm64 (using the provided Helm chart with the container from the project's repository)
  3. Seen error
    About 10 of these appear in the log per second, probably also stressing the remote end:
{
	"level": "error",
	"component": "scheduler",
	"error": "{\"errors\":[{\"code\":\"NOT_IMPLEMENTED\",\"message\":\"The catalog API is not yet supported.\"}]}\n",
	"generator": "SyncGenerator",
	"goroutine": 32,
	"caller": "zotregistry.dev/zot/pkg/scheduler/scheduler.go:465",
	"time": "2024-02-08T17:22:58.965145576Z",
	"message": "failed to execute generator"
}

Expected behavior

I'd expect zot to honor the 400 response from the upstream registry and not hammer it any further.
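
To make the expectation concrete, here is a minimal Go sketch of a retry loop that treats most 4xx responses as permanent and otherwise stops after a bounded number of retries. It is not zot's code: `upstreamError`, `isPermanent`, `fetchWithRetry` and `noSleep` are hypothetical names, and reading `maxRetries: 3` as "one initial attempt plus up to three retries" is an assumption about the intended semantics.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// upstreamError is a hypothetical error type that carries the HTTP status
// returned by the remote registry. It is not a zot type.
type upstreamError struct {
	Status int
	Body   string
}

func (e *upstreamError) Error() string {
	return fmt.Sprintf("upstream returned %d: %s", e.Status, e.Body)
}

// isPermanent reports whether retrying cannot reasonably help: any 4xx
// status except 408 (request timeout) and 429 (too many requests).
func isPermanent(err error) bool {
	var ue *upstreamError
	if !errors.As(err, &ue) {
		return false
	}
	if ue.Status == 408 || ue.Status == 429 {
		return false
	}
	return ue.Status >= 400 && ue.Status < 500
}

// noSleep is a stand-in for time.Sleep so the example finishes instantly.
func noSleep(time.Duration) {}

// fetchWithRetry calls fetch once and then retries at most maxRetries times,
// waiting retryDelay between attempts. It returns immediately on success or
// on a permanent error. The first result is the number of attempts made.
func fetchWithRetry(fetch func() error, maxRetries int, retryDelay time.Duration,
	sleep func(time.Duration)) (int, error) {
	for attempt := 1; ; attempt++ {
		err := fetch()
		if err == nil || isPermanent(err) || attempt > maxRetries {
			return attempt, err
		}
		sleep(retryDelay)
	}
}

func main() {
	// Simulates the catalog request from this report: the upstream always
	// answers with a 400.
	catalog := func() error {
		return &upstreamError{Status: 400, Body: "The catalog API is not yet supported."}
	}
	attempts, err := fetchWithRetry(catalog, 3, 60*time.Minute, noSleep)
	fmt.Println("attempts:", attempts, "error:", err)
}
```

With the 400 from this report the loop gives up after a single attempt, while a 5xx response would be retried up to the configured limit.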

Screenshots

No response

Additional context

Not sure if the Chainguard registry is also at fault (but to me, throwing a 400 does not seem entirely wrong), but in any case the error handling on the zot side should prevent a request storm that could easily result in a DoS (on either side).

Thank you for raising this one. You are right, zot should back off in such cases.

I'll let you know when I have a fix for this.