aquasecurity / starboard

Moved to https://github.com/aquasecurity/trivy-operator

Home Page:https://aquasecurity.github.io/starboard/

error updating nsa compliance report: etcdserver: request is too large

dirsigler opened this issue

What steps did you take and what happened:

  • downloaded the source for version v0.15.0
  • ran kubectl apply -f . --force in the deploy/crd/ folder to force apply all CRDs
  • updated Starboard-Operator via helm
helm upgrade -i starboard-operator aquasecurity/starboard-operator \
  --namespace starboard-system \
  --create-namespace \
  --set="targetNamespaces=" \
  --set="trivy.ignoreUnfixed=true" \
  --set="trivy.serverURL=http://my-trivy-server.trivy-server.svc.cluster.local:4954" \
  --set="trivy.mode=ClientServer" \
  --set="operator.vulnerabilityScannerScanOnlyCurrentRevisions=true" \
  --version 0.10.0

After that the Starboard-Operator Pod spins up, does its job for 1-2 minutes, and then enters a crash loop.
One of the many error messages is:

{"level":"error","ts":1648994664.7263627,"logger":"controller.clustercompliancereport","msg":"Reconciler error","reconciler group":"aquasecurity.github.io","reconciler kind":"ClusterComplianceReport","name":"nsa","namespace":"","error":"etcdserver: request is too large","stacktrace":"sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/home/runner/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.11.1/pkg/internal/controller/controller.go:266\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/home/runner/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.11.1/pkg/internal/controller/controller.go:227"}
I0403 14:04:26.875818       1 request.go:665] Waited for 1.196154975s due to client-side throttling, not priority and fairness, request: GET:https://10.16.192.1:443/apis/internal.autoscaling.k8s.io/v1alpha1?timeout=32s
{"level":"error","ts":1648994666.9797654,"logger":"controller-runtime.source","msg":"if kind is a CRD, it should be installed before calling Start","kind":"CronJob.batch","error":"no matches for kind \"CronJob\" in version \"batch/v1\"","stacktrace":"sigs.k8s.io/controller-runtime/pkg/source.(*Kind).Start.func1.1\n\t/home/runner/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.11.1/pkg/source/source.go:137\nk8s.io/apimachinery/pkg/util/wait.runConditionWithCrashProtectionWithContext\n\t/home/runner/go/pkg/mod/k8s.io/apimachinery@v0.23.5/pkg/util/wait/wait.go:233\nk8s.io/apimachinery/pkg/util/wait.WaitForWithContext\n\t/home/runner/go/pkg/mod/k8s.io/apimachinery@v0.23.5/pkg/util/wait/wait.go:660\nk8s.io/apimachinery/pkg/util/wait.poll\n\t/home/runner/go/pkg/mod/k8s.io/apimachinery@v0.23.5/pkg/util/wait/wait.go:594\nk8s.io/apimachinery/pkg/util/wait.PollImmediateUntilWithContext\n\t/home/runner/go/pkg/mod/k8s.io/apimachinery@v0.23.5/pkg/util/wait/wait.go:545\nsigs.k8s.io/controller-runtime/pkg/source.(*Kind).Start.func1\n\t/home/runner/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.11.1/pkg/source/source.go:131"}
{"level":"error","ts":1648994669.7849312,"logger":"controller-runtime.source","msg":"if kind is a CRD, it should be installed before calling Start","kind":"CronJob.batch","error":"no matches for kind \"CronJob\" in version \"batch/v1\"","stacktrace":"sigs.k8s.io/controller-runtime/pkg/source.(*Kind).Start.func1.1\n\t/home/runner/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.11.1/pkg/source/source.go:137\nk8s.io/apimachinery/pkg/util/wait.runConditionWithCrashProtectionWithContext\n\t/home/runner/go/pkg/mod/k8s.io/apimachinery@v0.23.5/pkg/util/wait/wait.go:233\nk8s.io/apimachinery/pkg/util/wait.WaitForWithContext\n\t/home/runner/go/pkg/mod/k8s.io/apimachinery@v0.23.5/pkg/util/wait/wait.go:660\nk8s.io/apimachinery/pkg/util/wait.poll\n\t/home/runner/go/pkg/mod/k8s.io/apimachinery@v0.23.5/pkg/util/wait/wait.go:594\nk8s.io/apimachinery/pkg/util/wait.PollImmediateUntilWithContext\n\t/home/runner/go/pkg/mod/k8s.io/apimachinery@v0.23.5/pkg/util/wait/wait.go:545\nsigs.k8s.io/controller-runtime/pkg/source.(*Kind).Start.func1\n\t/home/runner/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.11.1/pkg/source/source.go:131"}
I0403 14:04:36.876213       1 request.go:665] Waited for 1.195075657s due to client-side throttling, not priority and fairness, request: GET:https://10.16.192.1:443/apis/snapshot.storage.k8s.io/v1beta1?timeout=32s
{"level":"error","ts":1648994676.9788554,"logger":"controller-runtime.source","msg":"if kind is a CRD, it should be installed before calling Start","kind":"CronJob.batch","error":"no matches for kind \"CronJob\" in version \"batch/v1\"","stacktrace":"sigs.k8s.io/controller-runtime/pkg/source.(*Kind).Start.func1.1\n\t/home/runner/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.11.1/pkg/source/source.go:137\nk8s.io/apimachinery/pkg/util/wait.runConditionWithCrashProtectionWithContext\n\t/home/runner/go/pkg/mod/k8s.io/apimachinery@v0.23.5/pkg/util/wait/wait.go:233\nk8s.io/apimachinery/pkg/util/wait.WaitForWithContext\n\t/home/runner/go/pkg/mod/k8s.io/apimachinery@v0.23.5/pkg/util/wait/wait.go:660\nk8s.io/apimachinery/pkg/util/wait.poll\n\t/home/runner/go/pkg/mod/k8s.io/apimachinery@v0.23.5/pkg/util/wait/wait.go:594\nk8s.io/apimachinery/pkg/util/wait.PollImmediateUntilWithContext\n\t/home/runner/go/pkg/mod/k8s.io/apimachinery@v0.23.5/pkg/util/wait/wait.go:545\nsigs.k8s.io/controller-runtime/pkg/source.(*Kind).Start.func1\n\t/home/runner/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.11.1/pkg/source/source.go:131"}
^C

What did you expect to happen:
I expected the Starboard-Operator Pod to keep running without repeatedly crashing.
With the prior version, the application ran and triggered scans without any problems.

Anything else you would like to add:

None

Environment:

  • Starboard version (use starboard version):
    pinned via Helm v0.15.0

  • Kubernetes version (use kubectl version):

Client Version: version.Info{Major:"1", Minor:"23", GitVersion:"v1.23.5", GitCommit:"c285e781331a3785a7f436042c65c5641ce8a9e9", GitTreeState:"clean", BuildDate:"2022-03-16T15:51:05Z", GoVersion:"go1.17.8", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"20+", GitVersion:"v1.20.15-gke.1000", GitCommit:"d71f5620130949cf5f74de04e6ae8f3a96e4b718", GitTreeState:"clean", BuildDate:"2022-02-02T09:21:18Z", GoVersion:"go1.15.15b5", Compiler:"gc", Platform:"linux/amd64"}
WARNING: version difference between client (1.23) and server (1.20) exceeds the supported minor version skew of +/-1

GKE Node Version: 1.20.15-gke.1000

  • OS (macOS 10.15, Windows 10, Ubuntu 19.10 etc):
    macOS Monterey 12.3

Thank you @dirsigler for testing out the latest release! From the logs I see two problems:

  • Related to the NSA report that we introduced in v0.15.0: for some reason its .Status grows to a size that exceeds the etcd request limit /cc @chen-keinan
  • #1102
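A quick way to check the first point is to measure how large the report object has actually grown. This is a hedged diagnostic sketch, not part of the original report: it assumes kubectl access to the affected cluster, and "nsa" is the report name taken from the error log above. etcd's default request limit is 1.5 MiB (`--max-request-bytes=1572864`), so a serialized object approaching that size will fail to persist.

```shell
# Measure the serialized size (in bytes) of the nsa ClusterComplianceReport.
# Anything near etcd's default 1.5 MiB request limit explains the
# "etcdserver: request is too large" error.
kubectl get clustercompliancereport nsa -o json | wc -c
```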

I've updated the title of this issue to focus it on the NSA report.

We've released a new version of Starboard, v0.15.1, that solves #1102, along with Helm chart v0.10.1. Please try it out and let us know if the problem with the CronJob API version persists. You can temporarily disable the cluster compliance reconciler until #1106 is resolved and released.

helm upgrade -i starboard-operator aquasecurity/starboard-operator \
  --namespace starboard-system \
  --create-namespace \
  --set="targetNamespaces=" \
  --set="clusterComplianceEnabled=false" \
  --set="trivy.ignoreUnfixed=true" \
  --set="trivy.serverURL=http://my-trivy-server.trivy-server.svc.cluster.local:4954" \
  --set="trivy.mode=ClientServer" \
  --set="operator.vulnerabilityScannerScanOnlyCurrentRevisions=true" \
  --version 0.10.1
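To confirm whether the CronJob errors can occur on your cluster at all, you can check which batch API versions the API server actually serves. This is a sketch assuming kubectl access: CronJob graduated to batch/v1 in Kubernetes 1.21, so a 1.20 server (as in this report) serves only batch/v1beta1, which matches the "no matches for kind \"CronJob\" in version \"batch/v1\"" errors in the logs.

```shell
# List the batch API group versions the cluster serves.
# If batch/v1 is missing (pre-1.21 servers), the operator's
# batch/v1 CronJob watch cannot start.
kubectl api-versions | grep '^batch/'
```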

I'm closing this issue now; we'll be tracking the progress on the request is too large error in #1106.