FairwindsOps / polaris

Validation of best practices in your Kubernetes clusters

Home Page: https://www.fairwinds.com/polaris

Audit fails on k8s cluster with many resources

luanaBanana opened this issue

What happened?

On our largest k8s cluster by far, it is not possible to create an audit, so the dashboard UI is unavailable.

This is the command output:

/opt/app $ polaris audit
I1020 08:36:00.323775     259 request.go:601] Waited for 1.044419015s due to client-side throttling, not priority and fairness, request: GET:https://XY:443/apis/gateway.networking.k8s.io/v1alpha2
Killed

These are the logs:

polaris-dashboard-d6d599589-59h7p dashboard time="2022-10-18T15:47:24Z" level=info msg="Starting Polaris dashboard server on port 8080"
polaris-dashboard-d6d599589-7st4g dashboard time="2022-10-18T15:47:44Z" level=info msg="Starting Polaris dashboard server on port 8080"
polaris-dashboard-d6d599589-7st4g dashboard time="2022-10-20T06:36:00Z" level=warning msg="Error retrieving parent object API v1 and Kind poddisruptionbudgets because of error: client rate limiter Wait returned an error: context canceled"
polaris-dashboard-d6d599589-7st4g dashboard time="2022-10-20T06:36:00Z" level=error msg="Error fetching Kubernetes resources client rate limiter Wait returned an error: context canceled"

What did you expect to happen?

Print audit output similar to this one:

{
  "PolarisOutputVersion": "1.0",
  "AuditTime": "2022-10-20T08:30:30Z",
  "SourceType": "ClusterNamespace",
[...]

How can we reproduce this?

Not entirely sure. Try `polaris audit` on a cluster of similar dimensions.

Worker Nodes: 75
Namespaces: 415

Version

5.6.0 fairwinds-stable/polaris

Search

  • I did search for other open and closed issues before opening this.

Code of Conduct

  • I agree to follow this project's Code of Conduct

Additional context

Thank you

Hi there! It looks like you're running this inside of the cluster, and the pod got OOM Killed. Have you tried increasing the memory limits for that pod?
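If the chart is deployed via Helm, the memory limit can be raised with a values override along these lines (a minimal sketch; the `dashboard.resources` key and the numbers shown are assumptions, check the values.yaml of your chart version for the exact structure and sensible sizes):

```yaml
# values-override.yaml for the fairwinds-stable/polaris chart (hypothetical values)
dashboard:
  resources:
    requests:
      cpu: 500m
      memory: 512Mi
    limits:
      cpu: "1"
      memory: 2Gi
```

Applied with something like `helm upgrade polaris fairwinds-stable/polaris -f values-override.yaml -n <namespace>`.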

Thanks for the hint! I noticed some CPU throttling as well and increased both memory and CPU. Additionally, I had to increase the timeout on the nginx ingress. It works now!
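For anyone hitting the same wall: if the ingress in front of the dashboard is ingress-nginx, the timeout bump mentioned above is typically done with annotations like the following (a sketch; the 300-second values are arbitrary examples, tune them to how long your audit actually takes):

```yaml
# Ingress annotations for ingress-nginx, raising proxy timeouts (example values)
metadata:
  annotations:
    nginx.ingress.kubernetes.io/proxy-read-timeout: "300"
    nginx.ingress.kubernetes.io/proxy-send-timeout: "300"
```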

🤝

Hi @luanaBanana, we are facing a similar issue with a large k8s cluster. Could you tell us the CPU and memory limits you configured to fix this issue?