kapp and kube API server call limits
revolunet opened this issue
Hello,
I'm benchmarking some kapp deploy commands on a big manifest file with 6 containers and some wait rules, without kapp-controller, and I'm facing 403 errors from the API server when I run multiple concurrent kapp deploy commands. These 403s seem to make kapp stop with:
kapp: Error: waiting on reconcile job/job-template-kapp1-1-32ylei-db-hasura-create-secret-672rpn (batch/v1) namespace: fabrique-ci:
Errored:
Listing schema.GroupVersionResource{Group:"", Version:"v1", Resource:"pods"}, namespaced: true:
Fetching all namespaces: an error on the server ("error trying to reach service: dial tcp 10.0.0.1:443: connect: connection refused") has prevented the request from succeeding (get namespaces)
I've done various tests, set kapp-api-qps to 10 and kapp-api-burst to 10, and have no more ideas, so I'd like to share this with you; maybe you'll have some 😉
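For intuition, client-side QPS/burst limiting of the kind such flags configure (client-go style) behaves like a token bucket. This is an illustrative model only, not kapp's actual code:

```python
import time

class TokenBucket:
    """Client-side rate limiter sketch: a steady refill rate (qps) plus a
    burst allowance, roughly the semantics of QPS/burst flags in
    client-go-based tools. Illustrative, not kapp's implementation."""

    def __init__(self, qps: float, burst: int):
        self.qps = qps
        self.capacity = burst
        self.tokens = float(burst)
        self.last = time.monotonic()

    def acquire(self) -> float:
        """Reserve one request; return how long the caller must wait
        before actually sending it."""
        now = time.monotonic()
        # Refill tokens accumulated since the last call, capped at burst.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.qps)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return 0.0
        wait = (1 - self.tokens) / self.qps
        self.tokens -= 1
        return wait
```

With qps=10 and burst=10, the first 10 requests go through immediately and everything after that is throttled to 10 per second, which is very low for a deploy that fans out many list/get calls.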
Most of the 403s look related to cluster-wide API calls (namespaces, pods...).
Has anyone experienced this kind of behaviour? We're using AKS with Rancher.
Some numbers for 3 concurrent deploys of the manifests below (stripped):
In this graph you can see the API server responses to kapp:
- green: 200 or 201
- blue: 404
- red: >201 and !=404 (mostly 403)
Sample errors:
Sample manifests:
---
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    component: app
    application: template
  name: app
  namespace: template-kapp1
  annotations:
    kapp.k14s.io/disable-original: ""
    kapp.k14s.io/create-strategy: fallback-on-update
    kapp.k14s.io/update-strategy: fallback-on-replace
    kapp.k14s.io/nonce: ""
    kapp.k14s.io/change-group: kontinuous/template-kapp1
    kapp.k14s.io/change-group.app: kontinuous/app.template-kapp1
    kapp.k14s.io/change-rule.build-app: upsert after upserting kontinuous/build-app.template-kapp1
    kapp.k14s.io/change-rule.keycloakx: upsert after upserting kontinuous/keycloakx.template-kapp1
    kapp.k14s.io/change-rule.hasura: upsert after upserting kontinuous/hasura.template-kapp1
spec:
---
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    component: hasura
    application: template
  name: hasura
  namespace: template-kapp1
  annotations:
    kapp.k14s.io/disable-original: ""
    kapp.k14s.io/create-strategy: fallback-on-update
    kapp.k14s.io/update-strategy: fallback-on-replace
    kapp.k14s.io/nonce: ""
    kapp.k14s.io/change-group: kontinuous/template-kapp1
    kapp.k14s.io/change-group.hasura: kontinuous/hasura.template-kapp1
    kapp.k14s.io/change-rule.build-hasura: upsert after upserting kontinuous/build-hasura.template-kapp1
    kapp.k14s.io/change-rule.db-hasura: upsert after upserting kontinuous/db-hasura.template-kapp1
    kapp.k14s.io/change-rule.keycloakx: upsert after upserting kontinuous/keycloakx.template-kapp1
spec:
---
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    component: maildev
    application: template
  name: maildev
  namespace: template-kapp1
  annotations:
    kapp.k14s.io/disable-original: ""
    kapp.k14s.io/create-strategy: fallback-on-update
    kapp.k14s.io/update-strategy: fallback-on-replace
    kapp.k14s.io/nonce: ""
    kapp.k14s.io/change-group: kontinuous/template-kapp1
    kapp.k14s.io/change-group.maildev: kontinuous/maildev.template-kapp1
spec:
---
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    component: metabase
    application: template
  name: metabase
  namespace: template-kapp1
  annotations:
    kapp.k14s.io/disable-original: ""
    kapp.k14s.io/create-strategy: fallback-on-update
    kapp.k14s.io/update-strategy: fallback-on-replace
    kapp.k14s.io/nonce: ""
    kapp.k14s.io/change-group: kontinuous/template-kapp1
    kapp.k14s.io/change-group.metabase: kontinuous/metabase.template-kapp1
    kapp.k14s.io/change-rule.db-metabase: upsert after upserting kontinuous/db-metabase.template-kapp1
spec:
---
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    component: pgweb
    application: template
  name: pgweb
  namespace: template-kapp1
  annotations:
    kapp.k14s.io/disable-original: ""
    kapp.k14s.io/create-strategy: fallback-on-update
    kapp.k14s.io/update-strategy: fallback-on-replace
    kapp.k14s.io/nonce: ""
    kapp.k14s.io/change-group: kontinuous/template-kapp1
    kapp.k14s.io/change-group.pgweb: kontinuous/pgweb.template-kapp1
    kapp.k14s.io/change-rule.db-hasura: upsert after upserting kontinuous/db-hasura.template-kapp1
spec:
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: keycloakx
  annotations:
    kapp.k14s.io/disable-original: ""
    kapp.k14s.io/change-group: kontinuous/template-kapp1
    kapp.k14s.io/change-group.keycloakx: kontinuous/keycloakx.template-kapp1
    kapp.k14s.io/change-rule.db-keycloak: upsert after upserting kontinuous/db-keycloak.template-kapp1
  namespace: template-kapp1
spec:
---
apiVersion: batch/v1
kind: Job
metadata:
  name: job-template-kapp1-build-app-kaniko-3zekn9
  namespace: fabrique-ci
  annotations:
    kapp.k14s.io/nonce: ""
    kapp.k14s.io/update-strategy: fallback-on-replace
    kapp.k14s.io/change-group: kontinuous/template-kapp1
    kapp.k14s.io/change-group.build-app: kontinuous/build-app.template-kapp1
    kapp.k14s.io/change-group.build-app.kaniko: kontinuous/build-app.kaniko.template-kapp1
    kapp.k14s.io/change-group.build-app..kaniko: kontinuous/build-app..kaniko.template-kapp1
    kapp.k14s.io/disable-original: ""
    kapp.k14s.io/create-strategy: fallback-on-update
    kapp.k14s.io/change-group.jobs: kontinuous/jobs.template-kapp1
spec:
---
apiVersion: batch/v1
kind: Job
metadata:
  name: job-template-kapp1-build-hasura-kaniko-3d6853
  namespace: fabrique-ci
  annotations:
    kapp.k14s.io/nonce: ""
    kapp.k14s.io/update-strategy: fallback-on-replace
    kapp.k14s.io/change-group: kontinuous/template-kapp1
    kapp.k14s.io/change-group.build-hasura: kontinuous/build-hasura.template-kapp1
    kapp.k14s.io/change-group.build-hasura.kaniko: kontinuous/build-hasura.kaniko.template-kapp1
    kapp.k14s.io/change-group.build-hasura..kaniko: kontinuous/build-hasura..kaniko.template-kapp1
    kapp.k14s.io/disable-original: ""
    kapp.k14s.io/create-strategy: fallback-on-update
    kapp.k14s.io/change-group.jobs: kontinuous/jobs.template-kapp1
spec:
---
apiVersion: batch/v1
kind: Job
metadata:
  name: job-template-kapp1-db-hasura-create-db-1dtpbq
  namespace: fabrique-ci
  annotations:
    kapp.k14s.io/nonce: ""
    kapp.k14s.io/update-strategy: fallback-on-replace
    kapp.k14s.io/change-group: kontinuous/template-kapp1
    kapp.k14s.io/change-group.db-hasura: kontinuous/db-hasura.template-kapp1
    kapp.k14s.io/change-group.db-hasura.create-db: kontinuous/db-hasura.create-db.template-kapp1
    kapp.k14s.io/change-group.db-hasura..create-db: kontinuous/db-hasura..create-db.template-kapp1
    kapp.k14s.io/change-rule.db-hasura..create-secret: upsert after upserting kontinuous/db-hasura..create-secret.template-kapp1
    kapp.k14s.io/disable-original: ""
    kapp.k14s.io/create-strategy: fallback-on-update
    kapp.k14s.io/change-group.jobs: kontinuous/jobs.template-kapp1
spec:
---
apiVersion: batch/v1
kind: Job
metadata:
  name: job-template-kapp1-db-hasura-create-secret-672rpn
  namespace: fabrique-ci
  annotations:
    kapp.k14s.io/nonce: ""
    kapp.k14s.io/update-strategy: fallback-on-replace
    kapp.k14s.io/change-group: kontinuous/template-kapp1
    kapp.k14s.io/change-group.db-hasura: kontinuous/db-hasura.template-kapp1
    kapp.k14s.io/change-group.db-hasura.create-secret: kontinuous/db-hasura.create-secret.template-kapp1
    kapp.k14s.io/change-group.db-hasura..create-secret: kontinuous/db-hasura..create-secret.template-kapp1
    kapp.k14s.io/disable-original: ""
    kapp.k14s.io/create-strategy: fallback-on-update
    kapp.k14s.io/change-group.jobs: kontinuous/jobs.template-kapp1
spec:
---
apiVersion: batch/v1
kind: Job
metadata:
  name: job-template-kapp1-db-keycloak-create-db-3dxq1g
  namespace: fabrique-ci
  annotations:
    kapp.k14s.io/nonce: ""
    kapp.k14s.io/update-strategy: fallback-on-replace
    kapp.k14s.io/change-group: kontinuous/template-kapp1
    kapp.k14s.io/change-group.db-keycloak: kontinuous/db-keycloak.template-kapp1
    kapp.k14s.io/change-group.db-keycloak.create-db: kontinuous/db-keycloak.create-db.template-kapp1
    kapp.k14s.io/change-group.db-keycloak..create-db: kontinuous/db-keycloak..create-db.template-kapp1
    kapp.k14s.io/change-rule.db-keycloak..create-secret: >-
      upsert after upserting
      kontinuous/db-keycloak..create-secret.template-kapp1
    kapp.k14s.io/disable-original: ""
    kapp.k14s.io/create-strategy: fallback-on-update
    kapp.k14s.io/change-group.jobs: kontinuous/jobs.template-kapp1
spec:
---
apiVersion: batch/v1
kind: Job
metadata:
  name: job-template-kapp1-db-keycloak-create-secret-39r2rj
  namespace: fabrique-ci
  annotations:
    kapp.k14s.io/nonce: ""
    kapp.k14s.io/update-strategy: fallback-on-replace
    kapp.k14s.io/change-group: kontinuous/template-kapp1
    kapp.k14s.io/change-group.db-keycloak: kontinuous/db-keycloak.template-kapp1
    kapp.k14s.io/change-group.db-keycloak.create-secret: kontinuous/db-keycloak.create-secret.template-kapp1
    kapp.k14s.io/change-group.db-keycloak..create-secret: kontinuous/db-keycloak..create-secret.template-kapp1
    kapp.k14s.io/disable-original: ""
    kapp.k14s.io/create-strategy: fallback-on-update
    kapp.k14s.io/change-group.jobs: kontinuous/jobs.template-kapp1
spec:
---
apiVersion: batch/v1
kind: Job
metadata:
  name: job-template-kapp1-db-metabase-create-db-xgit30
  namespace: fabrique-ci
  annotations:
    kapp.k14s.io/nonce: ""
    kapp.k14s.io/update-strategy: fallback-on-replace
    kapp.k14s.io/change-group: kontinuous/template-kapp1
    kapp.k14s.io/change-group.db-metabase: kontinuous/db-metabase.template-kapp1
    kapp.k14s.io/change-group.db-metabase.create-db: kontinuous/db-metabase.create-db.template-kapp1
    kapp.k14s.io/change-group.db-metabase..create-db: kontinuous/db-metabase..create-db.template-kapp1
    kapp.k14s.io/change-rule.db-metabase..create-secret: >-
      upsert after upserting
      kontinuous/db-metabase..create-secret.template-kapp1
    kapp.k14s.io/disable-original: ""
    kapp.k14s.io/create-strategy: fallback-on-update
    kapp.k14s.io/change-group.jobs: kontinuous/jobs.template-kapp1
spec:
---
apiVersion: batch/v1
kind: Job
metadata:
  name: job-template-kapp1-db-metabase-create-secret-2bu7x4
  namespace: fabrique-ci
  annotations:
    kapp.k14s.io/nonce: ""
    kapp.k14s.io/update-strategy: fallback-on-replace
    kapp.k14s.io/change-group: kontinuous/template-kapp1
    kapp.k14s.io/change-group.db-metabase: kontinuous/db-metabase.template-kapp1
    kapp.k14s.io/change-group.db-metabase.create-secret: kontinuous/db-metabase.create-secret.template-kapp1
    kapp.k14s.io/change-group.db-metabase..create-secret: kontinuous/db-metabase..create-secret.template-kapp1
    kapp.k14s.io/disable-original: ""
    kapp.k14s.io/create-strategy: fallback-on-update
    kapp.k14s.io/change-group.jobs: kontinuous/jobs.template-kapp1
spec:
---
apiVersion: batch/v1
kind: Job
metadata:
  name: job-template-kapp1-seed-hasura-import-secret-3h5a4u
  namespace: fabrique-ci
  annotations:
    kapp.k14s.io/nonce: ""
    kapp.k14s.io/update-strategy: fallback-on-replace
    kapp.k14s.io/change-group: kontinuous/template-kapp1
    kapp.k14s.io/change-group.seed-hasura: kontinuous/seed-hasura.template-kapp1
    kapp.k14s.io/change-group.seed-hasura.import-secret: kontinuous/seed-hasura.import-secret.template-kapp1
    kapp.k14s.io/change-group.seed-hasura..import-secret: kontinuous/seed-hasura..import-secret.template-kapp1
    kapp.k14s.io/change-rule.hasura: upsert after upserting kontinuous/hasura.template-kapp1
    kapp.k14s.io/disable-original: ""
    kapp.k14s.io/create-strategy: fallback-on-update
    kapp.k14s.io/change-group.jobs: kontinuous/jobs.template-kapp1
spec:
---
apiVersion: batch/v1
kind: Job
metadata:
  name: job-template-kapp1-seed-hasura-seed-db-59hfdf
  namespace: fabrique-ci
  annotations:
    kapp.k14s.io/nonce: ""
    kapp.k14s.io/update-strategy: fallback-on-replace
    kapp.k14s.io/change-group: kontinuous/template-kapp1
    kapp.k14s.io/change-group.seed-hasura: kontinuous/seed-hasura.template-kapp1
    kapp.k14s.io/change-group.seed-hasura.seed-db: kontinuous/seed-hasura.seed-db.template-kapp1
    kapp.k14s.io/change-group.seed-hasura..seed-db: kontinuous/seed-hasura..seed-db.template-kapp1
    kapp.k14s.io/change-rule.seed-hasura..import-secret: >-
      upsert after upserting
      kontinuous/seed-hasura..import-secret.template-kapp1
    kapp.k14s.io/change-rule.hasura: upsert after upserting kontinuous/hasura.template-kapp1
    kapp.k14s.io/disable-original: ""
    kapp.k14s.io/create-strategy: fallback-on-update
    kapp.k14s.io/change-group.jobs: kontinuous/jobs.template-kapp1
spec:
Hi @revolunet! I am guessing that the server has a bunch of pending requests and is therefore refusing the TCP connection.
You can try increasing the --wait-check-interval duration to a higher value (say 5 or 20 seconds) and see if that helps. Increasing it to 5 seconds would cut the number of API calls made during the waiting stage to roughly a fifth.
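The "roughly a fifth" estimate follows from each tracked resource being polled once per check interval. A back-of-the-envelope sketch (the resource count and wait duration here are hypothetical, not measured from kapp):

```python
import math

def wait_stage_calls(resources: int, wait_seconds: float, check_interval: float) -> int:
    """Rough count of status-check API calls during the waiting stage:
    each waited-on resource is polled once per check interval.
    Purely illustrative; kapp's real accounting is more involved."""
    return resources * math.ceil(wait_seconds / check_interval)

# Example: 15 resources waited on for 60 seconds.
# 1s interval  -> 900 status calls
# 5s interval  -> 180 status calls (a fifth)
```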
Thanks @praveenrewar! Will try and report back with these options.
I see. Would you be able to share any such error? If it's happening in the apply stage, then you could also try increasing --apply-check-interval and decreasing --apply-concurrency (default is 5). But I feel that it's happening much before that. You can also try reducing --existing-non-labeled-resources-check-concurrency to a smaller number like 5 (default is 100).
Mmm, thanks. I've tried many combinations without luck. Looks like we have something wrong in our cluster: it fails as soon as we launch multiple parallel kapp deploys; investigating...
Would you be able to share a couple of things that might help us improve kapp performance? (We are already working on a couple of things: #599)
- cluster configuration (RAM, CPU, number of nodes, etc.)
- permissions available to the user/service account being used
- a couple of the errors where you are seeing the 403s
- number of kapp apps you are trying to deploy concurrently
Hi,
Our cluster is 6 nodes of 6 CPU + 32 GB each.
I tried with a more privileged account and got no 403s, but still 499s or 500s from the API server, which make kapp stop.
So yes, the 403s are due to a limited service account that cannot query cluster-wide resources.
We use Rancher and suspect it is our bottleneck here. We'll test directly against the API server to see if it gets better.
It works well with one kapp deploy, sometimes two simultaneous, but no more with the above manifests.
Some log examples:
2022-10-19T09:59:58+00:00 499 /k8s/clusters/c-gjtkk/api/v1/namespaces/template-kapp1-8-2/configmaps kapp/v0.0.0 (linux/amd64) kubernetes/$Format
2022-10-19T09:59:58+00:00 499 /k8s/clusters/c-gjtkk/api/v1/namespaces/template-kapp1-8-2/configmaps kapp/v0.0.0 (linux/amd64) kubernetes/$Format
2022-10-19T09:59:58+00:00 499 /k8s/clusters/c-gjtkk/api/v1/namespaces/template-kapp1-8-2/configmaps kapp/v0.0.0 (linux/amd64) kubernetes/$Format
2022-10-19T09:59:58+00:00 409 /k8s/clusters/c-gjtkk/api/v1/namespaces/template-kapp1-8-2/serviceaccounts kapp/v0.0.0 (linux/amd64) kubernetes/$Format
2022-10-19T09:59:50+00:00 499 /k8s/clusters/c-gjtkk/api/v1/namespaces/template-kapp1-8-1/configmaps kapp/v0.0.0 (linux/amd64) kubernetes/$Format
2022-10-19T09:59:50+00:00 409 /k8s/clusters/c-gjtkk/api/v1/namespaces/template-kapp1-8-1/serviceaccounts kapp/v0.0.0 (linux/amd64) kubernetes/$Format
> I tried with a more privileged account and got no 403s, but still 499s or 500s from the API server, which make kapp stop.
> So yes, the 403s are due to a limited service account that cannot query cluster-wide resources.

I see, but a forbidden error usually leads kapp to stop, so I am wondering what caused this many API calls then.

> We use Rancher and suspect it is our bottleneck here. We'll test directly against the API server to see if it gets better.

That is a possibility, because given the cluster configuration, it should be able to handle this many requests.
Hi @revolunet ! Were you able to find the root cause of the failures? Let me know if you need any help or if you would like to share some information which could be helpful to improve kapp performance.
Hello @praveenrewar,
After many tests we've confirmed that it comes from the Rancher API: for some reason it throws "connection refused" when under load, and we're unable to find the root cause or more logs.
The good news is that kapp works flawlessly when talking directly to the kube API server!
I think this issue can be closed.
Thank you for the update @revolunet. Closing the issue for now, but feel free to re-open it if you find something we can improve on.
Maybe kapp could have a better retry mechanism on API errors so it could also work with flaky clusters.
Thanks for your support !
We do have a set of retry-able errors, but currently retrying doesn't happen in the waiting stage. We are tracking that over here. Hopefully we will find a suitable solution to it soon.
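For reference, the retry-on-transient-error behaviour suggested above is usually exponential backoff over a known set of retryable errors. A generic sketch, not kapp's implementation; the matched error substrings are just examples:

```python
import time

# Example substrings that typically indicate a transient network failure.
TRANSIENT = ("connection refused", "connection reset", "EOF")

def with_retries(call, attempts=5, base_delay=0.5):
    """Invoke `call`, retrying transient failures with exponential backoff.
    Non-transient errors, and the final failed attempt, are re-raised."""
    for attempt in range(attempts):
        try:
            return call()
        except OSError as exc:
            last_try = attempt == attempts - 1
            if last_try or not any(m in str(exc) for m in TRANSIENT):
                raise
            time.sleep(base_delay * (2 ** attempt))  # 0.5s, 1s, 2s, ...
```

A wrapper like this around the waiting-stage status checks would let a deploy ride out short windows where a proxy such as Rancher's refuses connections.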