carvel-dev / kapp

kapp is a simple deployment tool focused on the concept of "Kubernetes application" — a set of resources with the same label

Home Page: https://carvel.dev/kapp

kapp and kube API Server calls limits

revolunet opened this issue · comments

Hello,

I'm benchmarking some kapp deploy commands on a big manifest file with 6 containers and some wait rules, without kapp-controller, and I'm getting 403 errors from the API server when I run multiple concurrent kapp deploy commands. These 403s seem to make kapp stop with:

kapp: Error: waiting on reconcile job/job-template-kapp1-1-32ylei-db-hasura-create-secret-672rpn (batch/v1) namespace: fabrique-ci:
  Errored:
    Listing schema.GroupVersionResource{Group:"", Version:"v1", Resource:"pods"}, namespaced: true:
        Fetching all namespaces: an error on the server ("error trying to reach service: dial tcp 10.0.0.1:443: connect: connection refused") has prevented the request from succeeding (get namespaces)

I've run various tests, including setting --kube-api-qps to 10 and --kube-api-burst to 10, and I'm out of ideas, so I'd like to share this with you; maybe you'll have some 😉
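One possible reason a QPS/burst setting does not help with concurrent deploys can be sketched as follows. This is a simulation only, assuming the limits behave like a client-side token bucket that each kapp process holds independently (an assumption about the limiter, not a reading of kapp's code); `TokenBucket` and `admitted_in_window` are hypothetical names.

```python
class TokenBucket:
    """Simulated token bucket: refills at `qps` tokens/second, holds at most `burst`."""

    def __init__(self, qps: float, burst: int) -> None:
        self.qps = qps
        self.burst = burst
        self.tokens = float(burst)  # the bucket starts full
        self.last = 0.0             # simulated time of the previous call

    def allow(self, now: float) -> bool:
        # Refill for the simulated time elapsed, capped at the burst size.
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.qps)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


def admitted_in_window(qps: float, burst: int, clients: int,
                       seconds: float, attempts_per_second: int = 100) -> int:
    """Count requests admitted when `clients` independent processes each
    try to send `attempts_per_second` requests for `seconds` seconds."""
    buckets = [TokenBucket(qps, burst) for _ in range(clients)]
    admitted = 0
    for step in range(int(seconds * attempts_per_second)):
        now = step / attempts_per_second
        for bucket in buckets:
            if bucket.allow(now):
                admitted += 1
    return admitted


if __name__ == "__main__":
    print("1 process  :", admitted_in_window(10, 10, clients=1, seconds=1.0))
    print("3 processes:", admitted_in_window(10, 10, clients=3, seconds=1.0))
```

Under this assumption, the load seen by the server scales with the number of concurrent deploys, because no process knows about the others' budgets.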

It looks like most of the 403s are related to cluster-wide API calls (namespaces, pods...).

Has anyone experienced this kind of behaviour? We're using AKS with Rancher.

Some numbers for three concurrent deploys with the (stripped) manifests below:

In this graph you can see the API server's responses to kapp:

  • green: 200 or 201
  • blue: 404
  • red: >201 and !=404 (mostly 403)

[Screenshot, 2022-10-19 at 01:47:08]

Sample errors:

[Screenshot, 2022-10-19 at 01:49:22]

Sample manifests:

---
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    component: app
    application: template
  name: app
  namespace: template-kapp1
  annotations:
    kapp.k14s.io/disable-original: ""
    kapp.k14s.io/create-strategy: fallback-on-update
    kapp.k14s.io/update-strategy: fallback-on-replace
    kapp.k14s.io/nonce: ""
    kapp.k14s.io/change-group: kontinuous/template-kapp1
    kapp.k14s.io/change-group.app: kontinuous/app.template-kapp1
    kapp.k14s.io/change-rule.build-app: upsert after upserting kontinuous/build-app.template-kapp1
    kapp.k14s.io/change-rule.keycloakx: upsert after upserting kontinuous/keycloakx.template-kapp1
    kapp.k14s.io/change-rule.hasura: upsert after upserting kontinuous/hasura.template-kapp1
spec:
---
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    component: hasura
    application: template
  name: hasura
  namespace: template-kapp1
  annotations:
    kapp.k14s.io/disable-original: ""
    kapp.k14s.io/create-strategy: fallback-on-update
    kapp.k14s.io/update-strategy: fallback-on-replace
    kapp.k14s.io/nonce: ""
    kapp.k14s.io/change-group: kontinuous/template-kapp1
    kapp.k14s.io/change-group.hasura: kontinuous/hasura.template-kapp1
    kapp.k14s.io/change-rule.build-hasura: upsert after upserting kontinuous/build-hasura.template-kapp1
    kapp.k14s.io/change-rule.db-hasura: upsert after upserting kontinuous/db-hasura.template-kapp1
    kapp.k14s.io/change-rule.keycloakx: upsert after upserting kontinuous/keycloakx.template-kapp1
spec:

---
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    component: maildev
    application: template
  name: maildev
  namespace: template-kapp1
  annotations:
    kapp.k14s.io/disable-original: ""
    kapp.k14s.io/create-strategy: fallback-on-update
    kapp.k14s.io/update-strategy: fallback-on-replace
    kapp.k14s.io/nonce: ""
    kapp.k14s.io/change-group: kontinuous/template-kapp1
    kapp.k14s.io/change-group.maildev: kontinuous/maildev.template-kapp1
spec:

---
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    component: metabase
    application: template
  name: metabase
  namespace: template-kapp1
  annotations:
    kapp.k14s.io/disable-original: ""
    kapp.k14s.io/create-strategy: fallback-on-update
    kapp.k14s.io/update-strategy: fallback-on-replace
    kapp.k14s.io/nonce: ""
    kapp.k14s.io/change-group: kontinuous/template-kapp1
    kapp.k14s.io/change-group.metabase: kontinuous/metabase.template-kapp1
    kapp.k14s.io/change-rule.db-metabase: upsert after upserting kontinuous/db-metabase.template-kapp1
spec:

---
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    component: pgweb
    application: template
  name: pgweb
  namespace: template-kapp1
  annotations:
    kapp.k14s.io/disable-original: ""
    kapp.k14s.io/create-strategy: fallback-on-update
    kapp.k14s.io/update-strategy: fallback-on-replace
    kapp.k14s.io/nonce: ""
    kapp.k14s.io/change-group: kontinuous/template-kapp1
    kapp.k14s.io/change-group.pgweb: kontinuous/pgweb.template-kapp1
    kapp.k14s.io/change-rule.db-hasura: upsert after upserting kontinuous/db-hasura.template-kapp1
spec:
   
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: keycloakx
  annotations:
    kapp.k14s.io/disable-original: ""
    kapp.k14s.io/change-group: kontinuous/template-kapp1
    kapp.k14s.io/change-group.keycloakx: kontinuous/keycloakx.template-kapp1
    kapp.k14s.io/change-rule.db-keycloak: upsert after upserting kontinuous/db-keycloak.template-kapp1
  namespace: template-kapp1
spec:
   
---
apiVersion: batch/v1
kind: Job
metadata:
  name: job-template-kapp1-build-app-kaniko-3zekn9
  namespace: fabrique-ci
  annotations:
    kapp.k14s.io/nonce: ""
    kapp.k14s.io/update-strategy: fallback-on-replace
    kapp.k14s.io/change-group: kontinuous/template-kapp1
    kapp.k14s.io/change-group.build-app: kontinuous/build-app.template-kapp1
    kapp.k14s.io/change-group.build-app.kaniko: kontinuous/build-app.kaniko.template-kapp1
    kapp.k14s.io/change-group.build-app..kaniko: kontinuous/build-app..kaniko.template-kapp1
    kapp.k14s.io/disable-original: ""
    kapp.k14s.io/create-strategy: fallback-on-update
    kapp.k14s.io/change-group.jobs: kontinuous/jobs.template-kapp1
spec:
 
---
apiVersion: batch/v1
kind: Job
metadata:
  name: job-template-kapp1-build-hasura-kaniko-3d6853
  namespace: fabrique-ci
  annotations:
    kapp.k14s.io/nonce: ""
    kapp.k14s.io/update-strategy: fallback-on-replace
    kapp.k14s.io/change-group: kontinuous/template-kapp1
    kapp.k14s.io/change-group.build-hasura: kontinuous/build-hasura.template-kapp1
    kapp.k14s.io/change-group.build-hasura.kaniko: kontinuous/build-hasura.kaniko.template-kapp1
    kapp.k14s.io/change-group.build-hasura..kaniko: kontinuous/build-hasura..kaniko.template-kapp1
    kapp.k14s.io/disable-original: ""
    kapp.k14s.io/create-strategy: fallback-on-update
    kapp.k14s.io/change-group.jobs: kontinuous/jobs.template-kapp1
spec:

---
apiVersion: batch/v1
kind: Job
metadata:
  name: job-template-kapp1-db-hasura-create-db-1dtpbq
  namespace: fabrique-ci
  annotations:
    kapp.k14s.io/nonce: ""
    kapp.k14s.io/update-strategy: fallback-on-replace
    kapp.k14s.io/change-group: kontinuous/template-kapp1
    kapp.k14s.io/change-group.db-hasura: kontinuous/db-hasura.template-kapp1
    kapp.k14s.io/change-group.db-hasura.create-db: kontinuous/db-hasura.create-db.template-kapp1
    kapp.k14s.io/change-group.db-hasura..create-db: kontinuous/db-hasura..create-db.template-kapp1
    kapp.k14s.io/change-rule.db-hasura..create-secret: upsert after upserting kontinuous/db-hasura..create-secret.template-kapp1
    kapp.k14s.io/disable-original: ""
    kapp.k14s.io/create-strategy: fallback-on-update
    kapp.k14s.io/change-group.jobs: kontinuous/jobs.template-kapp1
spec:

---
apiVersion: batch/v1
kind: Job
metadata:
  name: job-template-kapp1-db-hasura-create-secret-672rpn
  namespace: fabrique-ci
  annotations:
    kapp.k14s.io/nonce: ""
    kapp.k14s.io/update-strategy: fallback-on-replace
    kapp.k14s.io/change-group: kontinuous/template-kapp1
    kapp.k14s.io/change-group.db-hasura: kontinuous/db-hasura.template-kapp1
    kapp.k14s.io/change-group.db-hasura.create-secret: kontinuous/db-hasura.create-secret.template-kapp1
    kapp.k14s.io/change-group.db-hasura..create-secret: kontinuous/db-hasura..create-secret.template-kapp1
    kapp.k14s.io/disable-original: ""
    kapp.k14s.io/create-strategy: fallback-on-update
    kapp.k14s.io/change-group.jobs: kontinuous/jobs.template-kapp1

spec:
   
---
apiVersion: batch/v1
kind: Job
metadata:
  name: job-template-kapp1-db-keycloak-create-db-3dxq1g
  namespace: fabrique-ci
  annotations:
    kapp.k14s.io/nonce: ""
    kapp.k14s.io/update-strategy: fallback-on-replace
    kapp.k14s.io/change-group: kontinuous/template-kapp1
    kapp.k14s.io/change-group.db-keycloak: kontinuous/db-keycloak.template-kapp1
    kapp.k14s.io/change-group.db-keycloak.create-db: kontinuous/db-keycloak.create-db.template-kapp1
    kapp.k14s.io/change-group.db-keycloak..create-db: kontinuous/db-keycloak..create-db.template-kapp1
    kapp.k14s.io/change-rule.db-keycloak..create-secret: >-
      upsert after upserting
      kontinuous/db-keycloak..create-secret.template-kapp1
    kapp.k14s.io/disable-original: ""
    kapp.k14s.io/create-strategy: fallback-on-update
    kapp.k14s.io/change-group.jobs: kontinuous/jobs.template-kapp1

spec:
   
---
apiVersion: batch/v1
kind: Job
metadata:
  name: job-template-kapp1-db-keycloak-create-secret-39r2rj
  namespace: fabrique-ci
  annotations:
    kapp.k14s.io/nonce: ""
    kapp.k14s.io/update-strategy: fallback-on-replace
    kapp.k14s.io/change-group: kontinuous/template-kapp1
    kapp.k14s.io/change-group.db-keycloak: kontinuous/db-keycloak.template-kapp1
    kapp.k14s.io/change-group.db-keycloak.create-secret: kontinuous/db-keycloak.create-secret.template-kapp1
    kapp.k14s.io/change-group.db-keycloak..create-secret: kontinuous/db-keycloak..create-secret.template-kapp1
    kapp.k14s.io/disable-original: ""
    kapp.k14s.io/create-strategy: fallback-on-update
    kapp.k14s.io/change-group.jobs: kontinuous/jobs.template-kapp1
spec:
 
---
apiVersion: batch/v1
kind: Job
metadata:
  name: job-template-kapp1-db-metabase-create-db-xgit30
  namespace: fabrique-ci
  annotations:
    kapp.k14s.io/nonce: ""
    kapp.k14s.io/update-strategy: fallback-on-replace
    kapp.k14s.io/change-group: kontinuous/template-kapp1
    kapp.k14s.io/change-group.db-metabase: kontinuous/db-metabase.template-kapp1
    kapp.k14s.io/change-group.db-metabase.create-db: kontinuous/db-metabase.create-db.template-kapp1
    kapp.k14s.io/change-group.db-metabase..create-db: kontinuous/db-metabase..create-db.template-kapp1
    kapp.k14s.io/change-rule.db-metabase..create-secret: >-
      upsert after upserting
      kontinuous/db-metabase..create-secret.template-kapp1
    kapp.k14s.io/disable-original: ""
    kapp.k14s.io/create-strategy: fallback-on-update
    kapp.k14s.io/change-group.jobs: kontinuous/jobs.template-kapp1
spec:
  
---
apiVersion: batch/v1
kind: Job
metadata:
  name: job-template-kapp1-db-metabase-create-secret-2bu7x4
  namespace: fabrique-ci
  annotations:
    kapp.k14s.io/nonce: ""
    kapp.k14s.io/update-strategy: fallback-on-replace
    kapp.k14s.io/change-group: kontinuous/template-kapp1
    kapp.k14s.io/change-group.db-metabase: kontinuous/db-metabase.template-kapp1
    kapp.k14s.io/change-group.db-metabase.create-secret: kontinuous/db-metabase.create-secret.template-kapp1
    kapp.k14s.io/change-group.db-metabase..create-secret: kontinuous/db-metabase..create-secret.template-kapp1
    kapp.k14s.io/disable-original: ""
    kapp.k14s.io/create-strategy: fallback-on-update
    kapp.k14s.io/change-group.jobs: kontinuous/jobs.template-kapp1
  
spec:
 
---
apiVersion: batch/v1
kind: Job
metadata:
  name: job-template-kapp1-seed-hasura-import-secret-3h5a4u
  namespace: fabrique-ci
  annotations:
    kapp.k14s.io/nonce: ""
    kapp.k14s.io/update-strategy: fallback-on-replace
    kapp.k14s.io/change-group: kontinuous/template-kapp1
    kapp.k14s.io/change-group.seed-hasura: kontinuous/seed-hasura.template-kapp1
    kapp.k14s.io/change-group.seed-hasura.import-secret: kontinuous/seed-hasura.import-secret.template-kapp1
    kapp.k14s.io/change-group.seed-hasura..import-secret: kontinuous/seed-hasura..import-secret.template-kapp1
    kapp.k14s.io/change-rule.hasura: upsert after upserting kontinuous/hasura.template-kapp1
    kapp.k14s.io/disable-original: ""
    kapp.k14s.io/create-strategy: fallback-on-update
    kapp.k14s.io/change-group.jobs: kontinuous/jobs.template-kapp1
spec:
 
---
apiVersion: batch/v1
kind: Job
metadata:
  name: job-template-kapp1-seed-hasura-seed-db-59hfdf
  namespace: fabrique-ci
  annotations:
    kapp.k14s.io/nonce: ""
    kapp.k14s.io/update-strategy: fallback-on-replace
    kapp.k14s.io/change-group: kontinuous/template-kapp1
    kapp.k14s.io/change-group.seed-hasura: kontinuous/seed-hasura.template-kapp1
    kapp.k14s.io/change-group.seed-hasura.seed-db: kontinuous/seed-hasura.seed-db.template-kapp1
    kapp.k14s.io/change-group.seed-hasura..seed-db: kontinuous/seed-hasura..seed-db.template-kapp1
    kapp.k14s.io/change-rule.seed-hasura..import-secret: >-
      upsert after upserting
      kontinuous/seed-hasura..import-secret.template-kapp1
    kapp.k14s.io/change-rule.hasura: upsert after upserting kontinuous/hasura.template-kapp1
    kapp.k14s.io/disable-original: ""
    kapp.k14s.io/create-strategy: fallback-on-update
    kapp.k14s.io/change-group.jobs: kontinuous/jobs.template-kapp1
spec:
 

Hi @revolunet! I am guessing that the server has a bunch of pending requests and is therefore refusing the TCP connection.
You can try increasing --wait-check-interval to a higher value (say 5 or 20 seconds) and see if that helps. Increasing it to 5 seconds would cut the number of API calls made during the waiting stage to roughly 1/5th.
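The "1/5th" estimate can be checked with a back-of-the-envelope calculation. The sketch below assumes one status poll per resource per interval and a 1-second baseline interval (implied by the 1/5th figure, not confirmed here); the 300-second wait and the count of 16 resources are illustrative values, and `estimated_wait_polls` is a hypothetical helper.

```python
import math


def estimated_wait_polls(wait_seconds: float,
                         check_interval_seconds: float,
                         resources: int = 1) -> int:
    """Rough number of status polls issued while waiting.

    Simplification: exactly one poll per resource per check interval.
    """
    if check_interval_seconds <= 0:
        raise ValueError("check interval must be positive")
    return math.ceil(wait_seconds / check_interval_seconds) * resources


if __name__ == "__main__":
    baseline = estimated_wait_polls(300, 1, resources=16)  # assumed 1s interval
    slower = estimated_wait_polls(300, 5, resources=16)    # 5s interval
    print(baseline, slower, slower / baseline)
```

The ratio depends only on the two intervals, so the saving applies to the waiting stage regardless of how many resources are being watched.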

Thanks @praveenrewar! I'll try these options and report back.

While it reduces the load in the long run, it doesn't prevent the 403s.

It looks like a lot of requests are made at the start of kapp deploy, and they don't take the qps/burst/wait options into account.

[Screenshot, 2022-10-19 at 09:49:52]

I see. Would you be able to share any such error? If it's happening in the apply stage, then you could also try increasing --apply-check-interval and decreasing --apply-concurrency (default is 5). But I feel that it's happening much before that. You can also try reducing --existing-non-labeled-resources-check-concurrency to a smaller number like 5 (default is 100).
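For benchmarking different combinations of these options, a small helper that assembles the command line may be convenient. This is only a sketch: the flag names are the ones mentioned in this thread, while the values, the app name, and the manifest file name are hypothetical, and `build_kapp_deploy_args` is not part of kapp.

```python
from typing import List


def build_kapp_deploy_args(app: str, manifest: str, *,
                           wait_check_interval: str = "5s",
                           apply_check_interval: str = "5s",
                           apply_concurrency: int = 2,
                           non_labeled_check_concurrency: int = 5) -> List[str]:
    """Build an argv list for `kapp deploy` with throttling-related flags.

    The default values here are illustrative, not recommendations.
    """
    return [
        "kapp", "deploy",
        "-a", app,
        "-f", manifest,
        "--yes",
        f"--wait-check-interval={wait_check_interval}",
        f"--apply-check-interval={apply_check_interval}",
        f"--apply-concurrency={apply_concurrency}",
        "--existing-non-labeled-resources-check-concurrency="
        f"{non_labeled_check_concurrency}",
    ]


if __name__ == "__main__":
    # Hypothetical app and file names; pass the list to subprocess.run to execute.
    print(" ".join(build_kapp_deploy_args("template-kapp1", "manifests.yaml")))
```

Returning a list rather than a single string avoids shell-quoting problems when the command is passed to `subprocess.run`.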

Mmm, thanks. I've tried many combinations without luck. It looks like something is wrong in our cluster: it fails as soon as we launch multiple parallel kapp deploys. Investigating...

Would you be able to share a few things that might help us improve kapp's performance? (We are already working on a couple of things in #599.)

  • cluster configuration (RAM, CPU, number of nodes, etc.)
  • permissions available to the user/SA being used
  • a couple of errors where you are seeing the 403
  • number of kapp apps you are trying to deploy concurrently

Hi

Our cluster is 6 × (6 CPU + 32 GB).

I tried with a more privileged account and got no 403s, but still 499 or 500 responses from the API server, which make kapp stop.
So yes, the 403s are due to a limited service account that cannot make cluster-wide queries.

We use Rancher and suspect it is our bottleneck here. We'll test directly against the API server to see if it gets better.

It works well with one kapp deploy, sometimes with two simultaneous ones, but no more than that with the above manifests.

Some log examples:

2022-10-19T09:59:58+00:00 499 /k8s/clusters/c-gjtkk/api/v1/namespaces/template-kapp1-8-2/configmaps kapp/v0.0.0 (linux/amd64) kubernetes/$Format
2022-10-19T09:59:58+00:00 499 /k8s/clusters/c-gjtkk/api/v1/namespaces/template-kapp1-8-2/configmaps kapp/v0.0.0 (linux/amd64) kubernetes/$Format
2022-10-19T09:59:58+00:00 499 /k8s/clusters/c-gjtkk/api/v1/namespaces/template-kapp1-8-2/configmaps kapp/v0.0.0 (linux/amd64) kubernetes/$Format
2022-10-19T09:59:58+00:00 409 /k8s/clusters/c-gjtkk/api/v1/namespaces/template-kapp1-8-2/serviceaccounts kapp/v0.0.0 (linux/amd64) kubernetes/$Format
2022-10-19T09:59:50+00:00 499 /k8s/clusters/c-gjtkk/api/v1/namespaces/template-kapp1-8-1/configmaps kapp/v0.0.0 (linux/amd64) kubernetes/$Format
2022-10-19T09:59:50+00:00 409 /k8s/clusters/c-gjtkk/api/v1/namespaces/template-kapp1-8-1/serviceaccounts kapp/v0.0.0 (linux/amd64) kubernetes/$Format
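Log lines in this shape can be tallied quickly to see which status codes and resource types dominate. The sketch below assumes the whitespace-separated layout shown above (timestamp, status, path, user agent); `count_statuses` is a hypothetical helper. Note that 409 is HTTP Conflict, while 499 is a non-standard code that some proxies use for a request the client closed early.

```python
from collections import Counter

SAMPLE_LOG = """\
2022-10-19T09:59:58+00:00 499 /k8s/clusters/c-gjtkk/api/v1/namespaces/template-kapp1-8-2/configmaps kapp/v0.0.0 (linux/amd64) kubernetes/$Format
2022-10-19T09:59:58+00:00 499 /k8s/clusters/c-gjtkk/api/v1/namespaces/template-kapp1-8-2/configmaps kapp/v0.0.0 (linux/amd64) kubernetes/$Format
2022-10-19T09:59:58+00:00 499 /k8s/clusters/c-gjtkk/api/v1/namespaces/template-kapp1-8-2/configmaps kapp/v0.0.0 (linux/amd64) kubernetes/$Format
2022-10-19T09:59:58+00:00 409 /k8s/clusters/c-gjtkk/api/v1/namespaces/template-kapp1-8-2/serviceaccounts kapp/v0.0.0 (linux/amd64) kubernetes/$Format
2022-10-19T09:59:50+00:00 499 /k8s/clusters/c-gjtkk/api/v1/namespaces/template-kapp1-8-1/configmaps kapp/v0.0.0 (linux/amd64) kubernetes/$Format
2022-10-19T09:59:50+00:00 409 /k8s/clusters/c-gjtkk/api/v1/namespaces/template-kapp1-8-1/serviceaccounts kapp/v0.0.0 (linux/amd64) kubernetes/$Format
"""


def count_statuses(log_text: str) -> Counter:
    """Count (status code, resource type) pairs in proxy access-log lines."""
    counts: Counter = Counter()
    for line in log_text.splitlines():
        parts = line.split(maxsplit=3)  # timestamp, status, path, rest
        if len(parts) < 3 or not parts[1].isdigit():
            continue  # skip blank or malformed lines
        status = int(parts[1])
        resource = parts[2].rsplit("/", 1)[-1]  # last path segment
        counts[(status, resource)] += 1
    return counts


if __name__ == "__main__":
    for (status, resource), n in sorted(count_statuses(SAMPLE_LOG).items()):
        print(status, resource, n)
```

In this sample, every 499 is on a configmaps path and every 409 is on a serviceaccounts path.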

> I tried with a superior account and got no 403 but still 499 or 500 from the API Server which make kapp stop.
> So yes, the 403 are due to a limited serviceaccount that cannot query cluster-wide

I see, but a forbidden error usually causes kapp to stop, so I am wondering what caused this many API calls then?

> We use Rancher and suspect it is our bottleneck here. We'll test directly on the API Server to see if it gets better.

That is a possibility because, based on the cluster configuration, it should be able to handle this many requests.

Hi @revolunet ! Were you able to find the root cause of the failures? Let me know if you need any help or if you would like to share some information which could be helpful to improve kapp performance.

Hello @praveenrewar,

After many tests we've confirmed that it comes from the Rancher API. For some reason it throws "connection refused" under load, and we're unable to find the root cause or any more logs.

The good news is that kapp works flawlessly when talking directly to the kube API server!

I think this issue can be closed.

Thank you for the update @revolunet. Closing the issue for now, but feel free to reopen it if you find something we can improve on.

Maybe kapp could have a better retry mechanism on API errors so it could also work with flaky clusters.

Thanks for your support !

We do have a set of retryable errors, but retrying currently doesn't happen in the waiting stage. We are tracking that over here. Hopefully we will find a suitable solution to it soon.
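For readers who need to tolerate a flaky proxy in the meantime, the general idea of retrying transient connection errors can be sketched as a wrapper around any API call. This is a generic illustration of exponential backoff with jitter, not kapp's retry mechanism; `retry_with_backoff` and its parameters are hypothetical.

```python
import random
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def retry_with_backoff(call: Callable[[], T], *,
                       retryable: Tuple[Type[BaseException], ...] = (ConnectionError,),
                       attempts: int = 5,
                       base_delay: float = 0.5,
                       max_delay: float = 8.0,
                       sleep: Callable[[float], None] = time.sleep) -> T:
    """Call `call` until it succeeds, retrying only on `retryable` errors.

    The delay doubles after each failure (capped at `max_delay`) and gets
    random jitter so that concurrent clients do not retry in lockstep.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return call()
        except retryable:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            delay = min(max_delay, base_delay * 2 ** attempt)
            sleep(delay + random.uniform(0, delay / 2))
    raise AssertionError("unreachable")


if __name__ == "__main__":
    state = {"calls": 0}

    def flaky_list_namespaces() -> str:
        # Stand-in for an API call that is refused twice, then succeeds.
        state["calls"] += 1
        if state["calls"] < 3:
            raise ConnectionRefusedError("connection refused")
        return "ok"

    print(retry_with_backoff(flaky_list_namespaces, sleep=lambda _: None))
```

Errors outside `retryable` (for example a 403 surfaced as a permission error) propagate immediately, which matches the intent of retrying only transient failures.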