kedacore / charts

Helm charts for KEDA


APIService configured incorrectly

dmcstravick7 opened this issue · comments

Issue

  • The metrics-server APIService v1beta1.external.metrics.k8s.io is failing to come up (ServiceNotFound).
  • All KEDA resources are deployed as keda-operator*, which in this case includes keda-operator-metrics-apiserver.
    • However, the spec for /keda/config/metrics-server/api_service.yaml sets spec.service.name to keda-metrics-apiserver. ^ Issue
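For reference, the relevant part of the APIService manifest looks roughly like this (a sketch; the namespace and other fields depend on how KEDA is installed):

```yaml
apiVersion: apiregistration.k8s.io/v1
kind: APIService
metadata:
  name: v1beta1.external.metrics.k8s.io
spec:
  group: external.metrics.k8s.io
  version: v1beta1
  service:
    # Must match the Service the chart actually deploys:
    name: keda-operator-metrics-apiserver  # not keda-metrics-apiserver
    namespace: keda                        # assumed install namespace
```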

Fix

  • Edit the APIService and update spec.service.name to keda-operator-metrics-apiserver.
  • E.g.
    • kubectl edit apiservice v1beta1.external.metrics.k8s.io
    • Change spec.service.name from keda-metrics-apiserver to keda-operator-metrics-apiserver.
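The same fix can also be applied non-interactively with a JSON merge patch. A sketch (the command is echoed here for review; drop the leading `echo` to actually run it against a cluster):

```shell
# JSON merge patch that repoints the APIService at the service the chart deploys.
patch='{"spec":{"service":{"name":"keda-operator-metrics-apiserver"}}}'
# Echoed rather than executed, so it can be reviewed first.
echo kubectl patch apiservice v1beta1.external.metrics.k8s.io --type merge -p "$patch"
```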

Steps to reproduce

  • Install KEDA via Helm chart 2.11.1 (note: this has also happened with previous versions)
  • Running any kubectl command, such as kubectl get pods, returns the error below (the pods are still returned)
57503 memcache.go:287] couldn't get resource list for external.metrics.k8s.io/v1beta1: the server is currently unable to handle the request

Notes

  • Could this be due to me not setting some value in the chart that I am unaware of?

Originally posted by @dmcstravick7 in kedacore/keda#4769

Is this a Helm chart issue or the KEDA deployment itself? Always happy to review PRs.

I think it's a KEDA issue, as the Kubernetes manifest specifies the wrong name, which Helm then picks up in my case.

I'll create a PR, I just want to be very sure that this is an issue for someone other than me.

Yes, but the main question is: do we need to update the Helm chart and/or KEDA core, which generates the manifests? @zroubalik can you take a look? I keep forgetting where we annotate this.

I am having the same issue too

Hello,
I have just tried with a fresh cluster using the latest Helm chart (v2.11.1) and default values, and I can't reproduce the issue. In my case, the APIService points to the correct service.

Could you share your values to test them?

@tomkerkhove, this can only happen with Helm, because otherwise the e2e tests wouldn't pass (the metrics server wouldn't have been reachable), so I'm moving this issue to the charts repo

Hi @JorTurFer,

I use Terraform to deploy a helm_release, but it uses all default values besides the ones below.
Something unique about my environment: KEDA 2.7.1 is already deployed (not via Helm), and I'm deleting all of its resources (leaving only the CRDs) before deploying the helm_release of KEDA 2.11.1. The deployment itself works fine; the only issue is the one mentioned in the post.

operator:
  replicaCount: 2
prometheus:
  metricServer:
    podMonitor:
      namespace: kube-system
  operator:
    podMonitor:
      namespace: kube-system
  prometheusRules:
    namespace: kube-system
serviceAccount:
  annotations:
    XYZ_ROLE
webhooks:
  enabled: false

Recapping:
You have KEDA v2.7.1 installed and you want to migrate to v2.11, but it doesn't work, right? You are using default values + the values above. Is that correct?
After your confirmation, I'll try your scenario again

That's correct, with the added context that we're also switching from vanilla manifests to using Helm via Terraform.
I can also provide the kubectl commands I am using to delete the KEDA resources (everything except ScaledObjects)

> I can also provide the kubectl commands I am using to delete the Keda resources (everything except ScaledObjects)

It'd be nice ❤️ I'll try to reproduce your scenario exactly

Below is what I'm running to delete all the required resources, then label and annotate the CRDs so that the Helm deploy can take them over.

Sidenote: the only reason I'm doing it this way is that I have several ScaledObjects running in production and I don't want to affect/have to re-deploy them (which I think happens if I just run k delete -f keda-manifest). Is this accurate?

k delete ClusterRole keda-operator
k delete ClusterRole keda-external-metrics-reader
k delete RoleBinding keda-auth-reader -n kube-system
k delete ClusterRoleBinding keda-hpa-controller-external-metrics 
k delete ClusterRoleBinding keda-operator
k delete ClusterRoleBinding keda-system-auth-delegator
k delete Service keda-metrics-apiserver -n kube-system 
k delete Deployment/keda-metrics-apiserver -n kube-system 
k delete Deployment/keda-operator -n kube-system

k label --overwrite crd clustertriggerauthentications.keda.sh app.kubernetes.io/managed-by="Helm"
k annotate --overwrite crd clustertriggerauthentications.keda.sh meta.helm.sh/release-name="keda"
k annotate --overwrite crd clustertriggerauthentications.keda.sh meta.helm.sh/release-namespace="kube-system"

k label --overwrite crd scaledjobs.keda.sh app.kubernetes.io/managed-by="Helm"
k annotate --overwrite crd scaledjobs.keda.sh meta.helm.sh/release-name="keda"
k annotate --overwrite crd scaledjobs.keda.sh meta.helm.sh/release-namespace="kube-system"

k label --overwrite crd scaledobjects.keda.sh app.kubernetes.io/managed-by="Helm"
k annotate --overwrite crd scaledobjects.keda.sh meta.helm.sh/release-name="keda"
k annotate --overwrite crd scaledobjects.keda.sh meta.helm.sh/release-namespace="kube-system"

k label --overwrite crd triggerauthentications.keda.sh app.kubernetes.io/managed-by="Helm"
k annotate --overwrite crd triggerauthentications.keda.sh meta.helm.sh/release-name="keda"
k annotate --overwrite crd triggerauthentications.keda.sh meta.helm.sh/release-namespace="kube-system"

k label --overwrite APIService v1beta1.external.metrics.k8s.io app.kubernetes.io/managed-by="Helm"
k annotate --overwrite APIService v1beta1.external.metrics.k8s.io meta.helm.sh/release-name="keda"
k annotate --overwrite APIService v1beta1.external.metrics.k8s.io meta.helm.sh/release-namespace="kube-system"
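The repetitive label/annotate step above can also be generated with a small loop. A sketch (it builds and prints the commands instead of running them, so the output can be reviewed and then piped to sh; release name keda and namespace kube-system are taken from the commands above):

```shell
# Build the Helm-adoption commands for every KEDA CRD instead of writing them out by hand.
crds="clustertriggerauthentications.keda.sh scaledjobs.keda.sh scaledobjects.keda.sh triggerauthentications.keda.sh"
cmds=""
for crd in $crds; do
  cmds="$cmds
kubectl label --overwrite crd $crd app.kubernetes.io/managed-by=Helm
kubectl annotate --overwrite crd $crd meta.helm.sh/release-name=keda
kubectl annotate --overwrite crd $crd meta.helm.sh/release-namespace=kube-system"
done
# Print for review; pipe to `sh` to execute once verified.
printf '%s\n' "$cmds"
```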

I'm going to test it soon (today or tomorrow at the latest), but I have a question in the meantime: why didn't you just upgrade the chart using helm upgrade? It doesn't require any deletion and works in one step. I mean, if you have installed KEDA (or any other component) using Helm, you can upgrade it with just helm upgrade

That would be ideal. But currently we're also moving management of helm to Terraform, using the helm_release provider.

> That would be ideal. But currently we're also moving management of helm to Terraform, using the helm_release provider.

AFAIK, that provider supports upgrading out of the box, so you just need to change the version field and it will execute helm upgrade
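For context, a minimal helm_release sketch of what that looks like (the resource name, namespace, and values file path are assumptions; bumping version makes the provider run the equivalent of helm upgrade on the next terraform apply):

```hcl
resource "helm_release" "keda" {
  name       = "keda"
  namespace  = "kube-system"
  repository = "https://kedacore.github.io/charts"
  chart      = "keda"
  version    = "2.11.1" # changing this value triggers an in-place upgrade

  values = [file("${path.module}/keda-values.yaml")]
}
```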

Hmm, that might work. I tried that previously and it didn't work because all the labels and annotations were missing. Maybe I can just adjust my process: I won't delete any resources, but I'll do all the labelling/annotating and then attempt the deploy with Terraform.

I'll add a comment once I try that. Thanks for your help so far @JorTurFer!

Experienced a similar issue with version 2.11.1: Prometheus was not able to scrape metrics from the metrics server.
When trying to check the application locally using

kubectl port-forward $(kubectl get pods -l app=keda-operator-metrics-apiserver -n keda -o name) 8080:8080 -n keda

the connection crashes.
When testing the same chart version, but setting this value:

image:
  metricsApiServer:
    tag: "2.10.1"

everything seems to be working properly

Hi @ArieLevs,
It's a known bug: we removed the Prometheus metrics by mistake, and the fix has already been merged; a hotfix release will be cut soon. I'd suggest paying attention to this comment about the metrics

In any case, I think the problems here are different, because the APIService uses port 6443
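One quick way to see which ports the metrics apiserver Service actually exposes (a sketch; the command is echoed for review, and the service name and namespace assume a default chart install):

```shell
# List name:port->targetPort for each port on the metrics apiserver Service.
cmd="kubectl get service keda-operator-metrics-apiserver -n keda -o jsonpath='{range .spec.ports[*]}{.name}:{.port}->{.targetPort}{\"\n\"}{end}'"
# Echoed rather than executed, so it can be reviewed first.
echo "$cmd"
```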