kedacore / charts

Helm charts for KEDA


APIService configured incorrectly

dmcstravick7 opened this issue · comments

Issue

  • The metrics-server APIService v1beta1.external.metrics.k8s.io is failing to come up (ServiceNotFound).
  • All KEDA resources are deployed as keda-operator*, which in this case includes keda-operator-metrics-apiserver.
    • However, the spec for /keda/config/metrics-server/api_service.yaml sets spec.service.name to keda-metrics-apiserver. ^ Issue
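For reference, the relevant part of the APIService manifest looks roughly like this (a sketch; the namespace and other fields depend on how KEDA is installed):

```yaml
apiVersion: apiregistration.k8s.io/v1
kind: APIService
metadata:
  name: v1beta1.external.metrics.k8s.io
spec:
  group: external.metrics.k8s.io
  version: v1beta1
  service:
    # Must match the Service the chart actually deploys:
    name: keda-operator-metrics-apiserver  # not keda-metrics-apiserver
    namespace: keda                        # assumed install namespace
```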

Fix

  • Edit the APIService and update spec.service.name to keda-operator-metrics-apiserver.
  • E.g.
    • kubectl edit apiservice v1beta1.external.metrics.k8s.io
    • Change spec.service.name from keda-metrics-apiserver to keda-operator-metrics-apiserver.
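The same fix can also be applied non-interactively with a JSON merge patch. A sketch (the command is echoed here for review; drop the leading `echo` to actually run it against a cluster):

```shell
# JSON merge patch that repoints the APIService at the service the chart deploys.
patch='{"spec":{"service":{"name":"keda-operator-metrics-apiserver"}}}'
# Echoed rather than executed, so it can be reviewed first.
echo kubectl patch apiservice v1beta1.external.metrics.k8s.io --type merge -p "$patch"
```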

Steps to reproduce

  • Install KEDA via Helm chart 2.11.1 (note: this has also happened with previous versions)
  • Running any kubectl command, such as kubectl get pods, returns the error below (the pods are still returned)
57503 memcache.go:287] couldn't get resource list for external.metrics.k8s.io/v1beta1: the server is currently unable to handle the request

Notes

  • Could this be due to me not setting some value in the chart that I am unaware of?

Originally posted by @dmcstravick7 in kedacore/keda#4769

Is this a Helm chart issue or the KEDA deployment itself? Always happy to review PRs.

I think it's a KEDA issue, as the Kubernetes manifest specifies the wrong name, which Helm then picks up in my case.

I'll create a PR, I just want to be very sure that this is an issue for someone other than me.

Yes, but the main question is: do we need to update the Helm chart and/or KEDA core, which generates the manifests? @zroubalik can you take a look? I keep forgetting where we annotate this.

I am having the same issue too

Hello,
I have just tried with a fresh cluster using the latest Helm chart (v2.11.1) and default values, and I can't reproduce the issue. In my case, the APIService points to the correct service.

Could you share your values to test them?

@tomkerkhove, this can only happen with Helm, because otherwise the e2e tests wouldn't pass (the metrics server wouldn't have been reachable), so I'm moving this issue to the charts repo

Hi @JorTurFer,

I use Terraform to deploy a helm_release, but it uses all default values besides the ones below.
Something unique about my environment: KEDA 2.7.1 is already deployed (not via Helm), and I'm deleting all of its resources (leaving only the CRDs) before deploying the helm_release of KEDA 2.11.1. The deployment itself works fine; the only issue is the one mentioned in the post.

operator:
  replicaCount: 2
prometheus:
  metricServer:
    podMonitor:
      namespace: kube-system
  operator:
    podMonitor:
      namespace: kube-system
  prometheusRules:
    namespace: kube-system
serviceAccount:
  annotations:
    XYZ_ROLE
webhooks:
  enabled: false

Recapping:
You have KEDA v2.7.1 installed and you want to migrate to v2.11, but it doesn't work, right? You are using default values + the values above. Is that correct?
After your confirmation, I'll try your scenario again

That's correct, with the added context that we're also switching from vanilla manifests to using Helm via Terraform.
I can also provide the kubectl commands I am using to delete the KEDA resources (everything except ScaledObjects)

> I can also provide the kubectl commands I am using to delete the Keda resources (everything except ScaledObjects)

It'd be nice ❤️ I'll try to reproduce your scenario exactly

Below is what I'm running to delete all the required resources, then label and annotate the CRDs so that the Helm deploy can take them over.

Sidenote: the only reason I'm doing it this way is that I have several ScaledObjects running in production and I don't want to affect/have to re-deploy them (which I think happens if I just run k delete -f keda-manifest). Is this accurate?

k delete ClusterRole keda-operator
k delete ClusterRole keda-external-metrics-reader
k delete RoleBinding keda-auth-reader -n kube-system
k delete ClusterRoleBinding keda-hpa-controller-external-metrics 
k delete ClusterRoleBinding keda-operator
k delete ClusterRoleBinding keda-system-auth-delegator
k delete Service keda-metrics-apiserver -n kube-system 
k delete Deployment/keda-metrics-apiserver -n kube-system 
k delete Deployment/keda-operator -n kube-system

k label --overwrite crd clustertriggerauthentications.keda.sh app.kubernetes.io/managed-by="Helm"
k annotate --overwrite crd clustertriggerauthentications.keda.sh meta.helm.sh/release-name="keda"
k annotate --overwrite crd clustertriggerauthentications.keda.sh meta.helm.sh/release-namespace="kube-system"

k label --overwrite crd scaledjobs.keda.sh app.kubernetes.io/managed-by="Helm"
k annotate --overwrite crd scaledjobs.keda.sh meta.helm.sh/release-name="keda"
k annotate --overwrite crd scaledjobs.keda.sh meta.helm.sh/release-namespace="kube-system"

k label --overwrite crd scaledobjects.keda.sh app.kubernetes.io/managed-by="Helm"
k annotate --overwrite crd scaledobjects.keda.sh meta.helm.sh/release-name="keda"
k annotate --overwrite crd scaledobjects.keda.sh meta.helm.sh/release-namespace="kube-system"

k label --overwrite crd triggerauthentications.keda.sh app.kubernetes.io/managed-by="Helm"
k annotate --overwrite crd triggerauthentications.keda.sh meta.helm.sh/release-name="keda"
k annotate --overwrite crd triggerauthentications.keda.sh meta.helm.sh/release-namespace="kube-system"

k label --overwrite APIService v1beta1.external.metrics.k8s.io app.kubernetes.io/managed-by="Helm"
k annotate --overwrite APIService v1beta1.external.metrics.k8s.io meta.helm.sh/release-name="keda"
k annotate --overwrite APIService v1beta1.external.metrics.k8s.io meta.helm.sh/release-namespace="kube-system"
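The repetitive label/annotate step above can also be generated with a small loop. A sketch (it builds and prints the commands instead of running them, so the output can be reviewed and then piped to sh; release name keda and namespace kube-system are taken from the commands above):

```shell
# Build the Helm-adoption commands for every KEDA CRD instead of writing them out by hand.
crds="clustertriggerauthentications.keda.sh scaledjobs.keda.sh scaledobjects.keda.sh triggerauthentications.keda.sh"
cmds=""
for crd in $crds; do
  cmds="$cmds
kubectl label --overwrite crd $crd app.kubernetes.io/managed-by=Helm
kubectl annotate --overwrite crd $crd meta.helm.sh/release-name=keda
kubectl annotate --overwrite crd $crd meta.helm.sh/release-namespace=kube-system"
done
# Print for review; pipe to `sh` to execute once verified.
printf '%s\n' "$cmds"
```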

I'm going to test it soon (today or tomorrow at the latest), but I have a question in the meantime: why didn't you just upgrade the chart using helm upgrade? It doesn't require any deletion and works in one step. I mean, if you have installed KEDA (or any other component) using Helm, you can upgrade it with just helm upgrade

That would be ideal. But currently we're also moving management of helm to Terraform, using the helm_release provider.

> That would be ideal. But currently we're also moving management of helm to Terraform, using the helm_release provider.

AFAIK, that provider supports upgrading out of the box, so you just need to change the version field and it will execute helm upgrade
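For context, a minimal helm_release sketch of what that looks like (the resource name, namespace, and values file path are assumptions; bumping version makes the provider run the equivalent of helm upgrade on the next terraform apply):

```hcl
resource "helm_release" "keda" {
  name       = "keda"
  namespace  = "kube-system"
  repository = "https://kedacore.github.io/charts"
  chart      = "keda"
  version    = "2.11.1" # changing this value triggers an in-place upgrade

  values = [file("${path.module}/keda-values.yaml")]
}
```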

Hmm, that might work. I tried that previously and it didn't work because all the labels and annotations were missing. Maybe I can just adjust my process: I won't delete any resources, but I'll do all the labelling/annotating and then attempt the deploy with Terraform.

I'll add a comment once I try that. Thanks for your help so far @JorTurFer!

Experienced a similar issue with version 2.11.1: Prometheus was not able to scrape metrics from the metrics server.
When trying to check the application locally using

kubectl port-forward $(kubectl get pods -l app=keda-operator-metrics-apiserver -n keda -o name) 8080:8080 -n keda

the connection crashes.
When testing the same chart version, but setting this value:

image:
  metricsApiServer:
    tag: "2.10.1"

everything seems to be working properly

Hi @ArieLevs,
It's a known bug: we removed the Prometheus metrics by mistake, and the fix has already been merged; a hotfix release will be cut soon. I'd suggest paying attention to this comment about the metrics

In any case, I think the problems here are different, because the APIService uses port 6443
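One quick way to see which ports the metrics apiserver Service actually exposes (a sketch; the command is echoed for review, and the service name and namespace assume a default chart install):

```shell
# List name:port->targetPort for each port on the metrics apiserver Service.
cmd="kubectl get service keda-operator-metrics-apiserver -n keda -o jsonpath='{range .spec.ports[*]}{.name}:{.port}->{.targetPort}{\"\n\"}{end}'"
# Echoed rather than executed, so it can be reviewed first.
echo "$cmd"
```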