kedacore / charts

Helm charts for KEDA

Handle custom cluster-domain values doesn't work without certmanager

marandalucas opened this issue

@lucchmielowski Hi! Thank you so much for this fix. #399

Unfortunately, it doesn't work for us.

  • We noticed that we would have to install cert-manager.
  • metrics-service-address is hardcoded, so we can't update it from values.yaml.
  • Certificates and secure connections are mandatory.

HELM CONFIG
clusterDomain: gcp-prod-pv-na1-a.company.cluster.local

ERROR:
W0314 15:03:14.706154 1 logging.go:59] [core] [Channel #1 SubChannel #2] grpc: addrConn.createTransport failed to connect to {Addr: "keda-operator.keda.svc.gcp-prod-pv-na1-a.company.cluster.local:9666", ServerName: "keda-operator.keda.svc.gcp-prod-pv-na1-a.company.cluster.local:9666", }. Err: connection error: desc = "transport: authentication handshake failed: tls: failed to verify certificate: x509: certificate is valid for keda-operator, keda-operator, keda-operator.keda, keda-operator.keda.svc, keda-operator.keda.svc.cluster.local, keda-admission-webhooks, keda-admission-webhooks.keda, keda-admission-webhooks.keda.svc, keda-admission-webhooks.keda.svc.cluster.local, keda-operator-metrics-apiserver, keda-operator-metrics-apiserver.keda, keda-operator-metrics-apiserver.keda.svc, keda-operator-metrics-apiserver.keda.svc.cluster.local, not keda-operator.keda.svc.gcp-prod-pv-na1-a.company.cluster.local"
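To make the failure in this log concrete, here is a minimal sketch of the name check that the TLS handshake is complaining about. The SAN list is copied from the error message; the helper name `hostname_in_sans` is hypothetical, and the sketch only does exact matching (real TLS verification also handles wildcards and more).

```python
from typing import Iterable

# DNS names the certificate is valid for, copied from the error message above.
CERT_SANS = [
    "keda-operator",
    "keda-operator.keda",
    "keda-operator.keda.svc",
    "keda-operator.keda.svc.cluster.local",
    "keda-admission-webhooks",
    "keda-admission-webhooks.keda",
    "keda-admission-webhooks.keda.svc",
    "keda-admission-webhooks.keda.svc.cluster.local",
    "keda-operator-metrics-apiserver",
    "keda-operator-metrics-apiserver.keda",
    "keda-operator-metrics-apiserver.keda.svc",
    "keda-operator-metrics-apiserver.keda.svc.cluster.local",
]


def hostname_in_sans(hostname: str, sans: Iterable[str]) -> bool:
    """Exact, case-insensitive match only (hypothetical helper, no wildcard support)."""
    wanted = hostname.rstrip(".").lower()
    return any(wanted == san.rstrip(".").lower() for san in sans)


default_name = "keda-operator.keda.svc.cluster.local"
custom_name = "keda-operator.keda.svc.gcp-prod-pv-na1-a.company.cluster.local"

print(hostname_in_sans(default_name, CERT_SANS))  # True
print(hostname_in_sans(custom_name, CERT_SANS))   # False
```

The address being dialed uses the custom cluster domain, while every fully qualified name in the certificate ends in cluster.local, which is why verification fails.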

We wonder if you could fix it. We don't need cert-manager in our clusters.

Thanks in advance

Hello @marandalucas 👋

I just had a look and tried to recreate the issue, but it seems to be working fine on my side.

Just so I understand:

  • You installed cert-manager
  • You installed the chart with a custom clusterDomain
  • Your metrics-server pod throws the error you shared?

I'm wondering: what version of the chart are you using, and are you using the certificates created by the chart (certificates.certManager.enabled: true)?

Also, could you share the content of your keda-operator-tls-certificates certificate?

Hello @lucchmielowski 👍

If you want to recreate the issue you have to:

  1. Create a GKE cluster without the cert-manager tool.
  2. Install KEDA (2.13.0) with a custom clusterDomain.
  3. Check the metrics-apiserver pod.
ERROR:
W0314 15:03:14.706154       1 logging.go:59] [core] [Channel #1 SubChannel #2] grpc: addrConn.createTransport failed to connect to {Addr: "keda-operator.keda.svc.gcp-prod-pv-na1-a.company.cluster.local:9666", ServerName: "keda-operator.keda.svc.gcp-prod-pv-na1-a.company.cluster.local:9666", }. 

We'd like to avoid installing cert-manager for the following reasons:

  • We don't want to install cert-manager in our infra for the sole purpose of creating a certificate for KEDA.
  • cert-manager consumes resources in the cluster and we would have to monitor it.
  • We hit a race condition when adding this component to our Terraform (the cert-manager CRDs take a while to be created).
    Error: [resource mapping not found for name: "keda-operator-tls-certificates" namespace: "keda" from "": no matches for kind "Certificate" in version "cert-manager.io/v1" │ ensure CRDs are installed first, resource mapping not found for name: "keda-operator-ca" namespace: "keda" from "": no matches for kind "Certificate" in version "cert-manager.io/v1"

Is there another way to fix this through parametrizing metrics-service-address or something like that?

Thank you so much for this project

Hi @marandalucas, sorry, but I won't have time to test on GKE in the next few days. Both issues you shared look to be linked to a mismatch between the cluster domain of your cluster and your configuration, not to an issue with the chart itself (I might have misunderstood something, though).

What makes me think that is this part of the log you shared earlier:

certificate is valid for keda-operator, keda-operator, keda-operator.keda, keda-operator.keda.svc, keda-operator.keda.svc.cluster.local ... not keda-operator.keda.svc.gcp-prod-pv-na1-a.company.cluster.local

as well as the

addrConn.createTransport failed to connect to {Addr: "keda-operator.keda.svc.gcp-prod-pv-na1-a.company.cluster.local:9666...

That does not seem related to a cert issue but rather to an addressing issue.

Could it be that your GKE cluster is using the default svc.cluster.local FQDN? (In that case you wouldn't need to set a clusterDomain.)
One way to check the correct value to use is to run the following command, which creates a pod and does an nslookup:

kubectl run -it --image=ubuntu --restart=Never shell -- \
sh -c 'apt-get update > /dev/null && apt-get install -y dnsutils > /dev/null && \
nslookup kubernetes.default | grep Name | sed "s/Name:\skubernetes.default//"'
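As an alternative to installing dnsutils, the cluster domain can usually be read from the search line of /etc/resolv.conf inside any pod. The sketch below assumes the usual Kubernetes layout of that line (`<namespace>.svc.<domain> svc.<domain> <domain>`); the function name and the sample contents are illustrative only.

```python
from typing import Optional


def cluster_domain_from_resolv_conf(text: str) -> Optional[str]:
    """Return the cluster domain found in a pod's resolv.conf, or None.

    Assumes the search list contains an entry of the form svc.<domain>.
    A namespace literally named "svc" would confuse this simple check.
    """
    for line in text.splitlines():
        parts = line.split()
        if parts and parts[0] == "search":
            for entry in parts[1:]:
                if entry.startswith("svc."):
                    return entry[len("svc."):].rstrip(".")
    return None


# Sample contents as they might appear in a pod in the "keda" namespace.
sample = """search keda.svc.cluster.local svc.cluster.local cluster.local
nameserver 10.0.0.10
options ndots:5
"""

print(cluster_domain_from_resolv_conf(sample))  # cluster.local
```

In a running pod the input would come from `open("/etc/resolv.conf").read()`; if the result is cluster.local, no custom clusterDomain is needed.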

I also understand that you don't want to set up cert-manager. By default, the chart lets the operator create a kedaorg-certs secret, which is used for TLS communication between KEDA's components.

Also, feel free to message me on the Kubernetes slack directly if you find it easier to have a "live" discussion about the issue.

Hello @marandalucas,
You don't need cert-manager, but you do need to update the internal cert system too (you can use either cert-manager or the self-generated certs).
You have to add an extra operator argument, k8s-cluster-domain: your-domain, so that your domain is taken into account for certificate generation.

extraArgs:
  # -- Additional KEDA Operator container arguments
  keda:
    k8s-cluster-domain: your-domain
clusterDomain: your-domain

I guess that we could automatically set the arg with clusterDomain value? 🤔 @lucchmielowski WDYT?

In any case, by setting both you will be able to use KEDA without cert-manager.
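To illustrate why the domain has to reach certificate generation, here is a rough sketch that expands the service names into the DNS-name pattern visible in the error log earlier in this thread. The function name is hypothetical, and whether the operator follows exactly this pattern is an assumption drawn from that log, not from KEDA's source.

```python
from typing import List

# Service names as they appear in the certificate's SAN list in the log.
SERVICES = [
    "keda-operator",
    "keda-admission-webhooks",
    "keda-operator-metrics-apiserver",
]


def expected_dns_names(service: str, namespace: str, cluster_domain: str) -> List[str]:
    """Expand one service into short, namespaced and fully qualified names."""
    return [
        service,
        f"{service}.{namespace}",
        f"{service}.{namespace}.svc",
        f"{service}.{namespace}.svc.{cluster_domain}",
    ]


custom_domain = "gcp-prod-pv-na1-a.company.cluster.local"
names = [
    name
    for service in SERVICES
    for name in expected_dns_names(service, "keda", custom_domain)
]

# The name the metrics server dials is covered once the custom domain is used.
print(f"keda-operator.keda.svc.{custom_domain}" in names)  # True
```

With the default cluster.local passed instead, the same expansion reproduces the list from the error message, which does not contain the custom-domain name.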