CoreDNS, local-path-provisioner, metrics-server not deployed HA in HA mode

Question

rdvansloten opened this issue 9 months ago

Hi,

I have deployed an HA cluster using the current latest version of this role, which works fine. However, it does not spawn multiple replicas of CoreDNS and the other system pods. When I take down one node, it takes about 3-5 minutes for the cluster to notice that the node hosting these pods is gone and reschedule them elsewhere. Considering CoreDNS is pretty... core, 5 minutes is a long time. I can't find any setting in this repository for running CoreDNS or the other system apps in HA.
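The 3-5 minute delay is consistent with Kubernetes' taint-based eviction defaults: when a node stops reporting, the node controller taints it `node.kubernetes.io/unreachable` (or `not-ready`) with effect `NoExecute`, and pods are only evicted once their matching `tolerationSeconds` runs out. Unless the pod spec sets its own tolerations for those taints, the `DefaultTolerationSeconds` admission plugin adds them with 300 seconds. A minimal sketch, assuming you control the pod template (for example in your own copy of a workload), of shortening that window; the value 30 is an arbitrary illustration, not a recommendation:

```yaml
# Fragment of a pod template (spec.template.spec of a Deployment or DaemonSet).
# Evict pods from a failed node after 30s instead of the default 300s.
# 30 is an illustrative value only.
tolerations:
- key: node.kubernetes.io/not-ready
  operator: Exists
  effect: NoExecute
  tolerationSeconds: 30
- key: node.kubernetes.io/unreachable
  operator: Exists
  effect: NoExecute
  tolerationSeconds: 30
```

Shorter values trade faster failover for more pod churn when a node only drops off the network briefly. For the packaged CoreDNS Deployment, K3s manages the manifest itself, so direct edits to it may be overwritten.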

Rudy van Sloten · Answer 1 · Mon Feb 26 2024 23:34:41 GMT+0800 (China Standard Time)

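Interim solution for CoreDNS:

```yaml
---
# Keep between 3 and 5 CoreDNS replicas, scaling on CPU usage.
# Relies on the packaged CoreDNS Deployment's topology spread
# constraint to place the replicas on different nodes.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: coredns-hpa
  namespace: kube-system
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: coredns
  minReplicas: 3   # one replica per node on a 3-node cluster
  maxReplicas: 5
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 75   # percent of the pods' CPU requests
```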

However, because the CoreDNS Deployment has a topology spread constraint and the others don't, this can't be applied to local-path-provisioner and metrics-server: extra replicas of those are not guaranteed to land on different nodes.
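For comparison, the kind of constraint that would let scaled-out replicas of another component survive the loss of a single node looks roughly like the fragment below. It is a sketch only: the `k8s-app: metrics-server` label is an assumption about how the packaged metrics-server pods are labelled (check with `kubectl -n kube-system get pods --show-labels`), and the fragment would have to live in the Deployment's own pod template, which for a packaged component means maintaining your own copy of its manifest.

```yaml
# Fragment of spec.template.spec for a Deployment.
# maxSkew: 1 keeps each node's replica count within 1 of the least-loaded
# eligible node, so replicas spread across nodes before any node gets a second.
topologySpreadConstraints:
- maxSkew: 1
  topologyKey: kubernetes.io/hostname
  whenUnsatisfiable: DoNotSchedule
  labelSelector:
    matchLabels:
      k8s-app: metrics-server   # assumed pod label; verify in your cluster
```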

Derek Nola · Answer 2 · Wed Feb 28 2024 02:51:08 GMT+0800 (China Standard Time)

That's not how those components work. CoreDNS, local-path-provisioner and metrics-server are deployed on the cluster as a whole, not on every server. They aren't considered server-specific components like kube-apiserver or the controller manager; they are just regular workloads. If you want to modify them, see https://docs.k3s.io/helm#customizing-packaged-components-with-helmchartconfig. You can create /var/lib/rancher/k3s/server/manifests/coredns-ha.yaml with the modifications you want.
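As an illustration of what such a file could hold besides the HPA above, here is a PodDisruptionBudget for CoreDNS. It assumes the packaged CoreDNS pods carry the `k8s-app: kube-dns` label (verify before use), and the name `coredns-pdb` is hypothetical. A PDB only limits voluntary disruptions such as `kubectl drain` before a planned reboot; it does nothing for an unplanned node failure. With a single replica, `minAvailable: 1` would block draining that node altogether, so it only makes sense together with multiple replicas.

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: coredns-pdb          # hypothetical name
  namespace: kube-system
spec:
  minAvailable: 1            # keep at least one DNS pod running during drains
  selector:
    matchLabels:
      k8s-app: kube-dns      # assumed label of the packaged CoreDNS pods
```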

Rudy van Sloten · Answer 3 · Wed Feb 28 2024 06:54:32 GMT+0800 (China Standard Time)

Offering a supposedly highly available deployment option and then replying "lol roll your own" is a bit shortsighted, no? Deploying CoreDNS as a DaemonSet on the master nodes (as I now plan to do), for example, could at least be given an inch of consideration. At some point a conscious choice was made to include these "regular workloads" as part of the core K3s experience, since you really have to go out of your way to get rid of them once installed, or do so pre-install with obscure args.

The fact of the matter is that the current HA deployment of K3s simply isn't HA, because these "regular workloads" you essentially push onto the user are essential to basic cluster operations. How does one operate a cluster when DNS, monitoring and the file provisioner go poof because one out of 3 nodes is rebooting or broken?

This has been brought up and iterated on before:
k3s-io/k3s#1606

Derek Nola · Answer 4 · Sat Mar 16 2024 01:37:26 GMT+0800 (China Standard Time)

I understand your frustration with this. I totally get the "well, this isn't really HA, is it?" argument. Two things on this:

1. This isn't something the k3s-ansible repo is meant to solve. This repo only covers deploying vanilla K3s with a few config options. I added the extra_manifests configuration to enable exactly this scenario: you can easily supply manifest files that you want deployed automatically when provisioning a cluster.
2. As covered in k3s-io/k3s#1606, this was ultimately a design decision between resource usage and redundancy. As K3s is usually deployed on the edge in resource-constrained environments, we decided that deploying fewer pods was the better default. K3s is an opinionated K8s distro. At the same time, making it easy to change the things you don't like is a core mission of K3s. It's why it's so easy to use the --disable flag to turn off components you don't like and swap them out; many people disable Traefik and run MetalLB. Similarly, you are free to modify the manifests to use a replica count of 2+.
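With k3s-ansible, the HPA from earlier in this thread could be saved to a local file and passed through `extra_manifests` so it is deployed at provisioning time. A minimal inventory fragment, assuming the group layout of the repository's example inventory (`k3s_cluster` with a `vars` section) and a hypothetical local path:

```yaml
# Inventory fragment (e.g. inventory.yml); only the relevant keys are shown.
k3s_cluster:
  vars:
    # Hypothetical path to a local manifest file; the role deploys each
    # listed manifest when provisioning the cluster.
    extra_manifests: ['/home/me/k3s/coredns-hpa.yaml']
```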

I think we could alleviate this somewhat with better documentation in https://github.com/k3s-io/docs on how to go about modifying the Helm manifests, perhaps including an example of "making CoreDNS run with multiple replicas".