itzg / mc-router

Routes Minecraft client connections to backend servers based upon the requested server address

Not able to get the --auto-scale-up to scale up the statefulset

aapjeisbaas opened this issue · comments

The router container does find my service based on the annotations and connects fine when it's scaled to 1
The service account is connected and has access to what was described in the docs:

rules:
- apiGroups: [""]
  resources: ["services"]
  verbs: ["watch","list"]
- apiGroups: ["apps"]
  resources: ["statefulsets", "statefulsets/scale"]
  verbs: ["watch","list","get","update"]

The router container has the following args:

args: ["--in-kube-cluster", "--auto-scale-up", "--debug"]

Any pointers on how to debug this are welcome
I'm running K8s v1.23.1
I know this is a bit old and have some patching planned ;-)

The debug logs of the router with statefulset at 0

2023-01-03T14:16:54.207912857Z time="2023-01-03T14:16:54Z" level=debug msg="Debug logs enabled"
2023-01-03T14:16:54.208180642Z time="2023-01-03T14:16:54Z" level=info msg="Listening for Minecraft client connections" listenAddress=":25565"
2023-01-03T14:16:54.212408646Z time="2023-01-03T14:16:54Z" level=info msg="Monitoring Kubernetes for Minecraft services"
2023-01-03T14:16:54.245421889Z time="2023-01-03T14:16:54Z" level=debug msg=ADD routableService="&{minecraft.steinvanbroekhoven.nl 10.108.90.145:25565 0x1095800}"
2023-01-03T14:16:54.245446732Z time="2023-01-03T14:16:54Z" level=info msg="Created route mapping" backend="10.108.90.145:25565" serverAddress=minecraft.steinvanbroekhoven.nl
2023-01-03T14:17:54.255262621Z time="2023-01-03T14:17:54Z" level=info msg="Got connection" client="192.168.1.4:38875"
2023-01-03T14:17:54.255296491Z time="2023-01-03T14:17:54Z" level=debug msg="Reading packet" client="192.168.1.4:38875"
2023-01-03T14:17:54.257833184Z time="2023-01-03T14:17:54Z" level=debug msg="Reading frame" client="192.168.1.4:38875"
2023-01-03T14:17:54.257856730Z time="2023-01-03T14:17:54Z" level=debug msg="Read frame length" client="192.168.1.4:38875" length=38
2023-01-03T14:17:54.257864589Z time="2023-01-03T14:17:54Z" level=debug msg="Reading frame content" client="192.168.1.4:38875" length=38 total=38
2023-01-03T14:17:54.257869937Z time="2023-01-03T14:17:54Z" level=debug msg="Read frame" client="192.168.1.4:38875" frame="Frame:[len=38, payload=0X00F9051F6D696E6563726166742E737465696E76616E62726F656B686F76656E2E6E6C753001]"
2023-01-03T14:17:54.257884207Z time="2023-01-03T14:17:54Z" level=debug msg="Read packet" client="192.168.1.4:38875" packet="Frame:[len=38, packetId=0, data=0XF9051F6D696E6563726166742E737465696E76616E62726F656B686F76656E2E6E6C753001]"
2023-01-03T14:17:54.257890825Z time="2023-01-03T14:17:54Z" level=debug msg="Got packet" client="192.168.1.4:38875" length=38 packetID=0
2023-01-03T14:17:54.258273242Z time="2023-01-03T14:17:54Z" level=debug msg="Got handshake" client="192.168.1.4:38875" handshake="&{761 minecraft.steinvanbroekhoven.nl 30000 1}"
2023-01-03T14:17:54.258284255Z time="2023-01-03T14:17:54Z" level=debug msg="Finding backend for server address" serverAddress=minecraft.steinvanbroekhoven.nl
2023-01-03T14:17:54.258289614Z time="2023-01-03T14:17:54Z" level=info msg="Connecting to backend" backendHostPort="10.108.90.145:25565" client="192.168.1.4:38875" server=minecraft.steinvanbroekhoven.nl
2023-01-03T14:17:55.291413109Z time="2023-01-03T14:17:55Z" level=warning msg="Unable to connect to backend" backend="10.108.90.145:25565" client="192.168.1.4:38875" error="dial tcp 10.108.90.145:25565: connect: connection refused" serverAddress=minecraft.steinvanbroekhoven.nl
2023-01-03T14:17:55.291464339Z time="2023-01-03T14:17:55Z" level=debug msg="Closing frontend connection" client="192.168.1.4:38875"
2023-01-03T14:17:55.370058981Z time="2023-01-03T14:17:55Z" level=info msg="Got connection" client="192.168.1.4:56259"
2023-01-03T14:17:55.370118907Z time="2023-01-03T14:17:55Z" level=debug msg="Reading packet" client="192.168.1.4:56259"
2023-01-03T14:17:55.373060131Z time="2023-01-03T14:17:55Z" level=debug msg="Reading legacy server list ping" client="192.168.1.4:56259"
2023-01-03T14:17:55.373181155Z time="2023-01-03T14:17:55Z" level=debug msg="Got packet" client="192.168.1.4:56259" length=0 packetID=254
2023-01-03T14:17:55.373210149Z time="2023-01-03T14:17:55Z" level=debug msg="Got legacy server list ping" client="192.168.1.4:56259" handshake="&{127 minecraft.steinvanbroekhoven.nl 30000}"
2023-01-03T14:17:55.373224451Z time="2023-01-03T14:17:55Z" level=debug msg="Finding backend for server address" serverAddress=minecraft.steinvanbroekhoven.nl
2023-01-03T14:17:55.373272967Z time="2023-01-03T14:17:55Z" level=info msg="Connecting to backend" backendHostPort="10.108.90.145:25565" client="192.168.1.4:56259" server=minecraft.steinvanbroekhoven.nl
2023-01-03T14:17:56.379409659Z time="2023-01-03T14:17:56Z" level=warning msg="Unable to connect to backend" backend="10.108.90.145:25565" client="192.168.1.4:56259" error="dial tcp 10.108.90.145:25565: connect: connection refused" serverAddress=minecraft.steinvanbroekhoven.nl
2023-01-03T14:17:56.379449254Z time="2023-01-03T14:17:56Z" level=debug msg="Closing frontend connection" client="192.168.1.4:56259"
2023-01-03T14:17:59.406531326Z time="2023-01-03T14:17:59Z" level=info msg="Got connection" client="192.168.1.4:38549"
2023-01-03T14:17:59.406592392Z time="2023-01-03T14:17:59Z" level=debug msg="Reading packet" client="192.168.1.4:38549"
2023-01-03T14:17:59.407634637Z time="2023-01-03T14:17:59Z" level=debug msg="Reading frame" client="192.168.1.4:38549"
2023-01-03T14:17:59.407657947Z time="2023-01-03T14:17:59Z" level=debug msg="Read frame length" client="192.168.1.4:38549" length=38
2023-01-03T14:17:59.407668275Z time="2023-01-03T14:17:59Z" level=debug msg="Reading frame content" client="192.168.1.4:38549" length=38 total=38
2023-01-03T14:17:59.407756065Z time="2023-01-03T14:17:59Z" level=debug msg="Read frame" client="192.168.1.4:38549" frame="Frame:[len=38, payload=0X00F9051F6D696E6563726166742E737465696E76616E62726F656B686F76656E2E6E6C753002]"
2023-01-03T14:17:59.407780886Z time="2023-01-03T14:17:59Z" level=debug msg="Read packet" client="192.168.1.4:38549" packet="Frame:[len=38, packetId=0, data=0XF9051F6D696E6563726166742E737465696E76616E62726F656B686F76656E2E6E6C753002]"
2023-01-03T14:17:59.407792701Z time="2023-01-03T14:17:59Z" level=debug msg="Got packet" client="192.168.1.4:38549" length=38 packetID=0
2023-01-03T14:17:59.407823488Z time="2023-01-03T14:17:59Z" level=debug msg="Got handshake" client="192.168.1.4:38549" handshake="&{761 minecraft.steinvanbroekhoven.nl 30000 2}"
2023-01-03T14:17:59.407839217Z time="2023-01-03T14:17:59Z" level=debug msg="Finding backend for server address" serverAddress=minecraft.steinvanbroekhoven.nl
2023-01-03T14:17:59.407904372Z time="2023-01-03T14:17:59Z" level=info msg="Connecting to backend" backendHostPort="10.108.90.145:25565" client="192.168.1.4:38549" server=minecraft.steinvanbroekhoven.nl
2023-01-03T14:18:00.439501722Z time="2023-01-03T14:18:00Z" level=warning msg="Unable to connect to backend" backend="10.108.90.145:25565" client="192.168.1.4:38549" error="dial tcp 10.108.90.145:25565: connect: connection refused" serverAddress=minecraft.steinvanbroekhoven.nl
2023-01-03T14:18:00.439564859Z time="2023-01-03T14:18:00Z" level=debug msg="Closing frontend connection" client="192.168.1.4:38549"
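
When reading these logs, it can help to confirm what the client actually asked for. The following is a minimal Python sketch (not part of mc-router; the function names are made up for illustration) that decodes the hex payload from a "Read frame" line, assuming the standard Minecraft handshake layout: packet ID, protocol version, host string, port, next state.

```python
def read_varint(buf, pos):
    """Decode a Minecraft VarInt starting at pos; return (value, next_pos)."""
    value = 0
    for shift in range(0, 35, 7):  # a VarInt is at most 5 bytes long
        byte = buf[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        if not byte & 0x80:  # high bit clear: this was the last byte
            return value, pos
    raise ValueError("VarInt is longer than 5 bytes")


def decode_handshake(payload_hex):
    """Decode a handshake payload as printed in the router's debug log."""
    text = payload_hex[2:] if payload_hex[:2].upper() == "0X" else payload_hex
    buf = bytes.fromhex(text)
    packet_id, pos = read_varint(buf, 0)
    protocol, pos = read_varint(buf, pos)
    host_len, pos = read_varint(buf, pos)
    host = buf[pos:pos + host_len].decode("utf-8")
    pos += host_len
    port = int.from_bytes(buf[pos:pos + 2], "big")  # unsigned 16-bit port
    next_state, pos = read_varint(buf, pos + 2)     # 1 = status, 2 = login
    return {
        "packet_id": packet_id,
        "protocol": protocol,
        "host": host,
        "port": port,
        "next_state": next_state,
    }


# Payload copied from the first "Read frame" line in the log above.
frame = "0X00F9051F6D696E6563726166742E737465696E76616E62726F656B686F76656E2E6E6C753001"
print(decode_handshake(frame))
```

For that first frame the sketch yields protocol 761, host minecraft.steinvanbroekhoven.nl, port 30000 and next state 1 (a status ping), which matches the router's own "Got handshake" line. The frames ending in 02 are login attempts.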

The debug logs of the router with statefulset scaled to 1

2023-01-03T14:15:04.393582354Z time="2023-01-03T14:15:04Z" level=debug msg="Debug logs enabled"
2023-01-03T14:15:04.394024777Z time="2023-01-03T14:15:04Z" level=info msg="Listening for Minecraft client connections" listenAddress=":25565"
2023-01-03T14:15:04.394971703Z time="2023-01-03T14:15:04Z" level=info msg="Monitoring Kubernetes for Minecraft services"
2023-01-03T14:15:04.411944993Z time="2023-01-03T14:15:04Z" level=debug msg=ADD routableService="&{minecraft.steinvanbroekhoven.nl 10.108.90.145:25565 0x1095800}"
2023-01-03T14:15:04.411961937Z time="2023-01-03T14:15:04Z" level=info msg="Created route mapping" backend="10.108.90.145:25565" serverAddress=minecraft.steinvanbroekhoven.nl
2023-01-03T14:16:11.056452784Z time="2023-01-03T14:16:11Z" level=info msg="Got connection" client="192.168.1.4:10410"
2023-01-03T14:16:11.056491620Z time="2023-01-03T14:16:11Z" level=debug msg="Reading packet" client="192.168.1.4:10410"
2023-01-03T14:16:11.058124294Z time="2023-01-03T14:16:11Z" level=debug msg="Reading frame" client="192.168.1.4:10410"
2023-01-03T14:16:11.058139019Z time="2023-01-03T14:16:11Z" level=debug msg="Read frame length" client="192.168.1.4:10410" length=38
2023-01-03T14:16:11.058144267Z time="2023-01-03T14:16:11Z" level=debug msg="Reading frame content" client="192.168.1.4:10410" length=38 total=38
2023-01-03T14:16:11.058158085Z time="2023-01-03T14:16:11Z" level=debug msg="Read frame" client="192.168.1.4:10410" frame="Frame:[len=38, payload=0X00F9051F6D696E6563726166742E737465696E76616E62726F656B686F76656E2E6E6C753001]"
2023-01-03T14:16:11.058161976Z time="2023-01-03T14:16:11Z" level=debug msg="Read packet" client="192.168.1.4:10410" packet="Frame:[len=38, packetId=0, data=0XF9051F6D696E6563726166742E737465696E76616E62726F656B686F76656E2E6E6C753001]"
2023-01-03T14:16:11.058165819Z time="2023-01-03T14:16:11Z" level=debug msg="Got packet" client="192.168.1.4:10410" length=38 packetID=0
2023-01-03T14:16:11.058169412Z time="2023-01-03T14:16:11Z" level=debug msg="Got handshake" client="192.168.1.4:10410" handshake="&{761 minecraft.steinvanbroekhoven.nl 30000 1}"
2023-01-03T14:16:11.058173338Z time="2023-01-03T14:16:11Z" level=debug msg="Finding backend for server address" serverAddress=minecraft.steinvanbroekhoven.nl
2023-01-03T14:16:11.058176968Z time="2023-01-03T14:16:11Z" level=info msg="Connecting to backend" backendHostPort="10.108.90.145:25565" client="192.168.1.4:10410" server=minecraft.steinvanbroekhoven.nl
2023-01-03T14:16:11.058216133Z time="2023-01-03T14:16:11Z" level=debug msg="Relayed handshake to backend" amount=39
2023-01-03T14:16:11.064533113Z time="2023-01-03T14:16:11Z" level=info msg="Finished relay backend->frontend" amount=194 client="192.168.1.4:10410"
2023-01-03T14:16:11.064560969Z time="2023-01-03T14:16:11Z" level=debug msg="Closing backend connection" client="192.168.1.4:10410"
2023-01-03T14:16:11.064629855Z time="2023-01-03T14:16:11Z" level=debug msg="Closing frontend connection" client="192.168.1.4:10410"
2023-01-03T14:16:11.064671955Z time="2023-01-03T14:16:11Z" level=info msg="Finished relay frontend->backend" amount=12 client="192.168.1.4:10410"
2023-01-03T14:16:12.969829352Z time="2023-01-03T14:16:12Z" level=info msg="Got connection" client="192.168.1.4:18179"
2023-01-03T14:16:12.969866323Z time="2023-01-03T14:16:12Z" level=debug msg="Reading packet" client="192.168.1.4:18179"
2023-01-03T14:16:12.972336714Z time="2023-01-03T14:16:12Z" level=debug msg="Reading frame" client="192.168.1.4:18179"
2023-01-03T14:16:12.972358917Z time="2023-01-03T14:16:12Z" level=debug msg="Read frame length" client="192.168.1.4:18179" length=38
2023-01-03T14:16:12.972364091Z time="2023-01-03T14:16:12Z" level=debug msg="Reading frame content" client="192.168.1.4:18179" length=38 total=38
2023-01-03T14:16:12.972370122Z time="2023-01-03T14:16:12Z" level=debug msg="Read frame" client="192.168.1.4:18179" frame="Frame:[len=38, payload=0X00F9051F6D696E6563726166742E737465696E76616E62726F656B686F76656E2E6E6C753002]"
2023-01-03T14:16:12.972373880Z time="2023-01-03T14:16:12Z" level=debug msg="Read packet" client="192.168.1.4:18179" packet="Frame:[len=38, packetId=0, data=0XF9051F6D696E6563726166742E737465696E76616E62726F656B686F76656E2E6E6C753002]"
2023-01-03T14:16:12.972377585Z time="2023-01-03T14:16:12Z" level=debug msg="Got packet" client="192.168.1.4:18179" length=38 packetID=0
2023-01-03T14:16:12.972384913Z time="2023-01-03T14:16:12Z" level=debug msg="Got handshake" client="192.168.1.4:18179" handshake="&{761 minecraft.steinvanbroekhoven.nl 30000 2}"
2023-01-03T14:16:12.972388616Z time="2023-01-03T14:16:12Z" level=debug msg="Finding backend for server address" serverAddress=minecraft.steinvanbroekhoven.nl
2023-01-03T14:16:12.972428579Z time="2023-01-03T14:16:12Z" level=info msg="Connecting to backend" backendHostPort="10.108.90.145:25565" client="192.168.1.4:18179" server=minecraft.steinvanbroekhoven.nl
2023-01-03T14:16:12.972701807Z time="2023-01-03T14:16:12Z" level=debug msg="Relayed handshake to backend" amount=39
2023-01-03T14:16:18.260879719Z time="2023-01-03T14:16:18Z" level=info msg="Finished relay frontend->backend" amount=1351 client="192.168.1.4:18179"
2023-01-03T14:16:18.260914193Z time="2023-01-03T14:16:18Z" level=debug msg="Closing backend connection" client="192.168.1.4:18179"
2023-01-03T14:16:18.260933996Z time="2023-01-03T14:16:18Z" level=info msg="Finished relay backend->frontend" amount=2325258 client="192.168.1.4:18179"
2023-01-03T14:16:18.260937616Z time="2023-01-03T14:16:18Z" level=debug msg="Closing frontend connection" client="192.168.1.4:18179"
2023-01-03T14:16:18.412627919Z time="2023-01-03T14:16:18Z" level=info msg="Got connection" client="192.168.1.4:4560"
2023-01-03T14:16:18.412651552Z time="2023-01-03T14:16:18Z" level=debug msg="Reading packet" client="192.168.1.4:4560"
2023-01-03T14:16:18.412671070Z time="2023-01-03T14:16:18Z" level=debug msg="Reading frame" client="192.168.1.4:4560"
2023-01-03T14:16:18.412675951Z time="2023-01-03T14:16:18Z" level=debug msg="Read frame length" client="192.168.1.4:4560" length=38
2023-01-03T14:16:18.412679743Z time="2023-01-03T14:16:18Z" level=debug msg="Reading frame content" client="192.168.1.4:4560" length=38 total=38
2023-01-03T14:16:18.412691578Z time="2023-01-03T14:16:18Z" level=debug msg="Read frame" client="192.168.1.4:4560" frame="Frame:[len=38, payload=0X00F9051F6D696E6563726166742E737465696E76616E62726F656B686F76656E2E6E6C753001]"
2023-01-03T14:16:18.412712366Z time="2023-01-03T14:16:18Z" level=debug msg="Read packet" client="192.168.1.4:4560" packet="Frame:[len=38, packetId=0, data=0XF9051F6D696E6563726166742E737465696E76616E62726F656B686F76656E2E6E6C753001]"
2023-01-03T14:16:18.412716683Z time="2023-01-03T14:16:18Z" level=debug msg="Got packet" client="192.168.1.4:4560" length=38 packetID=0
2023-01-03T14:16:18.412749665Z time="2023-01-03T14:16:18Z" level=debug msg="Got handshake" client="192.168.1.4:4560" handshake="&{761 minecraft.steinvanbroekhoven.nl 30000 1}"
2023-01-03T14:16:18.412775746Z time="2023-01-03T14:16:18Z" level=debug msg="Finding backend for server address" serverAddress=minecraft.steinvanbroekhoven.nl
2023-01-03T14:16:18.412819599Z time="2023-01-03T14:16:18Z" level=info msg="Connecting to backend" backendHostPort="10.108.90.145:25565" client="192.168.1.4:4560" server=minecraft.steinvanbroekhoven.nl
2023-01-03T14:16:18.412977064Z time="2023-01-03T14:16:18Z" level=debug msg="Relayed handshake to backend" amount=39
2023-01-03T14:16:18.419496797Z time="2023-01-03T14:16:18Z" level=info msg="Finished relay backend->frontend" amount=272 client="192.168.1.4:4560"
2023-01-03T14:16:18.419518092Z time="2023-01-03T14:16:18Z" level=debug msg="Closing backend connection" client="192.168.1.4:4560"
2023-01-03T14:16:18.419522560Z time="2023-01-03T14:16:18Z" level=debug msg="Closing frontend connection" client="192.168.1.4:4560"
2023-01-03T14:16:18.419526433Z time="2023-01-03T14:16:18Z" level=info msg="Finished relay frontend->backend" amount=12 client="192.168.1.4:4560"

@vorburger I was wondering if you could help look at this one?

It seems like we might need more debug logs to see what was detected, selected, etc.

Hi, is there any way I can help you debug and solve this?

I just tried all versions from 1.15.0 up to 1.18.0 but no difference
I tried setting the cluster role rules to * * * without success.

Contributing more debug logs into the code would be great. I would be glad to review and merge PRs for that.

I've added loads of debug logs throughout the k8s.go mapping and watcher parts.
I found my issue:
StatefulSet.metadata.name and StatefulSet.spec.serviceName need to be the same

---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: minecraft-set
spec:
  serviceName: minecraft-set
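
For anyone hitting the same problem, a quick way to spot the mismatch is to compare the two fields across all StatefulSets. This is a rough Python sketch based only on the finding reported here, not on a reading of the mc-router source; the function name is hypothetical and the input is assumed to be the `items` list from `kubectl get statefulsets -o json`.

```python
def find_name_mismatches(statefulsets):
    """Return (name, serviceName) pairs where the two fields differ."""
    mismatches = []
    for sts in statefulsets:
        name = sts.get("metadata", {}).get("name")
        service_name = sts.get("spec", {}).get("serviceName")
        if name != service_name:
            mismatches.append((name, service_name))
    return mismatches


# Illustrative input: the first entry matches, the second does not.
items = [
    {"metadata": {"name": "minecraft-set"}, "spec": {"serviceName": "minecraft-set"}},
    {"metadata": {"name": "minecraft-set"}, "spec": {"serviceName": "minecraft"}},
]
for name, service_name in find_name_mismatches(items):
    print(f"StatefulSet {name} has serviceName {service_name}")
```

Any StatefulSet the sketch reports would, per the finding above, not be scaled up by the router until the two names are made identical.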

I've learned a lot about go which I never used before, thanks!
Maybe I'll try to create a way to scale down if no players have been online for x time.
Not sure yet if it would make more sense to build that inside mc-router or to use a metric on the pod with a HorizontalPodAutoscaler and custom metrics like:
https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale-walkthrough/#autoscaling-on-metrics-not-related-to-kubernetes-objects

I've added loads of debug logs throughout the k8s.go mapping and watcher parts.

I found my issue:

StatefulSet.metadata.name and StatefulSet.spec.serviceName need to be the same

---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: minecraft-set
spec:
  serviceName: minecraft-set

Good find. That sounds like a bug that could be fixed.

I've learned a lot about go which I never used before, thanks!

Cool. If you want to take a stab at fixing the bug, then I'd be glad to review/help on a PR.

Maybe I'll try to create a way to scale down if no players have been online for x time.

Not sure yet if it would make more sense to build that inside mc-router or to use a metric on the pod with a HorizontalPodAutoscaler and custom metrics like:

https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale-walkthrough/#autoscaling-on-metrics-not-related-to-kubernetes-objects

Good point. Using HPA might have been more correct this whole time.

Found a way to auto scale down without modifications. The k8s scaling doesn't allow scaling to 0 by default, which is a bit strange, but a cron job and an ugly bash one-liner work fine for now.

mc-cron.yaml

A k8s CronJob that finds pods with the label app=minecraft-container and runs /usr/local/bin/mc-monitor status in each one. If the output contains online=0, it runs:
kubectl scale statefulset $deployment --replicas=0
The variable $deployment is just the pod name with -0 removed, as this is usually the StatefulSet name.
It could be more stable to get the name from .metadata.ownerReferences.
There is also no multi-namespace support here; if that is necessary, rewrite it in Python or Go.
Adding jq might also do the trick.

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: mc-shutdown
rules:
- apiGroups: ["apps"]
  resources: ["statefulsets", "statefulsets/scale"]
  verbs: ["list","get","update", "patch"]
- apiGroups: [""]
  resources: ["pods", "pods/log"]
  verbs: ["get", "list"]
- apiGroups: [""]
  resources: ["pods/exec"]
  verbs: ["create"]

---
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: mc-shutdown
subjects:
- kind: ServiceAccount
  name: mc-shutdown
  namespace: default
roleRef:
  kind: ClusterRole
  name: mc-shutdown
  apiGroup: "rbac.authorization.k8s.io"

---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: mc-shutdown

---
apiVersion: batch/v1
kind: CronJob
metadata:
  name: mc-shutdown
spec:
  schedule: "*/15 * * * *"
  jobTemplate:
    spec:
      template:
        spec:
          serviceAccountName: mc-shutdown
          containers:
          - name: shutdown
            image: bitnami/kubectl:latest
            imagePullPolicy: IfNotPresent
            command:
            - /bin/bash
            - -c
            - |
              # Scale a StatefulSet to 0 when its pod reports no players online.
              for p in $(kubectl get pods -l app=minecraft-container -o jsonpath='{.items[*].metadata.name}'); do
                echo "$p"
                # Assumes the StatefulSet name is the pod name without its trailing "-0".
                deployment="${p%-0}"
                if [[ $(kubectl exec -i "$p" -- /usr/local/bin/mc-monitor status) == *"online=0"* ]]; then
                  kubectl scale statefulset "$deployment" --replicas=0
                fi
              done
          restartPolicy: OnFailure
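
The ownerReferences idea mentioned above could look roughly like this in Python. It is only a sketch: the function names are hypothetical, the pod is assumed to be a dict as returned by `kubectl get pod <name> -o json`, and the status line in the example is illustrative, since the one-liner relies only on the `online=0` substring.

```python
import re


def statefulset_for_pod(pod):
    """Return the name of the StatefulSet that owns the pod, or None."""
    for ref in pod.get("metadata", {}).get("ownerReferences", []):
        if ref.get("kind") == "StatefulSet":
            return ref.get("name")
    return None


def should_scale_down(status_output):
    """True when the mc-monitor status output reports zero players online."""
    return re.search(r"\bonline=0\b", status_output) is not None


# Illustrative pod metadata and status text, not captured from a real cluster.
pod = {
    "metadata": {
        "name": "minecraft-set-0",
        "ownerReferences": [{"kind": "StatefulSet", "name": "minecraft-set"}],
    }
}
status = "version=1.19.3 online=0 max=20"

if should_scale_down(status):
    print(f"kubectl scale statefulset {statefulset_for_pod(pod)} --replicas=0")
```

Reading the owner from the pod avoids guessing the StatefulSet name from the pod name, so it keeps working for names that contain "-0" elsewhere.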

For now this completely fixes my issues, and I have an auto-starting k8s pod that shuts down if no players are online.
If it's stable and I have some time to kill, I'm glad to document my steps and add them to the repo / readme if you want.

I took some time to set this up again and think I was just missing labels on k8s objects or something.
My working autoscaling config can be found in: #270