estafette / estafette-gke-preemptible-killer

Kubernetes controller to spread preemption for preemptible VMs in GKE to avoid mass deletion after 24 hours

Home Page: https://helm.estafette.io

Whitelist hours and blacklist hours are being ignored

akathimi opened this issue · comments

Hey there, I have installed the helm chart and configured the whitelist and blacklist hours as follows:
kubectl describe deployments -n preempitible-killer

Namespace:          preempitible-killer
CreationTimestamp:  Tue, 16 Jul 2019 18:26:51 +0300
Labels:             app=estafette-gke-preemptible-killer
Annotations:        deployment.kubernetes.io/revision: 1
Selector:           app=estafette-gke-preemptible-killer
Replicas:           1 desired | 1 updated | 1 total | 1 available | 0 unavailable
StrategyType:       Recreate
MinReadySeconds:    0
Pod Template:
  Labels:           app=estafette-gke-preemptible-killer
                    version=1.1.19-1
  Annotations:      prometheus.io/port: 9001
                    prometheus.io/scrape: true
  Service Account:  estafette-gke-preemptible-killer
  Containers:
   estafette-gke-preemptible-killer:
    Image:      estafette/estafette-gke-preemptible-killer:1.1.19
    Port:       <none>
    Host Port:  <none>
    Limits:
      cpu:     50m
      memory:  128Mi
    Requests:
      cpu:     10m
      memory:  16Mi
    Liveness:  http-get http://:9001/metrics delay=30s timeout=1s period=10s #success=1 #failure=3
    Environment:
      DRAIN_TIMEOUT:                   300
      INTERVAL:                        600
      GOOGLE_APPLICATION_CREDENTIALS:  /etc/app-secrets/google-service-account.json
      WHITELIST_HOURS:                 02:30 - 03:30, 17:30 - 23:59 
      BLACKLIST_HOURS:                 00:00 - 01:30, 04:00 - 16:30 
    Mounts:
      /etc/app-secrets from app-secrets (rw)
  Volumes:
   app-secrets:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  estafette-gke-preemptible-killer
    Optional:    false
Conditions:
  Type           Status  Reason
  ----           ------  ------
  Available      True    MinimumReplicasAvailable
OldReplicaSets:  <none>
NewReplicaSet:   estafette-gke-preemptible-killer-56d97cb669 (1/1 replicas created)
Events:          <none>

You can see my configuration in the Environment section. However, looking back at the log and at the output of kubectl get nodes, I see that my nodes were killed during the blacklisted hours. Is there anything wrong in my config, or is it in fact a bug?

The pod's log:

 kubectl logs estafette-gke-preemptible-killer-56d97cb669-xjgv7 -n preempitible-killer | grep 'Node deleted'
{"time":"2019-07-16T15:27:45Z","severity":"info","app":"estafette-gke-preemptible-killer","version":"1.1.19","host":"gke-navy-preemptible-pool-1-9e36d2de-n6dr","message":"Node deleted"}
{"time":"2019-07-17T10:24:05Z","severity":"info","app":"estafette-gke-preemptible-killer","version":"1.1.19","host":"gke-navy-preemptible-pool-1-9e36d2de-rb51","message":"Node deleted"}
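
One way to confirm that these deletions fall inside the blacklist is to convert each log timestamp to a minute of the day and compare it against the configured ranges. The following is only a sketch: it assumes the configured hours are interpreted in UTC, and the names logEntry, minuteOfDayUTC and inRange are made up for this example, not taken from the controller.

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// logEntry mirrors only the fields of the log lines that this check needs.
type logEntry struct {
	Time    time.Time `json:"time"`
	Host    string    `json:"host"`
	Message string    `json:"message"`
}

// minuteOfDayUTC parses one JSON log line and returns its timestamp as
// minutes since midnight, in UTC (assumption: the hours are UTC).
func minuteOfDayUTC(line string) (int, error) {
	var e logEntry
	if err := json.Unmarshal([]byte(line), &e); err != nil {
		return 0, err
	}
	t := e.Time.UTC()
	return t.Hour()*60 + t.Minute(), nil
}

// inRange reports whether m lies in [start, end), in minutes since midnight.
func inRange(m, start, end int) bool {
	return m >= start && m < end
}

func main() {
	lines := []string{
		`{"time":"2019-07-16T15:27:45Z","host":"gke-navy-preemptible-pool-1-9e36d2de-n6dr","message":"Node deleted"}`,
		`{"time":"2019-07-17T10:24:05Z","host":"gke-navy-preemptible-pool-1-9e36d2de-rb51","message":"Node deleted"}`,
	}
	// BLACKLIST_HOURS from the deployment: 00:00 - 01:30 and 04:00 - 16:30.
	blacklist := [][2]int{{0, 90}, {4 * 60, 16*60 + 30}}

	for _, l := range lines {
		m, err := minuteOfDayUTC(l)
		if err != nil {
			fmt.Println("skipping line:", err)
			continue
		}
		hit := false
		for _, b := range blacklist {
			if inRange(m, b[0], b[1]) {
				hit = true
			}
		}
		fmt.Printf("%02d:%02d UTC in blacklist: %v\n", m/60, m%60, hit)
	}
}
```

Under the UTC assumption, both timestamps land in the 04:00 - 16:30 range.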

Also, thank you for this very helpful idea :)

same here

This description is completely accurate in that the whitelists were being ignored.

Apologies!

Solved by #36; it should work reliably if the PR gets merged.

Mention me (@andrei-pavel) on any other issues with whitelists so I get notified faster. I only found this one by chance, when I gave it a test this evening with nothing else to do.

Incidentally, your configuration

      WHITELIST_HOURS:                 02:30 - 03:30, 17:30 - 23:59 
      BLACKLIST_HOURS:                 00:00 - 01:30, 04:00 - 16:30

can be simplified to

      WHITELIST_HOURS:                 02:30 - 03:30, 17:30 - 23:59 
      BLACKLIST_HOURS:

but you might have been just testing it out.

Also, if you want to cover the last minute of the 17:30 - 23:59 interval, you can use 17:30 - 00:00. Note that 17:30 - 24:00 causes an "hour out of range" panic, because that is how the time library handles an hour of 24. An interval that wraps past midnight, such as 17:30 - 01:30, is also valid.
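
To illustrate the points about 24:00 and intervals that wrap past midnight, here is a minimal sketch in Go. It is not the controller's actual code: the interval type and the parseClock, parseInterval and contains helpers are hypothetical, and treating the end of an interval as exclusive is an assumption.

```go
package main

import (
	"fmt"
	"strings"
	"time"
)

// interval is a hypothetical type for this sketch: start and end are
// minutes since midnight.
type interval struct {
	start, end int
}

// parseClock converts "HH:MM" into minutes since midnight using Go's
// time package. time.Parse rejects "24:00" because hour 24 is out of range.
func parseClock(s string) (int, error) {
	t, err := time.Parse("15:04", strings.TrimSpace(s))
	if err != nil {
		return 0, err
	}
	return t.Hour()*60 + t.Minute(), nil
}

// parseInterval parses a single "HH:MM - HH:MM" entry.
func parseInterval(s string) (interval, error) {
	parts := strings.Split(s, "-")
	if len(parts) != 2 {
		return interval{}, fmt.Errorf("expected HH:MM - HH:MM, got %q", s)
	}
	start, err := parseClock(parts[0])
	if err != nil {
		return interval{}, err
	}
	end, err := parseClock(parts[1])
	if err != nil {
		return interval{}, err
	}
	return interval{start: start, end: end}, nil
}

// contains reports whether minute-of-day m falls in [start, end).
// When end <= start, the interval is treated as wrapping past midnight.
func (iv interval) contains(m int) bool {
	if iv.start < iv.end {
		return m >= iv.start && m < iv.end
	}
	return m >= iv.start || m < iv.end
}

func main() {
	for _, s := range []string{"17:30 - 23:59", "17:30 - 00:00", "17:30 - 01:30", "17:30 - 24:00"} {
		iv, err := parseInterval(s)
		if err != nil {
			// Returning the error avoids the panic described above.
			fmt.Printf("%s: invalid: %v\n", s, err)
			continue
		}
		fmt.Printf("%s: covers 23:59? %v, covers 00:30? %v\n",
			s, iv.contains(23*60+59), iv.contains(30))
	}
}
```

Handling the parse error instead of panicking lets a bad value such as 17:30 - 24:00 be reported at startup rather than crashing the process.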

Thanks, that's very helpful 👍