giantswarm / aws-operator

Manages Kubernetes clusters running on AWS (before Cluster API)

Home Page:https://www.giantswarm.io/

Race condition when decrypting certificates on master

teemow opened this issue

During a reboot of a master, the order in which the cloud-config is processed led to a race condition while decrypting the certificates.

The certificates of the master had been renewed before the reboot of the master. When the master came back up the old encrypted files were still there and the process that decrypts the certificates updated the certificate of the apiserver with an old private key. A few seconds later the encrypted file for the apiserver key was updated via the cloud-config.

-rw-------. 1 root root 1840 2018-01-24 09:42:22.000000000 +0000 apiserver-key.pem.enc
-rw-------. 1 root root 1678 2018-01-24 09:42:17.000000000 +0000 apiserver-key.pem

We need to make sure that all files are updated before we decrypt them. Imo this race could also lead to missing certificates when creating a new cluster.
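The state in the listing above (a decrypted key older than its encrypted counterpart) can be detected by comparing modification times. The following is only a diagnostic sketch, not the fix discussed in this issue. It assumes the decrypted file has the same name as the encrypted one minus the `.enc` suffix, as in the listing. The function name `find_stale_decrypted` and the directory argument are hypothetical.

```python
from pathlib import Path


def find_stale_decrypted(cert_dir):
    """Return names of decrypted files that are older than their .enc source.

    Assumption: "<name>.enc" decrypts to "<name>" in the same directory.
    A decrypted file that does not exist yet is skipped, not reported.
    """
    stale = []
    for enc in sorted(Path(cert_dir).glob("*.enc")):
        # apiserver-key.pem.enc -> apiserver-key.pem
        plain = enc.with_suffix("")
        if not plain.exists():
            continue
        # The decrypted copy predates the encrypted file, so it was
        # produced from an earlier version of that file.
        if plain.stat().st_mtime < enc.stat().st_mtime:
            stale.append(plain.name)
    return stale
```

Run against the certificate directory on the affected master, this would report `apiserver-key.pem`, since it was written five seconds before `apiserver-key.pem.enc` was replaced.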

It seems that the most straightforward way to solve the problem is to not enable the decrypt units. That way they won't be executed on reboot before the cloud-init unit; they only run after the new files have been created and the new decrypt units are started.

I first tried to set an ordering dependency between cloud-init and the decrypt units, but haven't found an easy way to do it. On the master there's an oem-cloudinit.service unit which pulls the user data and creates another transient unit to execute it. The name of this transient unit has a random component, so we can't set an ordering dependency on it.

Tested and working fine, PR incoming.