pvsone / temporal-worker-cert-rotation

Example of how to rotate mTLS certificates in a Temporal Worker

Temporal Worker mTLS Certificate Rotation

Background

A Temporal cluster may use mTLS to authenticate Worker client connections. This is the case with Temporal Cloud. The Temporal Cloud documentation includes instructions for generating client certificates for mTLS authentication. However, you will also need to plan for the rotation of your client certificates.

Certificate rotation can be done manually, but it's worth fully automating. Also, it is desirable to rotate the certificates without restarting the Worker application. A restart would result in clearing the Workflow cache used in sticky execution. With an empty cache, a Workflow in progress would need to rebuild its state from scratch when the Workflow execution resumes. The Worker would retrieve the Workflow history from the Temporal service and replay the execution. This would result in network and processing overhead that we would prefer to avoid.

Pre-requisites

This guide assumes that:

Overview

The steps to set up certificate rotation for a Temporal Go Worker on Kubernetes are:

  1. Create a CA certificate for your Temporal namespace
  2. Install cert-manager on your Kubernetes cluster
  3. Configure a CA Issuer to issue client certificates signed by the CA certificate
  4. Install the cert-manager csi-driver to generate client certificates in Pod volumes
  5. Configure your Temporal Worker to load the client certificate each time the connection is established
  6. Deploy your Temporal Worker Pod with the csi-driver volume

Let's go through each of these steps in detail.

1. Create a CA certificate

We will use certstrap to bootstrap the CA certificate. Certstrap is one of many tools that can create CAs, and it can also create and sign client certificates. On our Kubernetes cluster, we will use cert-manager (instructions below) to create and sign the client certificates. Off cluster, we will create and sign client certificates with certstrap (an optional step below).

Generate the CA Certificate:

certstrap init --common-name "Rotation Demo CA" --passphrase ""

If you have an existing Temporal namespace, update the namespace with the new CA.

If you don't have an existing Temporal namespace, you can create one. Supply the CA certificate when creating the namespace. For example, using the tcld namespace create command:

# login to tcld cli
tcld login

# create the new namespace - this may take a few minutes
tcld namespace create --namespace rotation-demo \
    --region us-east-1 \
    --ca-certificate "$(cat ./out/Rotation_Demo_CA.crt | base64)"

# confirm the certificate for the namespace - my Temporal account id is 'sdvdw', yours will be different
tcld namespace ca list -n rotation-demo.sdvdw
[
    {
        "fingerprint": "c3020b30e7e016de334531788b54564b2125e975",
        "issuer": "CN=Rotation Demo CA",
        "subject": "CN=Rotation Demo CA",
        "notBefore": "2024-01-09T21:24:24Z",
        "notAfter": "2025-07-09T21:34:23Z",
        "base64EncodedData": "<omitted for brevity>"
    }
]

(Optional) Create a client certificate to use outside of Kubernetes, e.g. for the temporal CLI

# create a client certificate
certstrap request-cert --common-name rotation-demo-cli-client --passphrase ""

# sign the client certificate using the CA certificate
certstrap sign rotation-demo-cli-client --CA "Rotation Demo CA"

# test the certificate using the `temporal` CLI
temporal operator namespace describe \
    --address rotation-demo.sdvdw.tmprl.cloud:7233 \
    --tls-cert-path ./out/rotation-demo-cli-client.crt \
    --tls-key-path ./out/rotation-demo-cli-client.key \
    rotation-demo.sdvdw

  # successful output will show the namespace info, e.g.
  NamespaceInfo.Name                    rotation-demo.sdvdw
  NamespaceInfo.Id                      4807be30-0b1d-47c2-ab3b-bb9c14660f0b  
  # additional output omitted for brevity

2. Install cert-manager

Follow the instructions at https://cert-manager.io/docs/installation/ to install cert-manager on your Kubernetes cluster.

The default static configuration worked for me.

3. Configure a CA Issuer

First, create a Kubernetes Secret containing the CA certificate key pair:

kubectl create secret tls rotation-demo-ca-key-pair \
    --cert=out/Rotation_Demo_CA.crt \
    --key=out/Rotation_Demo_CA.key

Next, create a CA Issuer to issue client certificates signed by the CA certificate:

kubectl apply -f - <<EOF
apiVersion: cert-manager.io/v1
kind: Issuer
metadata:
  name: rotation-demo-ca-issuer
spec:
  ca:
    secretName: rotation-demo-ca-key-pair
EOF

4. Install the cert-manager csi-driver

Follow the instructions at https://cert-manager.io/docs/usage/csi-driver/#installation to install the csi-driver on your Kubernetes cluster.

The helm upgrade ... command worked for me.

5. Configure your Temporal Worker

A Temporal Worker uses a Temporal Client to connect to the Temporal service, and the Client requires a client certificate to authenticate to the service. The connection is long-lived: the Worker polls for tasks on a task queue through it.

However, the Temporal Cloud frontend closes this connection every 5 minutes. When the connection is closed, the Worker, using the Client, will establish a new connection. (Note: in self-hosted clusters the 5m default is configurable via the frontend.keepAliveMaxConnectionAge parameter.)

In a typical Worker program, the certificate files are read once during the Worker initialization. The certificate data is then supplied to the Client in the connection options. When the Worker reconnects to Temporal Cloud (after 5 minutes), the same static certificate data will be used. We see this pattern in the code snippet within the Go SDK developer's guide under How to connect to Temporal Cloud:

func main() {
    // Get the key and cert from your env or local machine
    clientKeyPath := "./secrets/yourkey.key"
    clientCertPath := "./secrets/yourcert.pem"
    ...

    // Use the crypto/tls package to create a cert object
    cert, err := tls.LoadX509KeyPair(clientCertPath, clientKeyPath)
    if err != nil {
        log.Fatalln("Unable to load cert and key pair.", err)
    }    

    // Add the cert to the tls certificates in the ConnectionOptions of the Client
    temporalClient, err := client.Dial(client.Options{
        HostPort:  hostPort,
        Namespace: namespace,
        ConnectionOptions: client.ConnectionOptions{
            TLS: &tls.Config{
                Certificates: []tls.Certificate{cert},
            },
        },
    })
    ...
}

In the above example, the tls.Config is constructed using the Certificates option. The cert, loaded once by the tls.LoadX509KeyPair call, will be used every time the connection is established.

However, the Go docs describe an alternative to Certificates:

Clients doing client-authentication may set either Certificates or GetClientCertificate.

Using the GetClientCertificate option we can define a function that will be called each time the connection is created. We can load the certificate data dynamically each time, rather than only once on initialization. This allows us to achieve certificate rotation without requiring a restart of the Worker application.

Here is the code snippet above, updated to use GetClientCertificate instead of Certificates:

func main() {
    // Get the key and cert from your env or local machine
    clientKeyPath := "./secrets/yourkey.key"
    clientCertPath := "./secrets/yourcert.pem"
    ...
    
    // Load the cert via the GetClientCertificate function in the ConnectionOptions of the Client
    temporalClient, err := client.Dial(client.Options{
        HostPort:  hostPort,
        Namespace: namespace,
        ConnectionOptions: client.ConnectionOptions{
            TLS: &tls.Config{
                GetClientCertificate: func(info *tls.CertificateRequestInfo) (*tls.Certificate, error) {
                    // Use the crypto/tls package to create a cert object
                    cert, err := tls.LoadX509KeyPair(clientCertPath, clientKeyPath)
                    if err != nil {
                        return nil, err
                    }
                    return &cert, nil
                },
            },
        },
    })
    ...
}

I have implemented the above approach in the worker/main.go file in this repo.
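
To watch the reload behaviour without a Temporal connection, here is a minimal standalone sketch. It is not the code in worker/main.go. It writes a throwaway self-signed key pair to a temporary directory, calls the GetClientCertificate callback directly (as a TLS handshake would), overwrites the files, and calls the callback again. The helper names writeSelfSignedPair, reloadingTLSConfig, presentedCommonName and rotationDemo, the file names, and the worker-v1 / worker-v2 common names are all assumptions made for illustration.

```go
package main

import (
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/tls"
	"crypto/x509"
	"crypto/x509/pkix"
	"encoding/pem"
	"fmt"
	"math/big"
	"os"
	"path/filepath"
	"time"
)

// writeSelfSignedPair (hypothetical helper) stands in for the csi-driver
// overwriting the certificate and key files with a newly issued pair.
func writeSelfSignedPair(certPath, keyPath, commonName string) error {
	key, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		return err
	}
	tmpl := &x509.Certificate{
		SerialNumber: big.NewInt(time.Now().UnixNano()),
		Subject:      pkix.Name{CommonName: commonName},
		NotBefore:    time.Now().Add(-time.Minute),
		NotAfter:     time.Now().Add(5 * time.Minute),
	}
	certDER, err := x509.CreateCertificate(rand.Reader, tmpl, tmpl, &key.PublicKey, key)
	if err != nil {
		return err
	}
	keyDER, err := x509.MarshalPKCS8PrivateKey(key)
	if err != nil {
		return err
	}
	certPEM := pem.EncodeToMemory(&pem.Block{Type: "CERTIFICATE", Bytes: certDER})
	if err := os.WriteFile(certPath, certPEM, 0o600); err != nil {
		return err
	}
	keyPEM := pem.EncodeToMemory(&pem.Block{Type: "PRIVATE KEY", Bytes: keyDER})
	return os.WriteFile(keyPath, keyPEM, 0o600)
}

// reloadingTLSConfig returns a tls.Config that re-reads the key pair from
// disk every time the callback is invoked, instead of caching it at startup.
func reloadingTLSConfig(certPath, keyPath string) *tls.Config {
	return &tls.Config{
		GetClientCertificate: func(*tls.CertificateRequestInfo) (*tls.Certificate, error) {
			cert, err := tls.LoadX509KeyPair(certPath, keyPath)
			if err != nil {
				return nil, err
			}
			return &cert, nil
		},
	}
}

// presentedCommonName invokes the callback directly and returns the common
// name of the certificate that would be presented to the server.
func presentedCommonName(cfg *tls.Config) (string, error) {
	cert, err := cfg.GetClientCertificate(&tls.CertificateRequestInfo{})
	if err != nil {
		return "", err
	}
	leaf, err := x509.ParseCertificate(cert.Certificate[0])
	if err != nil {
		return "", err
	}
	return leaf.Subject.CommonName, nil
}

// rotationDemo rotates the files once and records what the same tls.Config
// presents after each write.
func rotationDemo() ([]string, error) {
	dir, err := os.MkdirTemp("", "rotation-demo")
	if err != nil {
		return nil, err
	}
	defer os.RemoveAll(dir)
	certPath, keyPath := filepath.Join(dir, "tls.crt"), filepath.Join(dir, "tls.key")
	cfg := reloadingTLSConfig(certPath, keyPath)

	var names []string
	for _, cn := range []string{"worker-v1", "worker-v2"} {
		if err := writeSelfSignedPair(certPath, keyPath, cn); err != nil {
			return nil, err
		}
		name, err := presentedCommonName(cfg)
		if err != nil {
			return nil, err
		}
		names = append(names, name)
	}
	return names, nil
}

func main() {
	names, err := rotationDemo()
	if err != nil {
		panic(err)
	}
	fmt.Println("presented before and after rotation:", names)
}
```

The same tls.Config value presents a different certificate after the files change, which is the property the Worker relies on when the connection is re-established.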

6. Deploy your Temporal Worker Pod with the csi-driver volume

The Go application in this repository is available as a Docker image at pvsone/rotation-demo-worker-go:1.0.0.

A sample Kubernetes Pod manifest is available at manifests/pod.yaml. The Pod uses the csi-driver to generate the client certificate files as a Pod volume. The following volume attributes are used to configure the csi-driver:

        csi.cert-manager.io/issuer-name: rotation-demo-ca-issuer
        csi.cert-manager.io/common-name: rotation-demo-worker
        csi.cert-manager.io/duration: 5m
        csi.cert-manager.io/fs-group: "1000"

Based on the above settings, the cert-manager csi-driver will generate a client certificate with a common-name of rotation-demo-worker and a validity duration of 5 minutes. The client certificate will be signed by the Issuer we created in step 3, rotation-demo-ca-issuer. The FS group for the written files will be 1000.

Finally, the driver will keep track of the certificate in order to monitor when it should be marked for renewal. When this happens, the driver will request a new signed certificate, and overwrite the existing certificate in the path. Magic! 🪄

Deploy the Pod:

kubectl apply -f manifests/pod.yaml

Once deployed, you can inspect the Pod filesystem to see that the certificate files in the /certs directory are overwritten with new, valid certificate files before the 5 minute duration expires.
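
If you would rather check the validity window from Go than by eye, a rough sketch along these lines could help. It parses a PEM certificate and reports how long it remains valid. The function names remainingValidity, demoCertPEM and demoRemaining are hypothetical, and the demo builds its own throwaway 5-minute certificate rather than reading the real file from the Pod.

```go
package main

import (
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/x509"
	"crypto/x509/pkix"
	"encoding/pem"
	"errors"
	"fmt"
	"math/big"
	"time"
)

// remainingValidity parses the first CERTIFICATE block in pemBytes and
// returns how long that certificate remains valid relative to now.
func remainingValidity(pemBytes []byte, now time.Time) (time.Duration, error) {
	for {
		block, rest := pem.Decode(pemBytes)
		if block == nil {
			return 0, errors.New("no CERTIFICATE block found")
		}
		if block.Type == "CERTIFICATE" {
			cert, err := x509.ParseCertificate(block.Bytes)
			if err != nil {
				return 0, err
			}
			return cert.NotAfter.Sub(now), nil
		}
		pemBytes = rest
	}
}

// demoCertPEM (hypothetical helper) builds a throwaway self-signed
// certificate standing in for the file written by the csi-driver.
func demoCertPEM(notBefore time.Time, validity time.Duration) ([]byte, error) {
	key, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		return nil, err
	}
	tmpl := &x509.Certificate{
		SerialNumber: big.NewInt(1),
		Subject:      pkix.Name{CommonName: "rotation-demo-worker"},
		NotBefore:    notBefore,
		NotAfter:     notBefore.Add(validity),
	}
	der, err := x509.CreateCertificate(rand.Reader, tmpl, tmpl, &key.PublicKey, key)
	if err != nil {
		return nil, err
	}
	return pem.EncodeToMemory(&pem.Block{Type: "CERTIFICATE", Bytes: der}), nil
}

// demoRemaining inspects a 5-minute certificate two minutes after issuance.
func demoRemaining() (time.Duration, error) {
	issued := time.Date(2024, time.January, 10, 22, 33, 0, 0, time.UTC)
	pemBytes, err := demoCertPEM(issued, 5*time.Minute)
	if err != nil {
		return 0, err
	}
	return remainingValidity(pemBytes, issued.Add(2*time.Minute))
}

func main() {
	// In the Pod you would pass the bytes of the certificate file under
	// /certs and time.Now() instead; the exact file name is not assumed here.
	left, err := demoRemaining()
	if err != nil {
		panic(err)
	}
	fmt.Println("remaining validity:", left) // prints "remaining validity: 3m0s"
}
```

Running a check like this a few times against the mounted file should show the remaining validity jump back up each time the driver renews the certificate.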

Additionally, you can see from the Pod logs that the GetClientCertificate function is called every 5 minutes to load the updated certificate files:

...
2024/01/10 22:33:16 GetClientCertificate: loading X509 client cert and key
2024/01/10 22:38:17 GetClientCertificate: loading X509 client cert and key
2024/01/10 22:43:17 GetClientCertificate: loading X509 client cert and key
...

Feel free to run some tests before, during and after the 5 minute intervals. You should not experience any connection failures due to certificate expiration or rotation. This repo does not include a starter program for the Workflow, but you can use the temporal CLI:

temporal workflow execute \
    --type GreetSomeone \
    --task-queue greeting-tasks \
    --input '"Rotey McRoteface"'

Success! We have achieved certificate rotation without restarting the Worker application.

What if I am not using the Go SDK?

  1. Dynamic reloading of client certificates is also supported in Java, Python and .NET. TypeScript support is in progress, and you can follow the status on this GitHub issue. Official samples for each SDK are also forthcoming, but in the meantime I have used the snippet below successfully in Java.

    In Java you can use AdvancedTlsX509KeyManager with updateIdentityCredentialsFromFile to read the private key and certificate chains from the local file paths periodically.

    Here is an example of how to use it:

    String address = "namespace.account.tmprl.cloud:7233";
    String tlsCertPath = "/path/to/tls.crt";
    String tlsKeyPath = "/path/to/tls.key"; // in pkcs8 format
    
    // Create a key manager that will update the identity credentials from the files every 5 minutes
    AdvancedTlsX509KeyManager keyManager = new AdvancedTlsX509KeyManager();
    keyManager.updateIdentityCredentialsFromFile(
            new File(tlsKeyPath), new File(tlsCertPath),
            5, TimeUnit.MINUTES,
            Executors.newSingleThreadScheduledExecutor());
    
    // Build the client ssl context and configure for gRPC with the advanced key manager
    SslContextBuilder sslContextBuilder = SslContextBuilder.forClient();
    GrpcSslContexts.configure(sslContextBuilder);
    SslContext sslContext = sslContextBuilder.keyManager(keyManager).build();
    
    // Generate the gRPC stubs using the client ssl context
    WorkflowServiceStubs service = WorkflowServiceStubs.newServiceStubs(
            WorkflowServiceStubsOptions.newBuilder()
                    .setTarget(address)
                    .setSslContext(sslContext)
                    .build()
    );
    
    // the rest of your worker code as normal, elided for brevity
    WorkflowClient client = WorkflowClient.newInstance(...);
    WorkerFactory factory = WorkerFactory.newInstance(...);
    Worker worker = factory.newWorker(...);
    worker.registerWorkflowImplementationTypes(...);
    worker.registerActivitiesImplementations(...);
    factory.start();

  2. Consider whether a rolling restart of your Worker Pods is acceptable. For some use cases, the overhead of terminating the Workers and rebuilding the cache may not be a concern.

  3. Run your non-Go Worker application along with a Go proxy sidecar. The sidecar will handle the mTLS connection and the rotation of the client certificate. I have implemented a simple Temporal Go proxy that could easily be extended with the GetClientCertificate approach in this guide.

  4. Consider Istio, or another similar service mesh. Istio can handle mTLS connections and certificate rotation through its sidecar proxy.

What if I am not using Kubernetes?

If your platform allows for application files to be overwritten while the application is running, then you can use the GetClientCertificate approach in this guide. Otherwise, you may be forced to restart your Workers each time the certificate files are rotated.
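
One thing to watch for when files are overwritten by your own tooling: if a reconnect happens while the certificate and key are only partly written, tls.LoadX509KeyPair can fail. The sketch below is one possible way to soften that, under the assumption that briefly reusing the previous certificate is acceptable. The fallbackLoader type and the newFileLoader and demoFallback functions are hypothetical and not part of this repo; the demo uses a simulated loader rather than real files.

```go
package main

import (
	"crypto/tls"
	"errors"
	"fmt"
	"log"
	"sync"
)

// fallbackLoader (hypothetical type) wraps a certificate loading function
// and remembers the last certificate that loaded successfully.
type fallbackLoader struct {
	mu   sync.Mutex
	load func() (*tls.Certificate, error)
	last *tls.Certificate
}

// GetClientCertificate matches the signature of the tls.Config field of the
// same name. On a failed reload it returns the previous certificate, if any.
func (f *fallbackLoader) GetClientCertificate(*tls.CertificateRequestInfo) (*tls.Certificate, error) {
	f.mu.Lock()
	defer f.mu.Unlock()
	cert, err := f.load()
	if err != nil {
		if f.last == nil {
			return nil, err
		}
		log.Printf("reload failed, reusing previous certificate: %v", err)
		return f.last, nil
	}
	f.last = cert
	return cert, nil
}

// newFileLoader builds a fallbackLoader that reads the key pair from disk
// on every call.
func newFileLoader(certPath, keyPath string) *fallbackLoader {
	return &fallbackLoader{load: func() (*tls.Certificate, error) {
		cert, err := tls.LoadX509KeyPair(certPath, keyPath)
		if err != nil {
			return nil, err
		}
		return &cert, nil
	}}
}

// demoFallback simulates one successful load followed by a failed one and
// reports whether each call returned the good certificate.
func demoFallback() (firstFresh, secondReused bool, err error) {
	good := &tls.Certificate{}
	failing := false
	loader := &fallbackLoader{load: func() (*tls.Certificate, error) {
		if failing {
			return nil, errors.New("simulated half-written key pair")
		}
		return good, nil
	}}
	first, err := loader.GetClientCertificate(nil)
	if err != nil {
		return false, false, err
	}
	failing = true
	second, err := loader.GetClientCertificate(nil)
	if err != nil {
		return false, false, err
	}
	return first == good, second == good, nil
}

func main() {
	fresh, reused, err := demoFallback()
	if err != nil {
		panic(err)
	}
	fmt.Println("first load fresh:", fresh, "second load reused:", reused)
}
```

To use it with real files you would set GetClientCertificate in the tls.Config to newFileLoader(clientCertPath, clientKeyPath).GetClientCertificate. Note that the reused certificate may already have expired, in which case the server will still reject the connection, so this only covers short windows while the files are being replaced.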
