sigstore / policy-controller

Sigstore Policy Controller - an admission controller that can be used to enforce policy on a Kubernetes cluster based on verifiable supply-chain metadata from cosign

AKS Policy Controller Digest & Authentication Error

ejohn20 opened this issue

Description

Hey folks, I'm struggling to get this working in an Azure Kubernetes Service (AKS) cluster. Here's what I've done up to this point; I'd love any info on what I've done wrong, where to look for troubleshooting, or which repos could use a PR to patch this...

  1. The build image is created and signed in a GitLab CI pipeline using a key stored in Vault. All goes well (a sketch of the signing command is shown after the output below)...
cosign: A tool for Container Signing, Verification and Storage in an OCI registry.
GitVersion:    v2.2.0
GitCommit:     546f1c5b91ef58d6b034a402d0211d980184a0e5
GitTreeState:  clean
BuildDate:     2023-08-31T18:52:52Z
GoVersion:     go1.21.0
Compiler:      gc
Platform:      linux/amd64

tlog entry created with index: 89560215
Pushing signature to: dminfraacrrmz5nifl.azurecr.io/dm/api
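
For reference, the signing command in the CI job looks roughly like the sketch below. The Vault address, token handling, and key name (cosign-key) are placeholders, and it assumes the key stored in Vault is accessed through cosign's hashivault KMS support.

$ export VAULT_ADDR="https://vault.example.com"   # hypothetical Vault endpoint
$ export VAULT_TOKEN="..."                        # injected by the CI job
$ cosign sign --yes --key hashivault://cosign-key "${IMAGE_NAME}@${MANIFEST_DIGEST}"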
  2. Manually verifying the image signature works fine:
$ echo ${IMAGE_NAME}@${MANIFEST_DIGEST}
dminfraacrrmz5nifl.azurecr.io/dm/api:v47@sha256:d211818dca7fa35514bcbc9a280bbbb70a1e136912d2aea9723d946219172865
$ cosign verify --key cosign.pub "${IMAGE_NAME}@${MANIFEST_DIGEST}"

Verification for dminfraacrrmz5nifl.azurecr.io/dm/api@sha256:d211818dca7fa35514bcbc9a280bbbb70a1e136912d2aea9723d946219172865 --
The following checks were performed on each of these signatures:
  - The cosign claims were validated
  - Existence of the claims in the transparency log was verified offline
  - The signatures were verified against the specified public key
  3. Next, we set up the policy controller in AKS using Helm:
$ helm upgrade --install policy-controller sigstore/policy-controller --version 0.6.8 \
  --namespace cosign-system --create-namespace --wait --timeout "5m31s" \
  --set-json webhook.configData="{\"no-match-policy\": \"warn\"}" \
  --set webhook.serviceAccount.name="policy-controller" \
  --set-json webhook.serviceAccount.annotations="{\"azure.workload.identity/client-id\": \"${COSIGN_SERVICE_PRINCIPAL_CLIENT_ID}\", \"azure.workload.identity/tenant-id\": \"${ARM_TENANT_ID}\"}" \
  --set-json webhook.customLabels="{\"azure.workload.identity/use\": \"'true'\"}"

The custom label and annotations map the policy controller pod to an Azure Service Principal with AcrPull permissions (a sketch of the federated credential and role assignment is included after the output below). We can see that the service account is created with the annotations:

$ k describe sa -n cosign-system policy-controller

Name:                policy-controller
Namespace:           cosign-system
...
Annotations:         azure.workload.identity/client-id: 1111111111
                     azure.workload.identity/tenant-id: 22222222
                     meta.helm.sh/release-name: policy-controller
                     meta.helm.sh/release-namespace: cosign-system
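
For completeness, the workload identity flow also needs a federated credential on the Azure identity that trusts the cluster's OIDC issuer for this exact service account, plus an AcrPull role assignment on the registry. A sketch of that setup, assuming a user-assigned managed identity named cosign-policy-identity (the identity, resource group, and cluster names here are placeholders):

$ AKS_OIDC_ISSUER=$(az aks show --resource-group MY-AKS-RG --name MY-AKS-CLUSTER --query "oidcIssuerProfile.issuerUrl" -o tsv)
$ az identity federated-credential create \
    --name policy-controller-fic \
    --identity-name cosign-policy-identity \
    --resource-group MY-AKS-RG \
    --issuer "$AKS_OIDC_ISSUER" \
    --subject "system:serviceaccount:cosign-system:policy-controller"
$ az role assignment create \
    --assignee "${COSIGN_SERVICE_PRINCIPAL_CLIENT_ID}" \
    --role AcrPull \
    --scope "$(az acr show --name dminfraacrrmz5nifl --query id -o tsv)"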

The pod is created correctly with the environment variables and volume mount:

$ k describe pod -n cosign-system policy-controller-webhook-79dc89496f-h4cqs

Name:             policy-controller-webhook-79dc89496f-h4cqs
Namespace:        cosign-system
Priority:         0
Service Account:  policy-controller
...
Labels:           app.kubernetes.io/instance=policy-controller
                  app.kubernetes.io/managed-by=Helm
                  app.kubernetes.io/name=policy-controller
                  app.kubernetes.io/version=0.8.2
                  azure.workload.identity/use=true
...
policy-controller-webhook:
    Container ID:   containerd://f48df9f9fdfab6b824643e9d1466938ee5f9833dfa6cdd74e6bb58df8c268e1c
    Image:          ghcr.io/sigstore/policy-controller/policy-controller@sha256:f291fce5b9c1a69ba54990eda7e0fe4114043b1afefb0f4ee3e6f84ec9ef1605
    Environment:
      SYSTEM_NAMESPACE:            cosign-system (v1:metadata.namespace)
      CONFIG_LOGGING_NAME:         policy-controller-webhook-logging
      CONFIG_OBSERVABILITY_NAME:   policy-controller-webhook-observability
      METRICS_DOMAIN:              sigstore.dev/policy
      WEBHOOK_NAME:                webhook
      HOME:                        /home/nonroot
      AZURE_CLIENT_ID:             111111111
      AZURE_TENANT_ID:             22222222
      AZURE_FEDERATED_TOKEN_FILE:  /var/run/secrets/azure/tokens/azure-identity-token
      AZURE_AUTHORITY_HOST:        https://login.microsoftonline.com/
  4. Applying the following policy to the cluster:
$ kubectl apply -f ./assets/policy/cosign-cluster-image-policy.yaml

apiVersion: policy.sigstore.dev/v1beta1
kind: ClusterImagePolicy
metadata:
  name: trust-signed-dm-images
spec:
  mode: warn
  images:
    - glob: "dminfra*.azurecr.io/dm/**"
  authorities:
    - key:
        data: |
          -----BEGIN PUBLIC KEY-----
          MFkwEwYHKoZIzj0CAQYIKoZIzj0DAQcDQgAEp192D+udFjb3PkOCFKHGrASDeoaZ
          Fhi60FBb+UOqlK6iiUynQ7I81LWEkcu9jU5fnbwdxTIDCSA0NOySdoPtsQ==
          -----END PUBLIC KEY-----
        hashAlgorithm: sha256
  5. Creating a namespace and applying the restriction:
$ kubectl create namespace sig-test
$ kubectl label ns sig-test policy.sigstore.dev/include=true
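
Both resources can be double-checked before testing:

$ kubectl get clusterimagepolicy trust-signed-dm-images
$ kubectl get namespace sig-test --show-labels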

Failures

Running the following command fails in AKS, but the same command DOES work in an EKS cluster. Why do they behave differently? And is this first failure really just the same authentication failure that shows up in the second command below?

$ echo ${IMAGE_NAME}
dminfraacrrmz5nifl.azurecr.io/dm/api:v47
$ k run -n sig-test dm-web-test --image ${IMAGE_NAME}

Error from server (BadRequest): admission webhook "policy.sigstore.dev" denied the request: validation failed: invalid value: dminfraacrrmz5nifl.azurecr.io/dm/api:v47 must be an image digest: spec.containers[0].image

Updating the image reference to include the digest also fails, this time with an UNAUTHORIZED error, even though I'd expect the workload identity environment variables to enable authentication to the private registry.

$ echo ${IMAGE_NAME}@${MANIFEST_DIGEST}
dminfraacrrmz5nifl.azurecr.io/dm/api:v47@sha256:d211818dca7fa35514bcbc9a280bbbb70a1e136912d2aea9723d946219172865
$ k run -n sig-test dm-web-test --image ${IMAGE_NAME}@${MANIFEST_DIGEST}

Warning: failed policy: trust-signed-dm-images: spec.containers[0].image
Warning: dminfraacrrmz5nifl.azurecr.io/dm/api:v47@sha256:d211818dca7fa35514bcbc9a280bbbb70a1e136912d2aea9723d946219172865 signature key validation failed for authority authority-0 for dminfraacrrmz5nifl.azurecr.io/dm/api@sha256:d211818dca7fa35514bcbc9a280bbbb70a1e136912d2aea9723d946219172865: GET https://dminfraacrrmz5nifl.azurecr.io/oauth2/token?scope=repository%3Adm%2Fapi%3Apull&service=dminfraacrrmz5nifl.azurecr.io: UNAUTHORIZED: authentication required, visit https://aka.ms/acr/authorization for more information.
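
The webhook's own logs would presumably show the same underlying registry error; with this chart the deployment appears to be named policy-controller-webhook (inferred from the pod name above), so something like:

$ kubectl logs -n cosign-system deploy/policy-controller-webhook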

I've not run this on Azure, so I can't say for sure, but I'd try to rule out auth by creating a simple test image and requiring a signature on it. An easy way to test would be to use an image in a registry that doesn't require auth, for example ttl.sh as in step 3 here:
https://www.chainguard.dev/unchained/policy-controller-101

That would show whether the problem is auth-related. Of course, if you have another way to run a container that doesn't require auth, I'd try that.
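
For example, something roughly like this (just a sketch; the image name is throwaway, the ttl.sh tag expires after the stated TTL, and it assumes a second ClusterImagePolicy whose glob matches ttl.sh/** with the same public key):

$ IMAGE=ttl.sh/policy-controller-test:1h
$ docker pull alpine:latest
$ docker tag alpine:latest "$IMAGE"
$ docker push "$IMAGE"
$ cosign sign --yes --key cosign.key "$IMAGE"
$ kubectl run -n sig-test ttl-test --image "$IMAGE"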

The ACR auth library upgrade is needed, as the current ACR helper library is very dated and uses end-of-life libraries from Microsoft (#1424). I also never got this working using workload identity, so I think the upgrade will help with that.

However, I was able to get this working by using the managed identity on the worker nodes and granting the kubelet identity pull permission on the ACR registry:

resource "azurerm_kubernetes_cluster" "aks" {
  ...
  identity {
    type = "SystemAssigned"
  }
  ...
}

resource "azurerm_role_assignment" "acr" {
  principal_id  = azurerm_kubernetes_cluster.aks.kubelet_identity[0].object_id
  role_definition_name  = "AcrPull"
  scope  = azurerm_container_registry.acr.id
  skip_service_principal_aad_check = true
}

Then install the policy controller, setting the webhook's AZURE_CLIENT_ID to the kubelet identity's client ID:

KUBELET_CLIENT_ID=$(az aks show --resource-group MY-AKS-RG --name "MY-AKS-CLUSTER" --only-show-errors | jq -r '.identityProfile.kubeletidentity.clientId')

helm upgrade --install policy-controller sigstore/policy-controller --version 0.6.3 \
  --namespace cosign-system --create-namespace --wait --timeout "5m31s" \
  --set-json webhook.configData='{"no-match-policy": "warn"}' \
  --set webhook.env.AZURE_CLIENT_ID=$KUBELET_CLIENT_ID
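
With the kubelet identity holding AcrPull, re-running the earlier digest-based test should let the webhook resolve and verify the image against the trust-signed-dm-images policy (same variables as above):

$ kubectl run -n sig-test dm-web-test --image "${IMAGE_NAME}@${MANIFEST_DIGEST}"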