hashicorp / consul-api-gateway

The Consul API Gateway is a dedicated ingress solution for intelligently routing traffic to applications running on a Consul Service Mesh.

Controller crashes when certificates are handled by Vault PKI

joatmon08 opened this issue

Overview of the Issue

I configured a Gateway with a TLS certificate generated by the Vault PKI secrets engine. The gateway comes up successfully, but when I create an HTTPRoute to add an upstream, the API Gateway controller throws an error and fails to add the route because it cannot validate the certificate's SPIFFE URL.

Reproduction Steps

  1. Create three self-signed root CAs and configure each with a
    Vault PKI secrets engine with two levels of intermediate certificates.

    • Cluster certificate (self-signed)
    • Connect CA (self-signed)
    • Consul API Gateway certificate (self-signed)
  2. Create a Consul cluster that uses the Vault PKI secrets engine.

    global:
      datacenter: "${CONSUL_DATACENTER}"
      name: consul
      secretsBackend:
        vault:
          enabled: true
          consulServerRole: ${CONSUL_SERVER_ROLE}
          consulClientRole: ${CONSUL_CLIENT_ROLE}
          consulCARole: ${CONSUL_CA_ROLE}
          manageSystemACLsRole: ${SERVER_ACL_INIT_ROLE}
          agentAnnotations: |
            "vault.hashicorp.com/namespace": "${VAULT_NAMESPACE}"
          connectCA:
            address: ${VAULT_ADDR}
            rootPKIPath: ${CONSUL_CONNECT_PKI_PATH_ROOT}
            intermediatePKIPath: ${CONSUL_CONNECT_PKI_PATH_INT}
            authMethodPath: ${KUBERNETES_AUTH_METHOD_PATH}
            additionalConfig: '"{"connect": [{ "ca_config": [{ "namespace": "${VAULT_NAMESPACE}"}]}]}"'
      tls:
        enabled: true
        enableAutoEncrypt: true
        caCert:
          secretName: "${CONSUL_PKI_PATH}/cert/ca"
        caKey:
          secretName: "${CONSUL_PKI_PATH}/issue/${CONSUL_SERVER_ROLE}"
          secretKey: private_key
      acls:
        manageSystemACLs: true
        bootstrapToken:
          secretName: "${CONSUL_STATIC_PATH}/data/bootstrap"
          secretKey: token
      gossipEncryption:
        secretName: ${CONSUL_STATIC_PATH}/data/gossip
        secretKey: key
    
    server:
      replicas: 1
      serverCert:
        secretName: "${CONSUL_PKI_PATH}/issue/${CONSUL_SERVER_ROLE}"
    
    connectInject:
      replicas: 1
      enabled: true
    
    controller:
      enabled: true
    
    terminatingGateways:
      enabled: true
      defaults:
        replicas: 1
    
    apiGateway:
      enabled: true
      logLevel: trace
      image: "hashicorp/consul-api-gateway:0.2.1"
      managedGatewayClass:
        serviceType: LoadBalancer
    
    ui:
      enabled: true
      service:
        enabled: true
        type: LoadBalancer
  3. Deploy a gateway with a TLS certificate.

    apiVersion: gateway.networking.k8s.io/v1alpha2
    kind: Gateway
    metadata:
      name: api-gateway
      namespace: default
    spec:
      gatewayClassName: consul-api-gateway
      listeners:
      - allowedRoutes:
          namespaces:
            from: Same
        name: https
        port: 8443
        protocol: HTTPS
        tls:
          certificateRefs:
          - group: ""
            kind: Secret
            name: consul-api-gateway-cert
          mode: Terminate

    The gateway comes up:

    $ kubectl get pods
    NAME                                             READY   STATUS    RESTARTS       AGE
    api-gateway-5d5dd555b5-9kxqh                     1/1     Running   0              8m35s
    consul-api-gateway-controller-6489bfb4dc-rn8rw   2/2     Running   18 (23m ago)   85m
  4. Deploy an HTTPRoute.

    apiVersion: gateway.networking.k8s.io/v1alpha2
    kind: HTTPRoute
    metadata:
      name: hashicups
    spec:
      parentRefs:
      - name: api-gateway
      rules:
      - matches:
        - path:
            type: PathPrefix
            value: /
        backendRefs:
        - kind: Service
          name: nginx
          namespace: default
          port: 80

    The controller throws an error and restarts:

    $ kubectl get pods
    NAME                                             READY   STATUS    RESTARTS       AGE
    api-gateway-5d5dd555b5-9kxqh                     1/1     Running   0              10m
    consul-api-gateway-controller-6489bfb4dc-rn8rw   1/2     Error     20 (19s ago)   87m

Logs

2022-06-03T16:39:26.260Z [INFO]  manager/internal.go:383: consul-api-gateway-server.controller-runtime: starting metrics server: path=/metrics
2022-06-03T16:39:26.260Z [TRACE] envoy/secrets.go:300: consul-api-gateway-server.sds-server.secret-manager: running secrets manager
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x28 pc=0xafb0b1]

goroutine 324 [running]:
github.com/hashicorp/consul-api-gateway/internal/envoy.verifySPIFFE({0x1c56bf8, 0xc0007bbe60}, {0x1c987b0, 0xc00015b280}, 0x0, {0x1c3a9b8, 0xc00080a3c0})
        /home/runner/work/consul-api-gateway/consul-api-gateway/internal/envoy/middleware.go:84 +0x1d1
github.com/hashicorp/consul-api-gateway/internal/envoy.SPIFFEStreamMiddleware.func1({0x198c780, 0xc0004fa090}, {0x1c732b0, 0xc00068ec00}, 0x167aec0, 0x1abe690)
        /home/runner/work/consul-api-gateway/consul-api-gateway/internal/envoy/middleware.go:68 +0xc5
google.golang.org/grpc.(*Server).processStreamingRPC(0xc0003cd6c0, {0x1c85848, 0xc000017500}, 0xc0000c9b00, 0xc0004fa120, 0x2ae5940, 0x0)
        /home/runner/go/pkg/mod/google.golang.org/grpc@v1.40.0/server.go:1557 +0xe9a
google.golang.org/grpc.(*Server).handleStream(0xc0003cd6c0, {0x1c85848, 0xc000017500}, 0xc0000c9b00, 0x0)
        /home/runner/go/pkg/mod/google.golang.org/grpc@v1.40.0/server.go:1630 +0x9e5
google.golang.org/grpc.(*Server).serveStreams.func1.2()
        /home/runner/go/pkg/mod/google.golang.org/grpc@v1.40.0/server.go:941 +0x98
created by google.golang.org/grpc.(*Server).serveStreams.func1
        /home/runner/go/pkg/mod/google.golang.org/grpc@v1.40.0/server.go:939 +0x294

Expected behavior

I expected to have the HTTPRoute add an upstream to my service and be able to access the service over HTTPS.

Environment details

  • consul-api-gateway version: v0.2.1
  • Kubernetes version: v1.22.9-eks-a64ea69
  • Consul Server version: v1.12.0
  • Consul-K8s version: v0.44.0
  • Cloud Provider: EKS v1.22.9

Additional Context

You can find the full deployment (including Vault PKI secrets engine setup and certificate generation) at joatmon08/hashicorp-stack-demoapp.

This is likely caused by our server-side mTLS verification for SDS. Besides using our root cert for cryptographic verification, we use the root and leaf cert SPIFFE URLs to verify the identity of a known gateway and to ensure that it is allowed to request certain certificates. This identity verification happens after the cryptographic verification of the leaf certs against the requested root cert.

Included in the ID check is this bit:

    if uri.Host != spiffeCA.Host {
        logger.Warn("found mismatching spiffe hosts, skipping", "caHost", spiffeCA.Host, "clientHost", uri.Host)
        continue
    }

Here, spiffeCA comes from the Connect root cert from Consul (in this case, backed by Vault's PKI infrastructure). It seems like the root CA for Connect in this case has no SPIFFE identifier (probably due to the particularities of the Vault setup), though Consul generally has one when using its default PKI setup. If spiffeCA ends up nil as a result, reading spiffeCA.Host would explain the nil pointer dereference in the logs. I think this means that we can't always assume that the root actually has such an identifier.
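
To illustrate the failure mode, here is a minimal sketch in which a root CA without a SPIFFE URI is modeled as a nil *url.URL. This is not the project's code: hostCheck and mustParse are hypothetical helpers, and the example.consul host is made up. The stack trace shows 0x0 as the third argument to verifySPIFFE, and on 64-bit platforms the fault address 0x28 matches the offset of the Host field in a url.URL, so a nil spiffeCA seems a plausible reading, but that is an inference from the trace rather than something confirmed in the source.

```go
package main

import (
	"fmt"
	"net/url"
)

// mustParse parses a URI or panics; it only keeps the example short.
func mustParse(raw string) *url.URL {
	u, err := url.Parse(raw)
	if err != nil {
		panic(err)
	}
	return u
}

// hostCheck is a hypothetical helper, not part of consul-api-gateway.
// It compares SPIFFE hosts only when both URIs are present. The second
// return value is false when there is nothing to compare against, so the
// caller never dereferences a nil CA identifier.
func hostCheck(spiffeCA, client *url.URL) (match bool, checked bool) {
	if spiffeCA == nil || client == nil {
		return false, false
	}
	return spiffeCA.Host == client.Host, true
}

func main() {
	client := mustParse("spiffe://example.consul/ns/default/dc/dc1/svc/api-gateway")

	// Assumption: a root CA with no SPIFFE URI SAN surfaces as a nil pointer.
	var spiffeCA *url.URL

	match, checked := hostCheck(spiffeCA, client)
	fmt.Println("match:", match, "checked:", checked)
}
```

Without the nil guard, the comparison spiffeCA.Host == client.Host would panic in the same way as the controller does.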

My suggestion is that we drop this particular check and ignore the "host" part of the SPIFFE URL in the client cert. We'd still use the rest of the SPIFFE path to identify the namespace/name of the deployed gateway and align it with our gateway configuration, but we should be able to drop the requirement for a root CA SPIFFE component and use the CA only for cryptographic verification. TL;DR: just remove the above lines and I think we should be good.
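
A rough sketch of what identity extraction could look like once the host is ignored. This is an illustration under assumptions, not the actual change: parseGatewayID and gatewayID are hypothetical names, the hosts are made up, and the path layout /ns/<namespace>/dc/<datacenter>/svc/<service> is assumed from the usual shape of Consul service SPIFFE IDs (a setup with admin partitions would add a segment this sketch does not handle).

```go
package main

import (
	"errors"
	"fmt"
	"net/url"
	"strings"
)

// gatewayID is a hypothetical type holding the identity parts taken from
// the SPIFFE path of a client certificate.
type gatewayID struct {
	Namespace  string
	Datacenter string
	Service    string
}

// parseGatewayID extracts the identity from the path of a SPIFFE URI.
// The host (trust domain) is parsed but deliberately never compared.
func parseGatewayID(raw string) (gatewayID, error) {
	u, err := url.Parse(raw)
	if err != nil {
		return gatewayID{}, err
	}
	if u.Scheme != "spiffe" {
		return gatewayID{}, errors.New("not a spiffe URI")
	}
	// Assumed layout: /ns/<namespace>/dc/<datacenter>/svc/<service>
	parts := strings.Split(strings.Trim(u.Path, "/"), "/")
	if len(parts) != 6 || parts[0] != "ns" || parts[2] != "dc" || parts[4] != "svc" {
		return gatewayID{}, fmt.Errorf("unexpected SPIFFE path %q", u.Path)
	}
	if parts[1] == "" || parts[3] == "" || parts[5] == "" {
		return gatewayID{}, fmt.Errorf("empty segment in SPIFFE path %q", u.Path)
	}
	return gatewayID{Namespace: parts[1], Datacenter: parts[3], Service: parts[5]}, nil
}

func main() {
	// Two URIs that differ only in host resolve to the same identity.
	for _, raw := range []string{
		"spiffe://one.consul/ns/default/dc/dc1/svc/api-gateway",
		"spiffe://two.consul/ns/default/dc/dc1/svc/api-gateway",
	} {
		id, err := parseGatewayID(raw)
		if err != nil {
			fmt.Println("rejected:", err)
			continue
		}
		fmt.Printf("%+v\n", id)
	}
}
```

Trust in the certificate still comes entirely from the cryptographic verification against the root cert; the path is only used to match the caller to a known gateway.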