Multicluster examples do not work if trustDomain value is set

nea1 opened this issue 2 years ago · comments

Bug Description

The Multi-Primary on different networks multicluster example works exactly as expected when both clusters use the same trustDomain (e.g. cluster.local). However, if the trustDomain value differs across clusters (everything else remaining the same), cross-cluster calls do not work.

e.g.
cluster 1:

meshConfig:
  trustDomain: search.prod

cluster 2:

meshConfig:
  trustDomain: payments.prod
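
For context, fragments like these would typically sit inside an IstioOperator resource of the kind used in the multicluster install guide. A minimal sketch for cluster 1 follows. Only the trustDomain value comes from this report; the meshID, clusterName, and network values are assumptions that follow that guide's naming conventions. Cluster 2 would differ in trustDomain (payments.prod), clusterName, and network.

```yaml
# Hypothetical IstioOperator for cluster 1.
# Only trustDomain is taken from the report; the other values are assumed.
apiVersion: install.istio.io/v1alpha1
kind: IstioOperator
spec:
  meshConfig:
    trustDomain: search.prod      # from the report
  values:
    global:
      meshID: mesh1               # assumed
      multiCluster:
        clusterName: cluster1     # assumed
      network: network1           # assumed
```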

Both clusters have a common root of trust, as per the prerequisite setup in Configure Trust.

We require different trust domains to uniquely identify workloads across the estate; however, we also require them to share the same x509 root cert.

Expected Behaviour

The workload certs in both clusters share the same root, so I would expect cross-cluster mTLS calls to succeed, i.e.:

Calls to the helloworld service should be served by both the local v1 deployment and the remote v2 deployment.
Calls to a remote httpbin service should be served successfully over mTLS.

Actual Behaviour

Calls to the helloworld service are served only from the local cluster:

Hello version: v1, instance: helloworld-v1-7df57fccf6-wl85c
Hello version: v1, instance: helloworld-v1-7df57fccf6-wl85c
Hello version: v1, instance: helloworld-v1-7df57fccf6-wl85c
Hello version: v1, instance: helloworld-v1-7df57fccf6-wl85c
...

Calls to a remote httpbin service fail:

kubectl exec -n foo deploy/sleep --context="${CTX_CLUSTER1}" -- curl http://httpbin.bar/get 
upstream connect error or disconnect/reset before headers. reset reason: connection failure, transport failure reason: TLS error: 268435581:SSL routines:OPENSSL_internal:CERTIFICATE_VERIFY_FAILED

I would expect differing trustDomain values to come into play only when referencing the SPIFFE IDs/principals in an AuthorizationPolicy, which is not the case here (no policies are deployed).

Unless there is some extra config required for multiple trustDomains (with the same root) to work in a multicluster setup, then this looks to be a bug?

Version

istioctl version
client version: 1.13.4
control plane version: 1.13.4
data plane version: 1.13.4 (4 proxies)

Additional Information

No response

John Howard · Answer 1 · Wed Jun 01 2022 02:01:36 GMT+0800 (China Standard Time)

By default, only the same trust domain is trusted; to trust additional ones, trustDomainAliases can be set.

Neal Lidster · Answer 2 · Thu Jun 02 2022 00:03:12 GMT+0800 (China Standard Time)

@howardjohn my understanding is that trustDomainAliases effectively says "these trust domains should be considered the same as this trust domain", which is subtly different from "these trust domains should be trusted".

For example if I configure the following in the "payments" cluster:

  meshConfig:
    trustDomain: payments.prod
    trustDomainAliases:
    - search.prod

...then a workload with spiffe id spiffe://payments.prod/ns/foo/sa/sleep is considered identical to a workload with spiffe id spiffe://search.prod/ns/foo/sa/sleep. Therefore an AuthorizationPolicy containing:

- from:
  - source:
      principals: ["payments.prod/ns/foo/sa/sleep"]

would allow access from payments.prod/ns/foo/sa/sleep and search.prod/ns/foo/sa/sleep.
I still want to be able to distinguish between workloads with different spiffe ids and for example allow access from the payments instance, but deny access from the search instance - how do I do that?
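
To make the concern concrete, a complete policy built around that rule might look like the sketch below. The policy name, the bar namespace, and the app: httpbin selector are assumptions for illustration, since the comment above only shows the rules fragment. With search.prod configured as an alias of payments.prod, the behaviour described above means this single principal entry would also match the search.prod identity, so the policy cannot tell the two callers apart.

```yaml
# Hypothetical complete policy; only the principals value comes from the thread.
apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
  name: allow-payments-sleep   # hypothetical name
  namespace: bar               # assumed namespace of the target workload
spec:
  selector:
    matchLabels:
      app: httpbin             # assumed label on the target workload
  action: ALLOW
  rules:
  - from:
    - source:
        principals: ["payments.prod/ns/foo/sa/sleep"]
```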

I would have expected that, when there is a common root cert and no AuthorizationPolicy says otherwise, communication would be trusted by default. A user could then deploy an AuthorizationPolicy to deny other trust domains if desired, i.e. something like:

  action: DENY
  rules:
  - from:
    - source:
        principals: ["anotherdomain.prod/*"]
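
Written out as a full resource, such a policy might look like the following sketch. The name and namespace are hypothetical. With no selector, it would apply to every workload in that namespace. It also assumes the cross-domain mTLS handshake succeeds in the first place, which is exactly what fails in this report.

```yaml
# Hypothetical complete DENY policy; only action and rules come from the thread.
apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
  name: deny-anotherdomain     # hypothetical name
  namespace: bar               # assumed; no selector, so all workloads in bar
spec:
  action: DENY
  rules:
  - from:
    - source:
        # Trailing wildcard matches any identity in the anotherdomain.prod trust domain
        principals: ["anotherdomain.prod/*"]
```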

John Howard · Answer 3 · Thu Jun 02 2022 00:07:44 GMT+0800 (China Standard Time)

Ah good point... this does seem like a functionality gap.

Neal Lidster · Answer 4 · Tue Jun 14 2022 06:00:56 GMT+0800 (China Standard Time)

Do you know if there is a workaround for this today? I was looking at certificatedata which appears to let you "add" trust domains to the current mesh, but looks like it relies on adding a bundle url (which in this case would be redundant, as it's effectively the same bundle)
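
For readers unfamiliar with that setting: CertificateData entries are listed under meshConfig.caCertificates. As far as I can tell from the MeshConfig reference, each entry carries either an inline pem or a spiffeBundleUrl, plus a trustDomains list, so an inline variant for the "payments" cluster might look like the sketch below. This is untested against the scenario in this issue, the field usage should be checked against the MeshConfig reference for the Istio version in use, and the PEM body is a placeholder.

```yaml
# Untested sketch; verify field names against the MeshConfig reference.
meshConfig:
  trustDomain: payments.prod
  caCertificates:
  - pem: |
      -----BEGIN CERTIFICATE-----
      <shared root certificate, placeholder>
      -----END CERTIFICATE-----
    trustDomains:
    - search.prod              # remote trust domain signed by the same root
```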

skizot722 · Answer 5 · Tue Jul 12 2022 02:29:14 GMT+0800 (China Standard Time)

Hey @nea1 - did you ever find an alternative approach here? I've run into the exact same situation.

Neal Lidster · Answer 6 · Tue Jul 19 2022 18:59:35 GMT+0800 (China Standard Time)

Hey @skizot722 - unfortunately not. I didn't pursue the certificatedata approach I listed above because, even if it worked, it would have been difficult to manage for our use case (we have tens of trust domains across 100+ clusters). Ultimately it would also have been a workaround rather than a proper solution. Unfortunately I don't currently have the bandwidth to contribute a fix, so if someone is able to do that, that would be awesome.

Istio Policy Bot · Answer 7 · Thu Sep 15 2022 13:03:01 GMT+0800 (China Standard Time)

🚧 This issue or pull request has been closed due to not having had activity from an Istio team member since 2022-06-01. If you feel this issue or pull request deserves attention, please reopen the issue. Please see this wiki page for more information. Thank you for your contributions.

Created by the issue and PR lifecycle manager.