temporalio / temporal

Temporal service

Home Page:https://docs.temporal.io

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

Unable to use dns:// prefix for clusterInformation.rpcAddress configuration

hferentschik opened this issue · comments

Expected Behavior

We are trying to setup multi-cluster replication and for that purpose need to configure the clusterMetadata of the Temporal configuration. This is described here. Specific to the rpcAddress the docs says:

rpcAddress - indicate the remote service address (host:port). Host can be DNS name. Use dns:/// prefix to enable round-robin between IP address for DNS name.

We are using this prefix everywhere where we create a gRPC client. In particular our workers connect via dns:///<service-address>:7233. The reason to explicitly specify the dns resolver is that grpc-go deviates from the gRPC spec and does not use the dns resolver as default. See grpc/grpc-go#3544 (comment). It uses a so called "passthrough" as default. If one wants to use dns, one needs to explicitly specify the resolver using dns:///.

It seems in most cases Temporal allows the use of the prefix, except for clusterInformation.rpcAddress.

Actual Behavior

When configuring the clusterInformation.rpcAddress with something like dns:///<host>:7233 an error is returned when one tried to use cluster replication commands, for example temporal operator cluster upsert in order to establish a connection between two clusters. The returned error is:

address dns:///myhost:7233: too many colons in address

See also https://go.dev/play/p/9Rwow_gtcCF

The problem is that RPCFactory#CreateRemoteFrontendGRPCConnection uses net.SplitHostPort to extract the hostname from the configured rpcAddress in order to lookup the client config for the host.
net.SplitHostPort cannot handle the dns:/// prefix.

func (d *RPCFactory) CreateRemoteFrontendGRPCConnection(rpcAddress string) *grpc.ClientConn {
	var tlsClientConfig *tls.Config
	var err error
	if d.tlsFactory != nil {
		hostname, _, err2 := net.SplitHostPort(rpcAddress)
		if err2 != nil {
			d.logger.Fatal("Invalid rpcAddress for remote cluster", tag.Error(err2))
		}
		tlsClientConfig, err = d.tlsFactory.GetRemoteClusterClientConfig(hostname)

		if err != nil {
			d.logger.Fatal("Failed to create tls config for gRPC connection", tag.Error(err))
			return nil
		}
	}

	return d.dial(rpcAddress, tlsClientConfig)
}

Steps to Reproduce the Problem

  1. Use the dns:/// prefix in the clusterInformation.rpcAddress of the Temporal server in order to enforce the dns resolver in grpc-go
  2. Use an operation which calls CreateRemoteFrontendGRPCConnection, eg try enabling a cluster connection between two clusters.

Specifications

  • Version: v1.22.7

Something like this https://go.dev/play/p/403gj5Zovy7 could work (assuming the port is mandatory):

package main

import (
	"fmt"
	"net"
	"net/url"
	"strings"
)

func main() {
	host, err := hostname("myhost:7233")
	fmt.Printf("host: %s, error: %v\n", host, err)

	host, err = hostname("dns:///myhost:7233")
	fmt.Printf("host: %s, error: %v\n", host, err)
}

func hostname(rpcAddress string) (string, error) {
	hostname, _, err := net.SplitHostPort(rpcAddress)
	if err != nil {
		grpcUrl, parseErr := url.Parse(rpcAddress)
		if parseErr != nil {
			return "", parseErr
		}

		// Check the host field, if set rpcAddress is in plain host:port format
		// If the host is not set the address specified a resolver, eg dns:///host:port.
		// In this case the address is in the path of the URL.
		// See https://github.com/grpc/grpc-go/blob/c8083227ee3181db60cf37005940e11865a572b4/resolver/resolver.go#L272-L288
		address := grpcUrl.Host
		if address == "" {
			address = grpcUrl.Path
		}
		if address == "" {
			address = grpcUrl.Opaque
		}

		address = strings.TrimPrefix(address, "/")
		hostname, _, _ = net.SplitHostPort(address)
		err = nil

	}
	return hostname, err
}

Alternatively one could just use strings.CutPrefix to remove dns:///, but then that would only cover the single dns resolver.

Happy to create/work on a PR if we can agree on the approach. Or maybe I am still missing something here?

@yux0 wdyt? Did you have a chance to look at this?