microsoft / service-fabric-issues

This repo is for the reporting of issues found with Azure Service Fabric.

ICommunicationClientFactory causing sticky connections?

chandimar001 opened this issue

We have multiple services in the same cluster communicating with each other using ICommunicationClientFactory pattern described here:
https://docs.microsoft.com/en-us/azure/service-fabric/service-fabric-reliable-services-communication

Issue:

  • Both ServiceA and ServiceB have 5 VMs each
  • ServiceA has a public IP connected to a load balancer
  • Requests come in to ServiceA through the load balancer and are distributed evenly among the 5 VMs
  • ServiceA then sends the request to the ServiceB machines via the ICommunicationClientFactory pattern mentioned above
  • The problem is that ServiceA sends requests to only 2 of the 5 ServiceB VMs. This is because each ServiceA machine has a 1-to-1 mapping to a single ServiceB machine instead of a 1-to-many mapping with all ServiceB machines.

I believe this is because CommunicationClientFactoryBase maintains a cache of service connections and keeps serving the same endpoint (maybe the first?) from cache instead of a random endpoint from the pool of endpoints.

We pass in the ServicePartitionResolver to CommunicationClientFactoryBase and send the request to the endpoint we get back when CreateClientAsync(string endpoint, CancellationToken cancellationToken) is invoked. All the endpoint resolution logic is out of our hands.

protected override Task<HttpCommunicationClient> CreateClientAsync(string endpoint, CancellationToken cancellationToken)
{
    // clients that maintain persistent connections to a service should 
    // create that connection here.
    // an HTTP client doesn't maintain a persistent connection.
    return Task.FromResult(new HttpCommunicationClient(this.httpClientFactory.CreateClient(), endpoint));
}

Disconnecting our code from the ICommunicationClientFactory pattern and resolving the endpoint ourselves as follows fixes the problem.

var resolver = ServicePartitionResolver.GetDefault();
var resolvedServicePartition = await resolver.ResolveAsync(this.url, ServicePartitionKey.Singleton, CancellationToken.None);    
var endpoint = resolvedServicePartition.GetEndpoint();
JObject endpointObject = JObject.Parse(endpoint.Address);
string endpointAddress = (string)endpointObject["Endpoints"].First();

Is this a known issue or am I using the ICommunicationClientFactory pattern incorrectly?

After spending more time on this, I found that it is a scope/lifetime issue, not a bug.

The HTTP client that ServiceA uses to communicate with ServiceB is registered as a singleton:

services.AddSingleton<IServicePartitionResolver>(ServicePartitionResolver.GetDefault());
services.AddSingleton<ICommunicationClientFactory<HttpCommunicationClient>, HttpCommunicationClientFactory>();
services.AddSingleton<IServiceBClient, ServiceBClient>();

ServiceBClient inherits from ServicePartitionClient:
public class ServiceBClient: ServicePartitionClient<HttpCommunicationClient>

HttpCommunicationClientFactory inherits from CommunicationClientFactoryBase:
public class HttpCommunicationClientFactory : CommunicationClientFactoryBase<HttpCommunicationClient>

We implement/override CreateClientAsync() by creating a client that is tied to the endpoint we get from CommunicationClientFactoryBase.

protected override Task<HttpCommunicationClient> CreateClientAsync(string endpoint, CancellationToken cancellationToken)
{
    // clients that maintain persistent connections to a service should 
    // create that connection here.
    // an HTTP client doesn't maintain a persistent connection.
    return Task.FromResult(new HttpCommunicationClient(this.httpClientFactory.CreateClient(this.GetType().Name), endpoint, this.telemetryContextProvider, this.loggerFactory));
}

This explains why our HTTP client always calls the same endpoint: it is a singleton. The comments on CreateClientAsync() basically say so :)

So, what is the proper way of integrating with the ICommunicationClientFactory pattern? Are we expected to create a new HTTP client for each request? I naively assumed that the endpoint parameter passed to CreateClientAsync() was a URL to the reverse proxy rather than the address of an individual instance.

This issue has been resolved. It was user error all along. ServiceBClient was inheriting from ServicePartitionClient and was being defined as a singleton, so the resolver code ran only once during startup. I updated the code to create a new ServicePartitionClient per request and it fixed the problem.
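
For anyone who lands here with the same symptom, here is a minimal toy model of the lifetime problem described above. It does not use any Service Fabric types: ToyResolver and ToyPartitionClient are hypothetical stand-ins, and the assumption that the client resolves an endpoint once and then reuses it is taken from the behavior reported in this thread, not from the library source. The IP addresses and the seed are made up.

```csharp
using System;
using System.Collections.Generic;

// Toy model only: none of the types below are Service Fabric APIs.
string[] serviceBInstances = { "10.0.0.1", "10.0.0.2", "10.0.0.3", "10.0.0.4", "10.0.0.5" };
var resolver = new ToyResolver(serviceBInstances, seed: 42);

// Singleton lifetime: one client object serves all 1000 requests.
var singleton = new ToyPartitionClient(resolver);
var singletonHits = new HashSet<string>();
for (int i = 0; i < 1000; i++)
{
    singletonHits.Add(singleton.Send());
}

// Per-request lifetime: a new client, and therefore a new resolution, per request.
var perRequestHits = new HashSet<string>();
for (int i = 0; i < 1000; i++)
{
    perRequestHits.Add(new ToyPartitionClient(resolver).Send());
}

Console.WriteLine($"singleton client reached {singletonHits.Count} instance(s)");
Console.WriteLine($"per-request clients reached {perRequestHits.Count} instance(s)");

// Hypothetical stand-in for a resolver that picks one instance at random.
class ToyResolver
{
    private readonly string[] endpoints;
    private readonly Random random;

    public ToyResolver(string[] endpoints, int seed)
    {
        this.endpoints = endpoints;
        this.random = new Random(seed);
    }

    public string Resolve() => endpoints[random.Next(endpoints.Length)];
}

// Hypothetical stand-in for a client that resolves once and reuses the result.
class ToyPartitionClient
{
    private readonly ToyResolver resolver;
    private string? cachedEndpoint;

    public ToyPartitionClient(ToyResolver resolver) => this.resolver = resolver;

    // Returns the endpoint the request would be sent to.
    public string Send() => cachedEndpoint ??= resolver.Resolve();
}
```

In this model the singleton client can only ever reach one instance, which matches the 1-to-1 mapping seen from each ServiceA machine. With five ServiceA machines each pinned to one randomly chosen ServiceB machine, landing on only 2 of the 5 is unsurprising. In the real application, the corresponding change was to stop registering ServiceBClient as a singleton and create it per request.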