temporalio / sdk-dotnet

Temporal .NET SDK

Trying to start a workflow from asp.net and the client hangs

jakejscott opened this issue

What are you really trying to do?

Trying to start a workflow from an ASP.NET minimal API.

Describe the bug

If I create the client at startup and then try to use that client instance within an endpoint, it hangs when I call StartWorkflowAsync.

If I create a client within the MapGet endpoint lambda, it works.

Minimal Reproduction

using Temporalio.Client;
using Temporalio.Runtime;

var builder = WebApplication.CreateBuilder(args);

builder.Logging.AddSimpleConsole(options =>
{
    options.TimestampFormat = "yyyy-MM-ddTHH:mm:ss.ff "; // MM is month; mm would be minutes
    options.UseUtcTimestamp = true;
    options.IncludeScopes = false;
    options.SingleLine = true;
});

// If I create the client here, when I call `client.StartWorkflowAsync` the call blocks and doesn't work.
var client = await TemporalClient.ConnectAsync(new()
{
    TargetHost = "localhost:7233",
    Namespace = "default",
});

var app = builder.Build();

app.MapGet("/", async () =>
{
    app.Logger.LogInformation("We are here");
    
    // NOTE: If I connect here it works
    // var client = await TemporalClient.ConnectAsync(new()
    // {
    //     TargetHost = "localhost:7233",
    //     Namespace = "default",
    // });

    var id = $"simple-workflow-{Guid.NewGuid().ToString()}";

    // This call blocks and doesn't work
    var handle = await client.StartWorkflowAsync(
        workflow: SimpleWorkflow.Ref.RunAsync,
        arg: "Jake",
        options: new WorkflowOptions(
            id: id,
            taskQueue: "my-task-queue"
        )
    );

    var result = await handle.GetResultAsync();
    return result;
});

app.Logger.LogInformation("Started");

app.Run();

Environment/Versions

  • x86 Windows
  • Using the Temporal CLI

I have reproduced this. Something about how the container works is preventing the Tokio thread from running properly.

In the meantime, what you have there is not best practice, and connecting per call is not good either. Consider changing to:

builder.Services.AddSingleton(_ =>
    Task.Run(() => TemporalClient.ConnectAsync(new()
    {
        TargetHost = "localhost:7233",
        Namespace = "default",
    })).Result);

And:

app.MapGet("/", async (TemporalClient client) =>

This is actually bad practice too: blocking on an async factory during DI resolution can deadlock (ref https://learn.microsoft.com/en-us/dotnet/core/extensions/dependency-injection-guidelines#async-di-factories-can-cause-deadlocks). I will consider some kind of TemporalClientProvider when I write the sample for this.
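One possible shape for such a provider is sketched below. It is a hypothetical helper, not part of the Temporal SDK, and it is written generically so it compiles on its own: it wraps an async factory in a Lazy<Task<T>> so the connection is started at most once, on first use, and every caller awaits the same task instead of a DI factory blocking on .Result. The names ShareAsync, getClient and connectCalls, and the stand-in factory, are assumptions for illustration.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper (not a Temporal SDK API): runs an async factory at most once,
// on first use, and hands every caller the same Task to await.
static Func<Task<T>> ShareAsync<T>(Func<Task<T>> factory)
{
    // ExecutionAndPublication guarantees only one thread ever invokes the factory.
    var lazy = new Lazy<Task<T>>(factory, LazyThreadSafetyMode.ExecutionAndPublication);
    // Note: a faulted Task is cached as well, so a failed connect is never retried here.
    return () => lazy.Value;
}

// Stand-in for TemporalClient.ConnectAsync(...); counts how often it is invoked.
var connectCalls = 0;
var getClient = ShareAsync(async () =>
{
    Interlocked.Increment(ref connectCalls);
    await Task.Delay(50); // simulate connection latency
    return "connected-client";
});

// Simulate several concurrent requests asking for the client at the same time.
var results = await Task.WhenAll(getClient(), getClient(), getClient());

Console.WriteLine($"{results[0]} (factory calls: {connectCalls})");
```

In the app from this issue, the stand-in lambda would presumably be replaced with () => TemporalClient.ConnectAsync(new("localhost:7233") { Namespace = "default" }), the returned delegate registered as a singleton, and each handler would await getClient() before calling StartWorkflowAsync. One trade-off of caching the Task is that a failed connection attempt stays cached too. This sketch only addresses the DI-deadlock concern; it is not claimed to avoid the hang reported here.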

It might help if there were a blocking version of Bridge.Client.ConnectAsync.

Hrmm, I wonder how. Can you explain a bit more? It's non-blocking because the Rust side is intentionally non-blocking, and users can of course execute any .NET task in a blocking manner. But the real problem is that something breaks within the client when it is created outside the handler rather than inside it, in exactly the same way.

I have packaged up the reproduction. See the attached zip. Steps to reproduce:

  1. Have a Temporal server running
  2. Clone this repository recursively and set TreatWarningsAsErrors to false in Directory.Build.props
  3. Extract attached zip to tests/ (it has a single folder in it called Temporalio.WebFailure)
  4. cd tests/Temporalio.WebFailure
  5. dotnet run
  6. It will start with some logs; now navigate to http://localhost:5000 in a browser and watch it hang while trying to make the start-workflow gRPC call

There is something weird going on with sockets or threads or something, and I cannot figure out what.

To enable trace logs, change the initial client creation to:

var runtime = new TemporalRuntime(new(new() { Logging = new(new("trace")) }));
var client = await TemporalClient.ConnectAsync(new("localhost:7233") { Runtime = runtime });

What I have observed with trace logs is that in the successful case (client created inside the handler), Connection:poll logs appear when making the call, but in the failed case (the default code), they do not. Something is inadvertently stopping or killing the connection poll, and I am not sure what.

Temporalio.WebFailure.zip

Good news: I can no longer reproduce this after updating core to pick up temporalio/sdk-core#584, setting my minimum tonic version to 0.9, and running cargo update on the bridge project. I will try to dig through the Tonic release notes to see what it could be (though it could be one of its transitive dependencies).

I will update when I make a PR to fix this (I will likely wait for temporalio/sdk-core#544 to be merged too).

I wonder if it's worth adding a regression test in case it ever comes back?

I think it might be, yes. I'll try to see if I can replicate via https://learn.microsoft.com/en-us/dotnet/api/microsoft.aspnetcore.mvc.testing.webapplicationfactory-1 in my unit tests without too much trouble.
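A regression test for a hang has to fail rather than hang the test run, so the call under test needs a deadline. Below is a minimal sketch, assuming .NET 6 or later for Task.WaitAsync(TimeSpan). In a real test the awaited task would be the HTTP request to the endpoint (for example from a WebApplicationFactory client); here it is replaced by two hypothetical stand-in tasks, one that completes and one that never does.

```csharp
using System;
using System.Threading.Tasks;

var deadline = TimeSpan.FromMilliseconds(200);

// Stand-in for a request that completes normally: WaitAsync passes the result through.
var ok = await Task.FromResult("workflow started").WaitAsync(deadline);
Console.WriteLine(ok);

// Stand-in for the hung StartWorkflowAsync call: this task never completes.
var hung = new TaskCompletionSource<string>().Task;
var timedOut = false;
try
{
    await hung.WaitAsync(deadline);
}
catch (TimeoutException)
{
    // WaitAsync throws TimeoutException once the deadline passes,
    // which turns a hang into an ordinary, reportable failure.
    timedOut = true;
}
Console.WriteLine($"hung call timed out: {timedOut}");
```

The deadline value is arbitrary; it only needs to be comfortably longer than a healthy start-workflow round trip against a local server.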