rabbitmq / rabbitmq-dotnet-client

RabbitMQ .NET client for .NET Standard 2.0+ and .NET 4.6.2+

Home Page:https://www.rabbitmq.com/dotnet.html

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

IConnection.CreateModel() is significantly slower in 6.1.0.

makercob opened this issue · comments

Updated this client from 6.0.0 to 6.1.0, and noticed that IConnection.CreateModel() becomes significantly slower.

Environments and preconditions:

  • .NET Core 3.1 (on Windows)
  • RabbitMQ 3.8.4
  • ConnectionFactory.DispatchConsumersAsync is set to true
  • WAN connection

Symptoms:
When creating IModel, the initial 2 or 3 invoking are pretty quick, each taking about 40ms (WAN connection), however succeeding attempts become much slower, taking up to 1,000ms for each.

And also, putting lock(_lockObject) { ... } around IConnection.CreateModel() just makes it even slower.

Please share a benchmark snippet we can use to reproduce. We try to minimize guessing and implicit assumptions when reasoning about library behavior.
Opening a channel involves a network round trip. There can be any number of things involved.

Acquiring and releasing a lock around an operation will not make it more efficient.

Ideally, a traffic capture should be taken and shared along with an executable way to run the benchmark. The latency of the method in question is highly dependent on the network.

Please share a traffic capture and a complete set of code we can compile and run to observe this behavior. If we find a regression, I will re-open this issue.

Snippet as follows:

using System;
using System.Linq;
using System.Threading.Tasks;
using RabbitMQ.Client;

namespace ModelCreationTest {
    class Program {
        static void Main(string[] args) {
            // connects to RabbitMQ
            var factory = new ConnectionFactory {
                AutomaticRecoveryEnabled = true,
                TopologyRecoveryEnabled = true,
                HostName = "127.0.0.1",
                Port = 5672,
                UserName = "user",
                Password = "password",
                VirtualHost = "/",
                RequestedConnectionTimeout = new TimeSpan(0, 0, 2),
                UseBackgroundThreadsForIO = false,
                DispatchConsumersAsync = true
            };

            var connection = factory.CreateConnection();

            // creates channels
            Console.WriteLine("Creating channels...");
            var totalTicksCounter = Environment.TickCount;

            /*Creating channels synchronously works fine with 6.0.0 and 6.1.0.
             *Creating channels asynchronously is very slow with 6.1.0. */
            var tasks = Enumerable.Range(0, 20).Select(
                i => Task.Factory.StartNew(
                    obj => {
                        var ticksCounter = Environment.TickCount;
                        var channel = connection.CreateModel();
                        var cost = Environment.TickCount - ticksCounter;
                        Console.WriteLine($"ID:{(int)obj}, Cost:{cost:N0}ms.");
                    },
                    i));

            // waits for completion
            Task.WhenAll(tasks).Wait();
            var totalCost = Environment.TickCount - totalTicksCounter;
            Console.WriteLine($"Done, Total Cost:{totalCost:N0}ms.");

            Console.WriteLine("Press any key to exit...");
            Console.ReadKey();
        }
    }
}

Results with 6.0.0 as follows:
Screen Shot 2020-06-10 at 12 59 30

Results with 6.1.0 as follows:
Screen Shot 2020-06-10 at 13 01 52

What is the core count on the machine you are using?

If you add ThreadPool.SetMinThreads(16, 16); to the start of your program, does that change anything?

Re-opening since we received code to try and reproduce.

@stebet

  1. What is the core count on the machine you are using?
    The core count is 4.

  2. If you add ThreadPool.SetMinThreads(16, 16); to the start of your program, does that change anything?
    When ThreadPool.SetMinThreads(16, 16); is added, about 50% performance gain is observed, however still much slower than in 6.0.0.

Screen Shot 2020-06-11 at 09 24 29

@stebet @makercob this version consistently performs well using both 6.0.0 and 6.1.0:

https://github.com/lukebakken/rabbitmq-dotnet-client-860/blob/main/Program.cs

Main differences:

  • Thread pool count is higher than the number of channels created
  • LINQ is removed since a for loop suffices. In addition, the removal of Enumerable.Range fixes the issue. There must be an interaction between lazy evaluation of the range and the changes in 6.1.0. We can investigate this, but I don't consider it to be a show-stopper.

Here's another interesting thing I found. If you apply #863 (which removes Task.Yield) then this code runs quickly:

https://github.com/lukebakken/rabbitmq-dotnet-client-860/blob/main/Program.cs

Note that the above adds ThreadPool.SetMinThreads but keeps the LINQ statement.

I hope that @stebet or @danielmarbach can explain this 😂

Try calling .ToArray() on the end of the LINQ statement in the current code and see if that changes anything. It's possible that Task.WhenAll iterates through the loop taking only X tasks at a time, and therefore a new task isn't started until the previous one finishes, where a ToArray will evaluate the entire enumeration before calling Task.WhenAll, hopefully scheduling all the tasks immediately.

@stebet as you suspected calling ToArray addresses the issue as well. You still have to call SetMinThreads, however. I'm assuming we should add guidance to the docs or something now that we're using the thread pool.

I won't close this issue (#860) because it does demonstrate a problem that is resolved by #863.

@makercob please take the time to test the workarounds discussed in the recent comments. Thanks.

@lukebakken I doubt it is a good idea to add guidance around setting min threads. How would you ever come up with good guidance to guide the users to set appropriate values?

You as a library can't really make any assumptions about the nature of threading involved by the code that is calling you. The problem I see here is the fact that even-though we have an IO-bound path we can't really leverage async here because the API surface is sync. Because the API is sync anyone that calls it and wants to do it in parallel has to offload the calls explicitly to the worker thread pool. Explicit offloading to the worker threadpool increases the pressure on the pool and might cause it to ramp up it's capacity (also known as hill climbing). That operation can cause significant stalls depending on the workload. And the library in addition to that yielding internally increases the pressure even more. That's why you see an improvement when setting the min threads because the thread pool will likely already be ready to handle the workload that is demanded in the sample and removing the yield causes less stress.

So removing the yield didn't make it into 6.1?

Thanks for the input.

There were enough caveats in your comment here that I didn't remove the yield call for 6.1.0. It can be done for 6.1.1 however.

@lukebakken Here are my findings, on a 4-core machine:

LINQ is used, not for loop.

with 6.1.0

  • Adding ThreadPool.SetMinThreads(32, 32) did bring some performance gain, with or without .ToArray().

Screen Shot 2020-06-16 at 11 28 40

with 6.0.0

  • Adding ThreadPool.SetMinThreads(32, 32) slows it down about 10x, with or without .ToArray().

With ThreadPool.SetMinThreads(32, 32)
Screen Shot 2020-06-16 at 11 31 31

Without ThreadPool.SetMinThreads(32, 32)
Screen Shot 2020-06-16 at 11 31 52

@makercob please test the changes in #863 if you can. Thanks.

My tests show that this issue can be fixed via the code changes suggested in the comments as well as #863. I will produce 6.1.1-rc.1 today.

@lukebakken Updated to 6.1.1-rc.1, however, the issue still persists.

@makercob this is open source software. You are welcome to profile your workload with both versions and share both results.

@makercob in my environment where I could reproduce the issue, the issue is fixed. "the issue still persists" does not give us actionable information. At the very least, provide an exact copy of the code you are running. Note that calling ThreadPool.SetMinThreads(A, B) is necessary.