Parallel requests can swap responses between each other:

Question

Parallel requests can swap responses between each other:

fxposter opened this issue 3 years ago · comments

I am not sure if this issue should be opened here (and hence only be related to connection management/etc) or to async gem itself.

Imagine that we have a very simple server like this:

# server.rb
require 'async'
require 'async/http/server'
require 'async/http/endpoint'
require 'async/http/protocol/response'

endpoint = Async::HTTP::Endpoint.parse('http://127.0.0.1:8080')

app = lambda do |request|
  Protocol::HTTP::Response[200, {}, [request.path[1..-1]]]
end

server = Async::HTTP::Server.new(app, endpoint)

Async do |task|
  server.run
end

IE: when you get http://127.0.0.1:8080/hello/world you get hello/world back.

Now let's start 20 tasks, 10 of which will request some url in a loop and other 10 some other url of this server. Then we are going to stop all tasks:

# client.rb
require 'async'
require 'async/http/internet'
Async.logger.fatal!

Async { |task|
  internet = Async::HTTP::Internet.new
  tasks = []
  100.times do
    tasks << task.async {
      loop do
        response = internet.get('http://127.0.0.1:8080/something/special')
        r = response.body.join
        if r.include?('nothing')
          p ['something', r]
        end
      end
    }
  end
  100.times do
    tasks << task.async {
      loop do
        response = internet.get('http://127.0.0.1:8080/nothing/to/worry')
        r = response.body.join
        if r.include?('something')
          p ['nothing', r]
        end
      end
    }
  end

  tasks.each do |t|
    task.sleep 0.1
    t.stop
  end
}

If I run server.rb first and then try to run client.rb multiple times I can get this output:

$ ruby client.rb
["nothing", "something/special"]
["something", "nothing/to/worry"]
["nothing", "something/special"]
["something", "nothing/to/worry"]
["nothing", "something/special"]
["something", "nothing/to/worry"]
["nothing", "something/special"]
["something", "nothing/to/worry"]
["nothing", "something/special"]

IE: requests that asked for /something/special got response nothing/to/worry and vice versa. I am not sure if it can happen without stopping tasks, but it looks dangerous and it could be reproduced in actual applications in production. The bug (if it is the bug) is here at least since May 2020 (this is when I first encounter it, abandoned my async usage, but now wondered if it was discovered and seems that it was not).

Another thing that bothers me a bit is if we change the client to:

# client2.rb
require 'async'
require 'async/http/internet'
Async.logger.fatal!

Async { |task|
  internet = Async::HTTP::Internet.new
  tasks = []
  10.times do
    tasks << task.async {
      begin
        loop do
          internet.get('http://127.0.0.1:8080/something/special').body.join
        end
      ensure
        puts 'something'
      end
    }
  end
  10.times do
    tasks << task.async {
      begin
        loop do
          internet.get('http://127.0.0.1:8080/nothing/to/worry').body.join
        end
      ensure
        puts 'nothing'
      end
    }
  end

  tasks.each do |t|
    task.sleep 1
    t.stop
  end
}

I get this output:

$ ruby client2.rb
something
nothing
something
something
something
nothing
nothing
something
nothing
nothing
nothing
something
nothing
something
something
nothing
something
nothing
something
nothing

And I would expect that first tasks that "something" tasks should be stopped and only then the "nothing" ones, but it seems the output here is completely random and differs each run.

Samuel Williams · Answer 1 · Fri Jun 04 2021 04:36:57 GMT+0800 (China Standard Time)

Thanks for your detailed report. I will investigate.

Samuel Williams · Answer 2 · Fri Jun 04 2021 14:52:38 GMT+0800 (China Standard Time)

I believe I understand the problem.

We are returning connections to the connection pool which have an outstanding request.

This only impacts the client side as far as I can tell.

The solution should be to be more careful about when connections are returned to the connection pool in the case of an error.

Samuel Williams · Answer 3 · Fri Jun 04 2021 16:48:10 GMT+0800 (China Standard Time)

Okay, I believe there are two problems.

Firstly, your sample code is not 100% correct. It's essential that you have something like:

begin
  response = internet.get
ensure
  response.close
end

The 2nd part of the issue is the underlying HTTP/1 connection can fail when partly writing a request or reading a response first line. In that case, the state is invalid and should not be reused.

Pavlo · Answer 4 · Fri Jun 04 2021 18:47:23 GMT+0800 (China Standard Time)

Firstly, your sample code is not 100% correct. It's essential that you have something like:

Gotcha, tnx! Though it behaves the same both with or without closing the response.

Samuel Williams · Answer 5 · Sat Jun 05 2021 07:43:55 GMT+0800 (China Standard Time)

Okay, a fix for this was released in v0.56.3.

Pavlo · Answer 6 · Sat Jun 05 2021 17:37:43 GMT+0800 (China Standard Time)

@ioquatix thanks! And what about "client2.rb" - why stop order of tasks are not the ones that is expected.

Samuel Williams · Answer 7 · Sat Jun 05 2021 18:33:41 GMT+0800 (China Standard Time)

Let me take a look at that example, I missed it on the first pass.

Samuel Williams · Answer 8 · Sat Jun 05 2021 18:41:35 GMT+0800 (China Standard Time)

The reason why it happens is because stopping one task also stops the internal connection pool. This issue is being tracked here: socketry/async#115

Essentially, the first task to do a request creates a child task attached to the host. Because it's a child task, calling stop on the task performing the request causes it to stop the connection pool and clean up which cancels all other requests. This is a bug which we should be able to address pretty easily.

Samuel Williams · Answer 9 · Sun Jun 06 2021 09:53:56 GMT+0800 (China Standard Time)

Okay, I've added a failing spec: 809a307

This will be addressed by socketry/async#116.

Samuel Williams · Answer 10 · Mon Jun 07 2021 08:05:15 GMT+0800 (China Standard Time)

I believe this issue is fully addressed, however I'm almost certain the connection pool has some edge cases which will require more attention. One such issue was solved in socketry/async-pool@8c6ef4d

I believe we need to introduce a more robust model for exception handling, but I'm still considering my options.