Parallel requests can swap responses between each other:
fxposter opened this issue · comments
I am not sure if this issue should be opened here (and hence only be related to connection management/etc) or to async gem itself.
Imagine that we have a very simple server like this:
# server.rb
require 'async'
require 'async/http/server'
require 'async/http/endpoint'
require 'async/http/protocol/response'
endpoint = Async::HTTP::Endpoint.parse('http://127.0.0.1:8080')
app = lambda do |request|
Protocol::HTTP::Response[200, {}, [request.path[1..-1]]]
end
server = Async::HTTP::Server.new(app, endpoint)
Async do |task|
server.run
end
IE: when you get http://127.0.0.1:8080/hello/world
you get hello/world
back.
Now let's start 20 tasks, 10 of which will request some url in a loop and other 10 some other url of this server. Then we are going to stop all tasks:
# client.rb
require 'async'
require 'async/http/internet'
Async.logger.fatal!
Async { |task|
internet = Async::HTTP::Internet.new
tasks = []
100.times do
tasks << task.async {
loop do
response = internet.get('http://127.0.0.1:8080/something/special')
r = response.body.join
if r.include?('nothing')
p ['something', r]
end
end
}
end
100.times do
tasks << task.async {
loop do
response = internet.get('http://127.0.0.1:8080/nothing/to/worry')
r = response.body.join
if r.include?('something')
p ['nothing', r]
end
end
}
end
tasks.each do |t|
task.sleep 0.1
t.stop
end
}
If I run server.rb first and then try to run client.rb multiple times I can get this output:
$ ruby client.rb
["nothing", "something/special"]
["something", "nothing/to/worry"]
["nothing", "something/special"]
["something", "nothing/to/worry"]
["nothing", "something/special"]
["something", "nothing/to/worry"]
["nothing", "something/special"]
["something", "nothing/to/worry"]
["nothing", "something/special"]
IE: requests that asked for /something/special
got response nothing/to/worry
and vice versa. I am not sure if it can happen without stopping tasks, but it looks dangerous and it could be reproduced in actual applications in production. The bug (if it is the bug) is here at least since May 2020 (this is when I first encounter it, abandoned my async usage, but now wondered if it was discovered and seems that it was not).
Another thing that bothers me a bit is if we change the client to:
# client2.rb
require 'async'
require 'async/http/internet'
Async.logger.fatal!
Async { |task|
internet = Async::HTTP::Internet.new
tasks = []
10.times do
tasks << task.async {
begin
loop do
internet.get('http://127.0.0.1:8080/something/special').body.join
end
ensure
puts 'something'
end
}
end
10.times do
tasks << task.async {
begin
loop do
internet.get('http://127.0.0.1:8080/nothing/to/worry').body.join
end
ensure
puts 'nothing'
end
}
end
tasks.each do |t|
task.sleep 1
t.stop
end
}
I get this output:
$ ruby client2.rb
something
nothing
something
something
something
nothing
nothing
something
nothing
nothing
nothing
something
nothing
something
something
nothing
something
nothing
something
nothing
And I would expect that first tasks that "something" tasks should be stopped and only then the "nothing" ones, but it seems the output here is completely random and differs each run.
Thanks for your detailed report. I will investigate.
I believe I understand the problem.
We are returning connections to the connection pool which have an outstanding request.
This only impacts the client side as far as I can tell.
The solution should be to be more careful about when connections are returned to the connection pool in the case of an error.
Okay, I believe there are two problems.
Firstly, your sample code is not 100% correct. It's essential that you have something like:
begin
response = internet.get
ensure
response.close
end
The 2nd part of the issue is the underlying HTTP/1 connection can fail when partly writing a request or reading a response first line. In that case, the state is invalid and should not be reused.
Firstly, your sample code is not 100% correct. It's essential that you have something like:
Gotcha, tnx! Though it behaves the same both with or without closing the response.
Okay, a fix for this was released in v0.56.3.
@ioquatix thanks! And what about "client2.rb" - why stop order of tasks are not the ones that is expected.
Let me take a look at that example, I missed it on the first pass.
The reason why it happens is because stopping one task also stops the internal connection pool. This issue is being tracked here: socketry/async#115
Essentially, the first task to do a request creates a child task attached to the host. Because it's a child task, calling stop on the task performing the request causes it to stop the connection pool and clean up which cancels all other requests. This is a bug which we should be able to address pretty easily.
Okay, I've added a failing spec: 809a307
This will be addressed by socketry/async#116.
I believe this issue is fully addressed, however I'm almost certain the connection pool has some edge cases which will require more attention. One such issue was solved in socketry/async-pool@8c6ef4d
I believe we need to introduce a more robust model for exception handling, but I'm still considering my options.