socketry / async-http

Connection timeout specification.

souphan-adsk opened this issue

The README specifies an example of handling timeout using #with_timeout.

This works well when a response is expected within a certain amount of time, but I'd like to fail much earlier if a connection attempt does not complete in a timely manner. Other libraries such as Faraday and Net::HTTP make this configurable as open_timeout.
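For comparison, this is how the connect/read split looks in Net::HTTP from the Ruby standard library (a minimal sketch; the host and the 2s/30s values are illustrative):

```ruby
require "net/http"

# Sketch of the open_timeout / read_timeout split in Net::HTTP.
# No connection is made until a request is issued.
http = Net::HTTP.new("example.com", 443)
http.use_ssl = true
http.open_timeout = 2   # fail fast if the TCP/TLS handshake stalls
http.read_timeout = 30  # allow a slow server time to produce the response
```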

Is there any way Async::HTTP (or anything it's built on) can handle this kind of situation?

How do you imagine this working with connection persistence?

A couple of additional points:

  • The request doesn't read the response body; you need to do that separately, so if you are concerned about that, it should be an independent timeout.
  • You can specify a timeout on the endpoint, but it applies to all operations (connect, read, etc.).

I'd be interested to know more about whether your use case requires a separate timeout for connect vs read, and in what cases it matters.

Thank you for your comments.
We are currently managing a request-dispatch queue system that uses URLs inputted by multiple users. Typical issues with such a system are that a particular endpoint may be unreachable or may be very slow to respond. When this happens, the queue is naturally affected negatively.

With a configurable connection timeout, we're looking to handle cases where we could fail/reject early (e.g. allow 2 seconds to connect, but 30 seconds to respond) when the inputted endpoint is unreachable due to a firewall, a non-routable IP address (10.255.255.1), etc.
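That fail-fast check can be illustrated with the standard library alone (a sketch using Socket.tcp's connect_timeout rather than async-http; reachable? is a hypothetical helper, not part of any library here):

```ruby
require "socket"

# Hypothetical helper: probe an endpoint with a short connect budget so an
# unreachable host (firewalled, non-routable, etc.) is rejected early.
def reachable?(host, port, connect_timeout: 2)
  Socket.tcp(host, port, connect_timeout: connect_timeout) { true }
rescue SystemCallError, SocketError
  false # connection refused, unroutable, DNS failure, or timed out
end
```

A queue worker could run this cheap probe before dispatching the real request, reserving the longer response timeout for endpoints that at least accept connections.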

As for the comment about connection persistence, I'll admit that our scope does not include that usage. But in my naive view, maybe that value would only apply to initially establishing the connection to the host (and not to idleness).

Async::HTTP::Client and Async::HTTP::Internet both internally maintain connection pools. For maximum performance you definitely want to use connection pooling. However, if you are talking to many thousands of hosts, you will want to be more restrictive about how your pooling works.

endpoint = Async::HTTP::Endpoint.parse('https://example.com', timeout: 2)
client = Async::HTTP::Client.new(endpoint)

task.with_timeout(30) do
  response = client.get("...") # 2 second timeout for open and read.
  body = response.read 
ensure
  client.close
end 

We could look at introducing a specific operation which pre-seeds the client with an open connection, e.g.

task.with_timeout(2) do
  client.open # open a single connection to the remote server and return it to the pool (active) to be used later.
end 

If you think this solves your problem, we can do that. You'd then configure the timeout to be 30 seconds, and use with_timeout(2) around the initial connection.

That is indeed how we're looking to use the library.
If that feature would be introduced as you describe it, it would solve our problem.

After reviewing the code, you can do the following right now:

task.with_timeout(2) do
  client.pool.acquire{}
end

This will make a single connection and warm up the pool.

I've also proposed #90 which provides client.open which does exactly the same as the above.

Not sure if this interface is worth the overhead. WDYT?

I can confirm that the lines provided above do the trick pretty well.
As for the interface, I personally don't mind either one, though I'll admit that I'm more familiar with the term open in this domain.
In any case, thank you for the assistance on this matter.

Exposing the underlying connection pool either way seems like a leaky abstraction. I don't think HTTP clients as an interface should be manipulating underlying connections; the client might not even be connection-based in some test scenario, or you might be replacing HTTP with something entirely different while trying to preserve the same interface.

I'll think about whether merging this PR makes sense, but talking directly to the connection pool is a valid solution which won't break in the future.

I think the direction we are going to go in here is exposing more of the connection pool details. This use case is best served by using the connection pool directly, simply because there are many different permutations of it and I don't think we want to try to build another layer on top.