socketry / async-http

Connection timeout specification.

souphan-adsk opened this issue

The README specifies an example of handling timeout using #with_timeout.

This works well when a response is expected within a certain amount of time, but I'd like to fail much earlier if a connection attempt does not complete in a timely manner. Other libraries such as Faraday and Net::HTTP make this configurable as open_timeout.
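For comparison, this is how the connect/read split looks in Net::HTTP from the Ruby standard library (a minimal sketch; the host and the 2s/30s values are illustrative):

```ruby
require "net/http"

# Sketch of the open_timeout / read_timeout split in Net::HTTP.
# No connection is made until a request is issued.
http = Net::HTTP.new("example.com", 443)
http.use_ssl = true
http.open_timeout = 2   # fail fast if the TCP/TLS handshake stalls
http.read_timeout = 30  # allow a slow server time to produce the response
```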

Is there any way Async::HTTP (or anything it's built on) can handle this kind of situation?

How do you imagine this working with connection persistence?

A couple of additional points:

  • The request doesn't read the response body; you need to do that separately, so if you are concerned about that, it should be an independent timeout.
  • You can specify a timeout on the endpoint, but it applies to all operations (connect, read, etc.).

I'd be interested to know more about whether your use case requires a separate timeout for connect vs read, and in what cases it matters.

Thank you for your comments.
We are currently managing a request-dispatch queue system that uses URLs inputted by multiple users. Typical issues with such a system are that a particular endpoint may be unreachable or may be very slow to respond. When this happens, the queue is naturally affected negatively.

With a configurable connection timeout, we're looking to handle cases where we could fail/reject early (e.g. allow 2 seconds to connect, but 30 seconds to respond) when the inputted endpoint is unreachable due to a firewall, a non-routable IP address (10.255.255.1), etc.
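That fail-fast check can be illustrated with the standard library alone (a sketch using Socket.tcp's connect_timeout rather than async-http; reachable? is a hypothetical helper, not part of any library here):

```ruby
require "socket"

# Hypothetical helper: probe an endpoint with a short connect budget so an
# unreachable host (firewalled, non-routable, etc.) is rejected early.
def reachable?(host, port, connect_timeout: 2)
  Socket.tcp(host, port, connect_timeout: connect_timeout) { true }
rescue SystemCallError, SocketError
  false # connection refused, unroutable, DNS failure, or timed out
end
```

A queue worker could run this cheap probe before dispatching the real request, reserving the longer response timeout for endpoints that at least accept connections.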

As for the comment about connection persistence, I'll admit that our scope does not include that usage. But in my naive view, maybe that value would only apply to initially establishing the connection to the host (and not to idleness).

Async::HTTP::Client and Async::HTTP::Internet both internally maintain connection pools. For maximum performance you definitely want to use connection pooling. However, if you are talking to many thousands of hosts, you will want to be more restrictive about how your pooling works.

endpoint = Async::HTTP::Endpoint.parse('https://example.com', timeout: 2)
client = Async::HTTP::Client.new(endpoint)

task.with_timeout(30) do
  response = client.get("...") # 2 second timeout for open and read.
  body = response.read 
ensure
  client.close
end 

We could look at introducing a specific operation which pre-seeds the client with an open connection, e.g.

task.with_timeout(2) do
  client.open # open a single connection to the remote server and return it to the pool (active) to be used later.
end 

If you think this solves your problem, we can do that. You'd then configure the timeout to be 30 seconds, and use with_timeout(2) around the initial connection.

That is indeed how we're looking to use the library.
If that feature would be introduced as you describe it, it would solve our problem.

After reviewing the code, you can do the following right now:

task.with_timeout(2) do
  client.pool.acquire{}
end

This will make a single connection and warm up the pool.

I've also proposed #90 which provides client.open which does exactly the same as the above.

Not sure if this interface is worth the overhead. WDYT?

I can confirm that the lines provided above do the trick pretty well.
As for the interface, I personally don't mind either one, though I'll admit that I'm more familiar with the term open in this domain.
In any case, thank you for the assistance on this matter.

Exposing the underlying connection pool either way seems like a leaky abstraction. I don't think HTTP clients as an interface should be manipulating underlying connections; the client might not even be connection-based in some test scenario, or you might be replacing HTTP with something entirely different while trying to preserve the same interface.

I'll think about whether merging this PR makes sense, but talking directly to the connection pool is a valid solution which won't break in the future.

I think the direction we are going to go in here is exposing more of the connection pool details. This use case is best served by using the connection pool directly, simply because there are many different permutations of it and I don't think we want to try to build another layer on top.