socketry / async-http

Implementing an idle connection timeout.

kaorihinata opened this issue

I've run into a somewhat unique situation where an external event causes the routing to change (I disable a VPN), but async-http can't detect this. I assume there are no connection resets, nor any reset of state (because I believe on macOS everything feeds into a bridge interface anyway), so a percentage of the workers performing the download task I have running go dormant, waiting for data that will never arrive. Essentially, I want to implement an idle connection timeout (as I assume the reads are non-blocking anyway); a task timeout would be too heavy-handed, and it puts the guesswork in the wrong place.

I've checked the reference documentation and scanned the source, and I don't see a mechanism like this in place at this time, though I've admittedly not taken a very close look. Is this something that would be appropriate to add to async-http? Or perhaps to one of the other projects under the socketry umbrella? Or is there a more obvious way to solve this problem that I'm not seeing?

To give a little background, I'm using an Async::HTTP::Internet instance and a Barrier + Semaphore to download a large number of archives from a vendor. Each archive can vary in size, but there is a large quantity of them, and my connection to S3 is not the best. When I disable my VPN, a little over half of the workers become stuck. I believe idle timeout detection that results in an exception would sort this right out, but the only support for timeouts I found was in the project examples, and a task timeout would be making assumptions about the lifetime of the entire transfer, which I don't believe would be the correct path forward.

I can give an example, but I would essentially be giving the "download multiple files" example as what I'm asking for is less of a bug, and more an enhancement.
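To illustrate the distinction drawn above between an idle timeout and a task timeout, here is a minimal sketch using only Ruby's standard library, not async-http. The `IdleTimeoutError` class and the `read_with_idle_timeout` helper are hypothetical names introduced for this illustration. The clock restarts after every successful read, so a slow but steady transfer is never cut off, while a stalled one raises.

```ruby
require "io/wait"

# Hypothetical error class for this sketch; not part of async-http.
class IdleTimeoutError < StandardError; end

# Reads from +io+ until EOF. Raises IdleTimeoutError if no data arrives for
# +idle+ seconds. The wait restarts after each chunk, so the total transfer
# time is unbounded as long as data keeps flowing.
def read_with_idle_timeout(io, idle:, chunk_size: 16 * 1024)
  buffer = String.new
  loop do
    # wait_readable returns nil when the timeout elapses with nothing to read.
    raise IdleTimeoutError, "no data for #{idle}s" unless io.wait_readable(idle)

    begin
      buffer << io.readpartial(chunk_size)
    rescue EOFError
      return buffer
    end
  end
end

# Usage: the writer sends one chunk and then goes silent without closing,
# which stands in for a connection whose route has disappeared.
reader, writer = IO.pipe
writer.write("chunk")
begin
  read_with_idle_timeout(reader, idle: 0.2)
rescue IdleTimeoutError => error
  puts "stalled: #{error.message}"
end
```

A task timeout, by contrast, would wrap the whole loop in a single deadline and would have to guess how long the largest archive could take.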

We are going to revisit async-pool to potentially support this use case more effectively and to reduce resource utilisation.

Want me to close this and move it over there? I'm not in a hurry, and I can work around it for the time being (and create a pull request if I think of something that is more than just a kludge), but I noticed that you sometimes create issues yourself, so I can create one for this if you'd like. I can give an example as well so that it's actually useful for memory jogging.

I think let's leave it open for now.

After re-reading this issue, I believe using the timeout: option to Async::HTTP::Endpoint should be sufficient.

It's been a while since I messed with this particular program (the vendor we were working with was terrible), but the issue was with using Async::HTTP::Internet specifically. I believe what I wanted to know at the time was, "How do I create a timeout with Async::HTTP::Internet, as it doesn't seem like I can?" Fast forward a few months, and I think that, at least in my case, I should have been using Client/Endpoint or Async::HTTP::Internet.client_for/Endpoint from the get-go. Just for the sake of the question, though: is there a way to easily implement a global timeout?

Edit: I mention a global timeout specifically because targeted timeouts are something you really should be using Client/Endpoint for, I assume.

I think it might be reasonable for Async::HTTP::Internet to have some default timeout, e.g. time out if 2 minutes pass without any IO activity. There is no general top-level timeout for all blocking operations; however, we could consider introducing one. There is some discussion here: ruby/ruby#5653

You sure had to do a lot of explaining on that issue. If you think fixing it upstream is the ultimate solution then I can close this if you'd like. I've tested the timeout: option against all of the scenarios I could think of and it should do the trick now that I know it's something I'll need to do.

The upstream timeout issue is more about a standard Ruby interface. The endpoint timeout option will always remain and work; it's just that internally it might eventually be redirected to a standard attribute on the IO object. Regarding your specific point, there is a chance we could introduce a global timeout, e.g. IO.timeout = 20 for ALL IO operations. That's in the linked discussion (kind of).
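For reference, Ruby 3.2 later added a per-object form of this attribute, `IO#timeout=`, which raises `IO::TimeoutError` when a blocking operation exceeds the limit. The sketch below assumes Ruby 3.2 or newer and uses a plain pipe in place of a socket. It is not the global `IO.timeout = 20` floated above, and `read_or_report` is a hypothetical helper for this illustration.

```ruby
# Assumes Ruby 3.2+, where IO#timeout= and IO::TimeoutError are available.
reader, writer = IO.pipe
reader.timeout = 0.1 # seconds allowed for any single blocking operation

# Hypothetical helper: returns the bytes read, or :timed_out if the read
# blocked for longer than the IO object's timeout.
def read_or_report(io, length)
  io.read(length)
rescue IO::TimeoutError
  :timed_out
end

writer.write("data")
first = read_or_report(reader, 4)  # the four bytes are already in the pipe
second = read_or_report(reader, 1) # nothing more arrives, so this times out
```

Because the limit applies to each blocking operation rather than to the whole transfer, it behaves like the idle timeout requested at the top of this issue.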