Connection hangs/does not timeout after 350 seconds.
mittalsuraj18 opened this issue · comments
Discussed in #2968
Originally posted by mittalsuraj18 December 1, 2023
When an application runs in an AWS VPC and communicates through a NAT gateway, the NAT gateway has a specific behavior, described at https://docs.aws.amazon.com/vpc/latest/userguide/nat-gateway-troubleshooting.html as follows:
Internet connection drops after 350 seconds
Problem
Your instances can access the internet, but the connection drops after 350 seconds.
Cause
If a connection that's using a NAT gateway is idle for 350 seconds or more, the connection times out.
When a connection times out, a NAT gateway returns an RST packet to any resources behind the NAT gateway that attempt to continue the connection (it does not send a FIN packet).
Solution
To prevent the connection from being dropped, you can initiate more traffic over the connection. Alternatively, you can enable TCP keepalive on the instance with a value less than 350 seconds.
For example, if I have a single connection and do something like:
from httpx import AsyncClient
client = AsyncClient(base_url=BASE_URL, http2=True)
_ = await client.get("/", timeout=1200)
The connection does not react to the RST packet, and the request times out after 1200 seconds instead of 350 seconds.
The behavior should have been either of the following:
- Timeout at 350 seconds
- Periodically send a keepalive packet so that the NAT gateway does not time out the connection.
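The second option could be approximated at the socket level with TCP keepalive, as the AWS documentation suggests. The following is only a minimal sketch using the standard library: the helper names `keepalive_socket_options` and `apply_options` and the default values (60 s idle, 30 s interval, 3 probes) are illustrative assumptions, not part of httpx. The `TCP_KEEP*` constants are platform dependent, so each one is guarded.

```python
import socket

NAT_IDLE_TIMEOUT_SECONDS = 350  # idle limit quoted from the AWS docs above


def keepalive_socket_options(idle_seconds=60, interval_seconds=30, probe_count=3):
    """Build (level, option, value) tuples that enable TCP keepalive.

    Hypothetical helper: names and defaults are illustrative assumptions.
    """
    if idle_seconds >= NAT_IDLE_TIMEOUT_SECONDS:
        raise ValueError("idle_seconds must be below the NAT gateway idle timeout")
    options = [(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)]
    # These constants are not available on every platform, so guard each one.
    if hasattr(socket, "TCP_KEEPIDLE"):
        options.append((socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle_seconds))
    if hasattr(socket, "TCP_KEEPINTVL"):
        options.append((socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, interval_seconds))
    if hasattr(socket, "TCP_KEEPCNT"):
        options.append((socket.IPPROTO_TCP, socket.TCP_KEEPCNT, probe_count))
    return options


def apply_options(sock, options):
    """Apply the option tuples to an existing socket."""
    for level, name, value in options:
        sock.setsockopt(level, name, value)
```

Recent httpx releases appear to accept a `socket_options` argument on `httpx.AsyncHTTPTransport`, to which a list like this could be passed; that is an assumption to verify against the installed version. Note that keepalive probes only keep the NAT mapping alive; they do not by themselves make the client react to an RST.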
My current workaround is a periodic call every 60 seconds to a healthcheck endpoint on the same base URL, so that it reuses the same connection.
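A workaround of this kind might look roughly like the sketch below. It is not the code from the original report: `keep_connection_warm` and its `ping` parameter are hypothetical names, and the loop is written against a generic async callable so it does not depend on any particular client.

```python
import asyncio
from typing import Awaitable, Callable


async def keep_connection_warm(
    ping: Callable[[], Awaitable[object]],
    interval_seconds: float = 60.0,
) -> None:
    """Call `ping` every `interval_seconds` until the task is cancelled.

    Hypothetical helper: `ping` is assumed to issue a cheap request over
    the connection that should stay active (e.g. a healthcheck call).
    """
    while True:
        await asyncio.sleep(interval_seconds)
        try:
            await ping()
        except Exception:
            # A failed ping should not stop the loop; the next attempt may
            # succeed. Cancellation is not an Exception, so it still
            # propagates and ends the task.
            pass
```

With httpx this could be started as `asyncio.create_task(keep_connection_warm(lambda: client.get("/health")))`, where `/health` is an assumed endpoint name, and cancelled once the long-running request completes. Whether the ping travels over the same connection depends on the client's pooling; with `http2=True` requests to the same origin are typically multiplexed over one connection.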
Closing in favor of the discussion; we can escalate if necessary.