algesten / ureq

A simple, safe HTTP client

Timeout not working when having packet losses

johan-bjareholt opened this issue

I have a case where the timeout is not respected.

First, the examples:

server

#!/usr/bin/env python3

from flask import Flask
from time import sleep

app = Flask(__name__)

@app.route('/get', methods=['GET'])
def get():
    # Uncomment to simulate a slow server instead of a slow network
    #sleep(60)
    return "GET OK"

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=1234)

client

use std::time::{Duration, Instant};

fn main() -> Result<(), ureq::Error> {
    let start = Instant::now();
    let resp = ureq::get("http://192.168.0.1:1234/get")
        .timeout(Duration::from_secs(5))
        .call();
    println!("request took: {}s", start.elapsed().as_secs());
    match resp {
        Ok(body) => {
            let body = body.into_string()?;
            println!("body: {}", body);
        }
        Err(err) => {
            println!("request failed: {}", err);
        }
    }
    Ok(())
}

Just running it like this works fine; we get the following output:

request took: 0s
body: GET OK

However, when adding extreme synthetic packet loss on the server side like this, we get the following:

$ sudo tc qdisc replace dev eth0 root netem loss 90%

request took: 30s
request failed: http://192.168.0.1:1234/get: Connection Failed: Connect error: connection timed out

Considering that the client has "timeout" set to 5s and the documentation for timeout states "Sets overall timeout for the request", I assume that this is a bug?

If I instead uncomment the "sleep" in the server code and disable the packet loss, we do get the correct timeout of 5 seconds. So it seems like the problem only occurs when the network itself is slow, not when the server is slow at handling the request.

request took: 5s
request failed: http://127.0.0.1:1234/get: Network Error: Network Error: Error encountered in the status line: timed out reading response

Okay, so the "issue" is that the Agent has a 30s default for timeout_connect, which is applied to the Request as well.
When you look at the documentation for the AgentBuilder, it is very clear exactly how this works:

This takes precedence over .timeout_read() and .timeout_write(), but not .timeout_connect().

I'd consider this to be hard-to-read documentation rather than a real bug. It was my fault for not reading the last part of the sentence in the documentation that I cited before:

Sets overall timeout for the request, overriding agent’s configuration if any.
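
For anyone hitting the same thing, the workaround is to bound the connect phase on the agent itself rather than relying on the per-request timeout. A minimal sketch of what that could look like, assuming ureq 2.x and the same endpoint as above:

use std::time::Duration;

fn main() -> Result<(), ureq::Error> {
    // The per-request .timeout() takes precedence over the read/write
    // timeouts but not over timeout_connect(), so the connect phase is
    // bounded on the agent instead (replacing the 30s default).
    let agent = ureq::builder()
        .timeout_connect(Duration::from_secs(5))
        .build();

    let resp = agent
        .get("http://192.168.0.1:1234/get")
        .timeout(Duration::from_secs(5))
        .call()?;
    println!("body: {}", resp.into_string()?);
    Ok(())
}

With this, the packet-loss case should fail after roughly 5 seconds instead of waiting out the 30-second connect default.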