sbstp / attohttpc

Rust lightweight HTTP 1.1 client

Home Page: https://docs.rs/attohttpc/


Inability to configure full request timeout leads to DoS / resource leak

Shnatsel opened this issue

attohttpc currently does not allow specifying a timeout for the entire request (i.e. until the request body is finished transferring), which means an established connection will hang indefinitely if the remote host stops replying. This leads to a resource leak, which may cause denial of service. This post explains the problem in detail: https://medium.com/@nate510/don-t-use-go-s-default-http-client-4804cb19f779

While it's not necessary to set such a timeout by default (most other HTTP clients don't), it must be possible to configure one for attohttpc to be usable for e.g. building reliable services that query a remote API over HTTP, or for defending against DoS attacks where the remote host sends 1 byte per second, keeping the connection open indefinitely.

I have updated the description to clarify the attack vectors and a better approach to mitigating them.

A read timeout should be easy to add, but a full request timeout is complicated in a blocking design: one needs to spawn a separate thread that forcibly closes the underlying TCP stream when the timeout expires, so that all blocking calls return.
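For illustration, that watchdog approach might look roughly like this; a minimal sketch assuming direct access to the underlying std::net::TcpStream (which attohttpc does not expose), and note that a real implementation would also need to cancel the watchdog once the request completes:

```rust
use std::net::{Shutdown, TcpStream};
use std::thread;
use std::time::Duration;

// Hypothetical helper: arm a watchdog that force-closes the socket after
// `timeout`. Any read/write blocked on the stream then returns an error.
fn arm_request_watchdog(stream: &TcpStream, timeout: Duration) -> std::io::Result<()> {
    // Clone the socket handle so the watchdog thread can shut it down
    // independently of the thread performing the request.
    let watchdog = stream.try_clone()?;
    thread::spawn(move || {
        thread::sleep(timeout);
        // NOTE: in a real implementation this must be skipped if the
        // request already finished; here it fires unconditionally.
        let _ = watchdog.shutdown(Shutdown::Both);
    });
    Ok(())
}
```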

But also note that the DoS problem is mainly one for the server: a client will probably not have a problem keeping a single TCP connection alive, while a server that does that on behalf of a lot of clients certainly can get into trouble. But honestly, I don't think attohttpc is a very reasonable choice for such a situation; one would probably prefer an HTTP client that is integrated with the I/O framework of the server itself, which makes it reasonably simple to solve things like full request timeouts using e.g. Futures. Or to quote the documentation:

The intended use is for projects that have HTTP needs where performance is not critical or when HTTP is not the main purpose of the application.

I think there is a way to fit it into a blocking design: you can simply configure the read timeout to min(read_timeout, time_until_request_deadline)

The drawback is that this timeout needs to be reconfigured once in a while, but if it's done e.g. once every 250 milliseconds, it won't interfere with performance and will still provide adequate precision.
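A rough sketch of that scheme (all names here are illustrative, not attohttpc internals): refresh the socket's read timeout at most once every 250 milliseconds, capping it by the time left until the request deadline.

```rust
use std::io;
use std::net::TcpStream;
use std::time::{Duration, Instant};

// Illustrative state for the throttled min(read_timeout, remaining) trick.
struct TimeoutState {
    read_timeout: Duration,
    deadline: Instant,
    last_refresh: Instant,
}

impl TimeoutState {
    // Called before each blocking read; touches the socket option at most
    // every 250 ms to keep the syscall overhead negligible.
    fn refresh(&mut self, stream: &TcpStream) -> io::Result<()> {
        let now = Instant::now();
        if now.duration_since(self.last_refresh) < Duration::from_millis(250) {
            return Ok(());
        }
        self.last_refresh = now;
        let remaining = self.deadline.saturating_duration_since(now);
        if remaining.is_zero() {
            return Err(io::Error::new(io::ErrorKind::TimedOut, "request deadline reached"));
        }
        // min(read_timeout, time_until_request_deadline)
        stream.set_read_timeout(Some(self.read_timeout.min(remaining)))
    }
}
```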

I think there is a way to fit it into a blocking design: you can simply configure the read timeout to min(read_timeout, time_until_request_deadline)

While such a scheme can be made to work in principle, it is quite brittle: it needs to be threaded through the layers of blocking reads, which complicates the code significantly. That is why I think spawning a separate thread would be more in line with this crate's goal of simplicity. But actually, I think the problem is probably not really relevant for this crate's intended use case.

The Golang bug described in the blog post has since been fixed, but I ran into another variant of it at work, and it's a pretty nasty bug.

Since this library does not do connection pooling in the background, it's not as dangerous, since users are likely to figure out that their code is blocked on something. But if you use it in a thread pool in a crawler-type application, it could quickly exhaust the whole thread pool.

I think the min trick can be implemented somewhat easily if we do it in the BaseStream struct since every call to read will eventually end up there, though the various layers of buffering might obscure this a bit.
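Something along these lines, as a sketch (this is not attohttpc's actual BaseStream; the wrapper and field names are made up): because every read funnels through the one struct, applying the min before delegating covers all the buffering layers above it.

```rust
use std::io::{self, Read};
use std::net::TcpStream;
use std::time::{Duration, Instant};

// Illustrative deadline-aware stream: each read caps the socket timeout
// by the time remaining until the deadline, then delegates.
struct DeadlineStream {
    inner: TcpStream,
    read_timeout: Duration,
    deadline: Instant,
}

impl Read for DeadlineStream {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        let remaining = self.deadline.saturating_duration_since(Instant::now());
        if remaining.is_zero() {
            return Err(io::Error::new(io::ErrorKind::TimedOut, "request deadline reached"));
        }
        self.inner.set_read_timeout(Some(self.read_timeout.min(remaining)))?;
        self.inner.read(buf)
    }
}
```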

I think the min trick can be implemented somewhat easily if we do it in the BaseStream struct since every call to read will eventually end up there, though the various layers of buffering might obscure this a bit.

I opened #39 implementing a request timeout (actually a deadline) by setting read and write timeouts before every blocking call into the base stream. Personally, I think we should do only one thing, either the request timeout or the read timeout: the read timeout is not really necessary if the request timeout is used, and the request timeout is costly enough as it is, since we have to apply it before every blocking call.

We already use buffering to avoid performing too many syscalls, so while this will probably hurt performance, I don't think it will hurt it too much. We could run benchmarks with and without it.

We discussed this a bit in #39. In the end we decided to go for a read timeout and a connect timeout. It's simpler and has fewer gotchas. If there's really a need for a request timeout, we can look into it in the future. In the meantime, it's still possible to use a background thread with a channel to create a request timeout. Now that we have a connect & read timeout, the connection attempt in the background thread cannot be leaked.
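For reference, the background-thread workaround could be sketched like this (get_with_deadline is a made-up helper; it uses attohttpc's public get/send API and a std mpsc channel):

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

// Run the request on a helper thread and bound the wait with a channel.
// Returns None if the deadline expires first. With connect & read timeouts
// configured, the abandoned helper thread cannot hang forever either.
fn get_with_deadline(url: &str, deadline: Duration) -> Option<attohttpc::Result<attohttpc::Response>> {
    let (tx, rx) = mpsc::channel();
    let url = url.to_owned();
    thread::spawn(move || {
        // If we already gave up, the receiver is gone and send just fails.
        let _ = tx.send(attohttpc::get(&url).send());
    });
    rx.recv_timeout(deadline).ok()
}
```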

If there's really a need for a request timeout, we can look into it in the future.

Yes, there is indeed a class of DoS attacks that cannot be defended against without it.

Consider an image hosting service like imgur.com: users enter URLs of images they want to host, and the HTTP client fetches the images and puts them into storage. If a user enters a URL to a malicious host that replies just frequently enough to keep resetting the read timeout, but never finishes the response, the connection will stay open indefinitely. If an attacker keeps getting the service to open such connections, this will eventually lead to the exhaustion of the thread pool or network ports on the client.

If an attacker keeps getting the service to open such connections, this will eventually lead to the exhaustion of the thread pool or network ports on the client.

There are many HTTP client implementations better suited to implementing such a service, specifically those that tie in with the underlying I/O framework, e.g. Tokio and Reqwest or Actix and AWC.

Alright, I'll leave this issue open because the attack you've described is legit.

I think we could perhaps protect ourselves from it by using a deadline in the BaseStream but without setting the deadline's remaining time as the socket read timeout. So when a call to read is made, we just check if the deadline has been reached, knowing that the call to read will eventually time out because of the default read timeout. Just gotta make sure that timeout is not infinite.
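A sketch of that variant (again with made-up names, not the real BaseStream): the deadline is only checked, never written into the socket option, so each read costs no extra syscall.

```rust
use std::io::{self, Read};
use std::time::Instant;

// Illustrative wrapper: relies on the inner stream already having a
// finite read timeout, so a blocked read wakes up periodically and the
// deadline check runs again on the next call.
struct DeadlineCheckedStream<S: Read> {
    inner: S, // assumed to have a finite read timeout configured
    deadline: Instant,
}

impl<S: Read> Read for DeadlineCheckedStream<S> {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        if Instant::now() >= self.deadline {
            return Err(io::Error::new(io::ErrorKind::TimedOut, "deadline reached"));
        }
        self.inner.read(buf)
    }
}
```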

I think we could perhaps protect ourselves from it by using a deadline in the BaseStream but without setting the deadline's remaining time as the socket read timeout.

I think it would be better to avoid putting too much high-level policy into the crate. We could, in the same vein as the connect and read timeouts, just expose the underlying TCP stream in the API so that users can do whatever they wish, e.g. clone the stream and have another thread check the deadline and shut down the stream when it is reached.

But I have to question whether this attack scenario is really applicable here. Why can't we say that attohttpc should not be used as the HTTP client for such a service? If all HTTP client implementations have to implement all possible features, then there would be no need for a crate such as this one in the first place. Being able to keep things simple by not trying to handle every use case can be a virtue.

The problem with reqwest/actix/etc is that they use async I/O and sit on an enormous pile of unsafe code that has never been audited. And async I/O is a reliability nightmare, especially on top of Rust's relatively immature async I/O stack, and is ridiculously hard to debug.

To wit: the latest release of reqwest deadlocks 6% of the time on a basic smoke test, while actix is still unsound and the maintainers refuse to fix it.

You only really need async I/O on the server side, where you have to keep 10,000 connections open at the same time. The client usually has no such requirements. So there is absolutely a market for a feature-rich HTTP client library with synchronous I/O and little unsafe code.

@Shnatsel what's your take on how we should implement the request timeout? Should one exist by default, or should we just allow the user to set one and leave the default as "no timeout"? I'm just wondering what would happen, for instance, if someone tries to download a really large file using this library and runs into the default request timeout.

I do not think having a default timeout for the entire request is reasonable. If you're avoiding DoS in e.g. an API querying situation, you need it set to something like a minute, maybe even 10 seconds, which would be ridiculously low for all other use cases.

I think it's reasonable to set a connection timeout by default - but the OS already does that (unix, windows), and the lower timeout takes effect, so there's no need to do that in the client.

It is also reasonable to set a read timeout so that if the remote server goes away, the connection is dropped. You don't want a user of something like wget to stare at the console for hours just because the remote host suddenly went away; the sane thing to do is to fail and let them retry. HTTP long-polling is not very common these days, and if you're doing that you'll know to increase the read timeout anyway.
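As a usage sketch, assuming the connect and read timeouts from #39 are exposed as builder methods on the request builder, a defensive fetch might look like:

```rust
use std::time::Duration;

// Fail fast on unreachable hosts and drop connections that go silent,
// without imposing a timeout on the total transfer time.
fn fetch(url: &str) -> attohttpc::Result<String> {
    attohttpc::get(url)
        .connect_timeout(Duration::from_secs(10)) // bound connection setup
        .read_timeout(Duration::from_secs(30))    // bound each silent gap
        .send()?
        .text() // read the response body as a string
}
```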

To wit: the latest release of reqwest deadlocks 6% of the time on a basic smoke test, while actix is still unsound and the maintainers refuse to fix it.

And until the last release this crate panicked on decoding UTF-8 text which might not be memory corruption but is definitely a denial of service issue.

So there is absolutely a market for a feature-rich HTTP client library with synchronous I/O and little unsafe code.

But why does that statement imply that this crate needs to stop being what it is and drop its design rationale on the floor? Why not fork this into exahttpc to do that?

And until the last release this crate panicked on decoding UTF-8 text which might not be memory corruption but is definitely a denial of service issue.

Fortunately, panics are quite easy to plan for - you can catch and handle them, unlike deadlocks. They are also fairly trivial to debug, unlike deadlocks. And panics are basically the worst thing you can expect out of safe Rust without async I/O, save for a panic-while-panicking, which will abort the process.
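For example, a request can be wrapped in std::panic::catch_unwind to turn a panic into a recoverable error (fetch_guarded is a made-up helper):

```rust
use std::panic;

// Any panic inside the closure (e.g. a decoding bug) unwinds to
// catch_unwind instead of killing the worker thread.
fn fetch_guarded(url: &str) -> Option<String> {
    panic::catch_unwind(|| {
        let response = attohttpc::get(url).send().ok()?;
        response.text().ok()
    })
    .ok()     // Err means the closure panicked
    .flatten()
}
```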

But why does that statement imply that this crate needs to stop being what it is and drop its design rationale on the floor?

Not necessarily. I just wanted to demonstrate that there are many common use cases that are currently not served well by any existing Rust HTTP client. And it looked like some of them could be served by attohttpc if it supported request timeouts, so I've opened this issue.

If you choose not to support these use cases, that's a valid design decision and I will respect it. But it would be kind of a shame, because those use cases will remain without any library to serve them.