Unexplained race condition in v0.16 causing "runtime dropped the dispatch task"
Nikita240 opened this issue
After upgrading from bollard v0.15 to v0.16 I started encountering a race condition in my unit tests. I believe this is likely related to the upgrade to hyper v1.1, but I can't quite pin down what's happening.
Here is the test setup to replicate:
//! [dependencies]
//! bollard = "0.16.0"
//! tokio = { version = "1.24.2", features = ["rt-multi-thread", "macros", "fs"] }
//! once_cell = "1.19.0"

use bollard::{image::ListImagesOptions, Docker};
use once_cell::sync::OnceCell;

static DOCKER: OnceCell<Docker> = OnceCell::new();

fn get_docker() -> Result<&'static Docker, bollard::errors::Error> {
    DOCKER.get_or_try_init(Docker::connect_with_socket_defaults)
}

#[tokio::test(flavor = "multi_thread")]
async fn test_runtime() {
    run_test(10).await;
}

#[tokio::test(flavor = "multi_thread")]
async fn test_runtime_2() {
    run_test(10).await;
}

#[tokio::test(flavor = "multi_thread")]
async fn test_runtime_3() {
    run_test(100).await;
}

async fn run_test(count: usize) {
    let docker = get_docker().unwrap();
    for _ in 0..count {
        let _ = &docker
            .list_images(Some(ListImagesOptions::<String> {
                all: true,
                ..Default::default()
            }))
            .await
            .unwrap();
    }
}
Here is what the error looks like:
running 3 tests
test test_runtime ... ok
test test_runtime_3 ... FAILED
test test_runtime_2 ... ok
failures:
---- test_runtime_3 stdout ----
thread 'test_runtime_3' panicked at tests/bollard.rs:33:14:
called `Result::unwrap()` on an `Err` value: HyperLegacyError { err: Error { kind: SendRequest, source: Some(hyper::Error(User(DispatchGone), "runtime dropped the dispatch task")) } }
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
failures:
test_runtime_3
The test failures are random and inconsistent.
rustc 1.76.0 (07dca489a 2024-02-04)
Do you have any ideas how to root-cause this?
Thanks for the report... I see there's a v1.2.0 version of Hyper out and version v0.1.3 of hyper-util, let's see if we can reproduce this on those versions.
I actually can't reproduce this problem. Can you give more detail on your system, and maybe any dockerd logs you find? You can turn on debug logging in the daemon using the following configuration in /etc/docker/daemon.json:
{
  "debug": true,
  "raw-logs": true
}
That's very strange. I'm able to replicate this on two different machines running different docker versions.
I think the issue here is caused by the statically stored Docker instance, static DOCKER: OnceCell<Docker>.
When running tokio tests with multi_thread, the tests actually run concurrently, but each one gets its own unique runtime.
As of bollard@0.16, somehow, the Docker instance "absorbs" the first tokio runtime it sees, and if that runtime is dropped while someone else is making a request, you get the error "runtime dropped the dispatch task".
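The mechanism described above can be modeled without Docker or tokio. The following is only an analogy built on std threads and channels, not bollard's or hyper's actual implementation; MiniRuntime, Client, and Msg are hypothetical names made up for this sketch. It assumes the shared client merely holds a channel to a background "dispatch" worker owned by the first runtime, so once that runtime is dropped every later request fails.

```rust
use std::sync::mpsc::{channel, Sender};
use std::thread::{self, JoinHandle};

// Messages understood by the background "dispatch" worker (hypothetical).
enum Msg {
    Request(u32, Sender<u32>),
    Shutdown,
}

// Hypothetical stand-in for a per-test runtime: it owns the dispatch
// worker and stops it when dropped.
struct MiniRuntime {
    tx: Sender<Msg>,
    worker: Option<JoinHandle<()>>,
}

// Hypothetical stand-in for the shared client: it only holds a channel to
// the dispatch worker of whichever runtime created it.
struct Client {
    tx: Sender<Msg>,
}

impl MiniRuntime {
    fn new() -> Self {
        let (tx, rx) = channel::<Msg>();
        let worker = thread::spawn(move || {
            // Serve requests until told to shut down; `rx` is dropped on exit.
            for msg in rx {
                match msg {
                    Msg::Request(n, reply) => {
                        let _ = reply.send(n * 2);
                    }
                    Msg::Shutdown => break,
                }
            }
        });
        MiniRuntime { tx, worker: Some(worker) }
    }

    fn client(&self) -> Client {
        Client { tx: self.tx.clone() }
    }
}

impl Drop for MiniRuntime {
    fn drop(&mut self) {
        // Stop the worker and wait for it, like a runtime tearing down its tasks.
        let _ = self.tx.send(Msg::Shutdown);
        if let Some(worker) = self.worker.take() {
            let _ = worker.join();
        }
    }
}

impl Client {
    fn request(&self, n: u32) -> Result<u32, &'static str> {
        let (reply_tx, reply_rx) = channel();
        self.tx
            .send(Msg::Request(n, reply_tx))
            .map_err(|_| "dispatch task gone")?;
        reply_rx.recv().map_err(|_| "dispatch task gone")
    }
}

fn main() {
    let first = MiniRuntime::new();
    let client = first.client();
    // Works while the runtime that owns the dispatch worker is alive.
    assert_eq!(client.request(21), Ok(42));

    // The first "test" finishes and its runtime goes away.
    drop(first);

    // Any later user of the shared client now fails.
    assert_eq!(client.request(21), Err("dispatch task gone"));
}
```

In this model the failure disappears if each test builds its own client from its own runtime, or if all tests share a single long-lived runtime, which matches the diagnosis that the static instance is tied to the first runtime it was used on.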
Ah yes, I see it now if you run them all together.
I put this test scenario into bollard's CI system, and it seems to fail on all connectors (http / ssl / named pipe / unix socket), so that excludes any issue with any individual connector. I also checked locally running against the latest master branch of hyper and it still fails (albeit less often).
I did find a fix: #390. If you have the time, I'd appreciate it if you could check whether it works for you.
Related to this: hyperium/hyper#2312
I just got around to testing your fix. I can confirm it works!
Thank you so much for your support on this ❤️