tokio-rs / turmoil

Add hardship to your tests

Network partition should cause 'host unreachable' not 'connection refused' for TCP

cameronelliott opened this issue · comments

Summary: On a network partition (or any other case where the host is unreachable), a TCP connect should fail with HostUnreachable rather than ConnectionRefused.
I am not sure whether Shuttle needs this level of fidelity just yet, or whether anyone would notice the difference at this time. But someday simulations using Shuttle might take different actions based on HostUnreachable vs ConnectionRefused, so it might make sense to fix.

Detail:

I modified the axum example by adding a single line before the client request:
turmoil::partition("client", "server");

Doing so resulted in this output:

[...]
thread 'main' panicked at examples/axum/src/main.rs:71:15:
called `Result::unwrap()` on an `Err` value: Error { kind: Connect, source: Some(Custom { kind:ConnectionRefused, error: "192.168.0.1:9999" }) }
[...]

Normally, when trying to reach a TCP server across a partitioned network, a HostUnreachable error occurs after a timeout period. A ConnectionRefused error will not occur, because ConnectionRefused happens only when the machine receiving the TCP SYN rejects it, which it does when there is no listener or server running on that port.

This can be demonstrated by using curl from the command line.

# in this first example, I am curling an IP address without a computer. 
# thus there is nothing to respond. it will timeout after ~3 seconds, and return host unreachable
c@intel12400 ~/t/e/axum (main) [7]> time curl -vvvvv 192.168.86.33
*   Trying 192.168.86.33:80...
* connect to 192.168.86.33 port 80 from 192.168.86.5 port 59648 failed: Host is unreachable
* Failed to connect to 192.168.86.33 port 80 after 3055 ms: Couldn't connect to server
* Closing connection
curl: (7) Failed to connect to 192.168.86.33 port 80 after 3055 ms: Couldn't connect to server

________________________________________________________
Executed in    3.06 secs      fish           external
   usr time    6.04 millis  960.00 micros    5.08 millis
   sys time    0.32 millis  315.00 micros    0.00 millis
# this second example shows when a connection refused occurs
# I am curling to a valid IP with a computer running, but nothing running on the port specified
# thus the computer receives the TCP syn request, but denies it, cause nothing is on the port
c@intel12400 ~/t/e/axum (main) [7]> time curl -vvvvv 192.168.86.5:8888
*   Trying 192.168.86.5:8888...
* connect to 192.168.86.5 port 8888 from 192.168.86.5 port 41228 failed: Connection refused
* Failed to connect to 192.168.86.5 port 8888 after 0 ms: Couldn't connect to server
* Closing connection
curl: (7) Failed to connect to 192.168.86.5 port 8888 after 0 ms: Couldn't connect to server

________________________________________________________
Executed in    5.57 millis    fish           external
   usr time    5.49 millis  701.00 micros    4.79 millis
   sys time    0.23 millis  226.00 micros    0.00 millis
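
To make the difference between the two curl runs concrete, here is a minimal sketch of how calling code might branch on the two failures. Everything named here (ConnectFailure, classify, EHOSTUNREACH_LINUX) is hypothetical and not part of turmoil. It assumes Linux, where EHOSTUNREACH is errno 113, and it checks the raw OS error instead of a HostUnreachable error kind so that it does not depend on toolchain support for that variant. Treating TimedOut as "unreachable" is also an assumption.

```rust
use std::io;

/// Linux errno value for EHOSTUNREACH. Assumption: other platforms use
/// different numbers, so this constant is only meaningful on Linux.
const EHOSTUNREACH_LINUX: i32 = 113;

/// Hypothetical classification of a failed TCP connect.
#[derive(Debug, PartialEq)]
enum ConnectFailure {
    /// The remote machine answered the SYN and rejected it (no listener).
    Refused,
    /// Nothing answered: the SYN was lost or the host is not there.
    Unreachable,
    /// Any other error.
    Other,
}

/// Maps an io::Error from a connect attempt to one of the cases above.
fn classify(err: &io::Error) -> ConnectFailure {
    if err.kind() == io::ErrorKind::ConnectionRefused {
        return ConnectFailure::Refused;
    }
    // A partition usually shows up as a timeout or as EHOSTUNREACH.
    if err.kind() == io::ErrorKind::TimedOut
        || err.raw_os_error() == Some(EHOSTUNREACH_LINUX)
    {
        return ConnectFailure::Unreachable;
    }
    ConnectFailure::Other
}
```

A client that retries on Unreachable but gives up immediately on Refused is the kind of code that would behave differently depending on which error the simulation reports.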

Good catch. Today there is no distinction between the network dropping the SYN and the recipient host dropping it. I also noticed that the same path is taken when a host doesn't exist: https://github.com/tokio-rs/turmoil/blob/main/src/top.rs#L227

Here is the path for partition: https://github.com/tokio-rs/turmoil/blob/main/src/top.rs#L341, where we drop the SYN and the oneshot inside drops, failing the connect: https://github.com/tokio-rs/turmoil/blob/main/src/net/tcp/stream.rs#L88
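
As a rough illustration of why both cases currently look the same to the caller, here is a self-contained sketch. It is not turmoil's code: std::sync::mpsc stands in for the oneshot channel, and SynFate, connect_today, and connect_distinct are hypothetical names. The second function shows one possible way to carry the reason across the channel. It uses TimedOut as a stand-in for the unreachable case, which is an assumption made so the sketch does not rely on a HostUnreachable error kind.

```rust
use std::io;
use std::sync::mpsc;

/// Hypothetical outcomes for a SYN in the simulated network.
#[derive(Debug, Clone, Copy)]
enum SynFate {
    Accepted,
    DroppedByNetwork, // e.g. a partition between the two hosts
    RejectedByHost,   // host is reachable but nothing is listening
}

/// Models the behavior described above: any dropped sender fails the
/// connect with ConnectionRefused, whatever the cause was.
fn connect_today(fate: SynFate) -> io::Result<()> {
    let (tx, rx) = mpsc::channel::<()>();
    match fate {
        SynFate::Accepted => tx.send(()).expect("receiver is alive"),
        // Both failure cases simply drop the sender.
        _ => drop(tx),
    }
    rx.recv()
        .map_err(|_| io::Error::from(io::ErrorKind::ConnectionRefused))
}

/// One possible alternative: send the failure reason instead of only
/// dropping the sender, so the connecting side can tell the cases apart.
fn connect_distinct(fate: SynFate) -> io::Result<()> {
    let (tx, rx) = mpsc::channel::<Result<(), io::ErrorKind>>();
    let sent = match fate {
        SynFate::Accepted => tx.send(Ok(())),
        SynFate::RejectedByHost => tx.send(Err(io::ErrorKind::ConnectionRefused)),
        SynFate::DroppedByNetwork => tx.send(Err(io::ErrorKind::TimedOut)),
    };
    sent.expect("receiver is alive");
    match rx.recv() {
        Ok(Ok(())) => Ok(()),
        Ok(Err(kind)) => Err(kind.into()),
        // Sender dropped without a message: keep the old behavior.
        Err(_) => Err(io::ErrorKind::ConnectionRefused.into()),
    }
}
```

With connect_today, DroppedByNetwork and RejectedByHost both come back as ConnectionRefused. With connect_distinct, only RejectedByHost does.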

Both of these issues are relatively easy to fix. Do you have any interest in cutting a PR? If not, I can get to this next week.

It looks like the HostUnreachable error is only supported in nightly builds. It might be best to wait to bring this in until rust-lang/rust#86442 is finalized.