Network partition should cause 'host unreachable' not 'connection refused' for TCP
cameronelliott opened this issue · comments
Summary: Unreachable hosts should cause an UnreachableHost rather than ConnectionRefused
on network partitions, etc.
Summary: I am not sure if Shuttle needs this level of fidelity just yet, and if anyone would notice the difference at this time. But someday simulations using Shuttle might take different actions based upon UnreachableHost vs ConnectionRefused, so it might make sense to fix.
detail
I modified the axum example by adding a single line before the client request:
turmoil::partition("client", "server");
Doing so resulted in this output:
[...]
thread 'main' panicked at examples/axum/src/main.rs:71:15:
called `Result::unwrap()` on an `Err` value: Error { kind: Connect, source: Some(Custom { kind:ConnectionRefused, error: "192.168.0.1:9999" }) }
[...]
Normally when trying to reach a TCP server via a partitioned network, a HostUnreachable error will occurr after a timeout period. A ConnectionRefused occurr will not occur, because a ConnectionRefused occurr happens when a box receiving a TCP syn rejects it, because there is no listener or server running on that port.
This can be demostrated by using curl
from the command kine.
# in this first example, I am curling an IP address without a computer.
# thus there is nothing to respond. it will timeout after ~3 seconds, and return host unreachable
c@intel12400 ~/t/e/axum (main) [7]> time curl -vvvvv 192.168.86.33
* Trying 192.168.86.33:80...
* connect to 192.168.86.33 port 80 from 192.168.86.5 port 59648 failed: Host is unreachable
* Failed to connect to 192.168.86.33 port 80 after 3055 ms: Couldn't connect to server
* Closing connection
curl: (7) Failed to connect to 192.168.86.33 port 80 after 3055 ms: Couldn't connect to server
________________________________________________________
Executed in 3.06 secs fish external
usr time 6.04 millis 960.00 micros 5.08 millis
sys time 0.32 millis 315.00 micros 0.00 millis
# this second example shows when a connection refused occurs
# I am curling to a valid IP with a computer running, but nothing running on the port specified
# thus the computer receives the TCP syn request, but denies it, cause nothing is on the port
c@intel12400 ~/t/e/axum (main) [7]> time curl -vvvvv 192.168.86.5:8888
* Trying 192.168.86.5:8888...
* connect to 192.168.86.5 port 8888 from 192.168.86.5 port 41228 failed: Connection refused
* Failed to connect to 192.168.86.5 port 8888 after 0 ms: Couldn't connect to server
* Closing connection
curl: (7) Failed to connect to 192.168.86.5 port 8888 after 0 ms: Couldn't connect to server
________________________________________________________
Executed in 5.57 millis fish external
usr time 5.49 millis 701.00 micros 4.79 millis
sys time 0.23 millis 226.00 micros 0.00 millis
Good catch. There isn't a distinction between the network dropping the SYN and the recipient host dropping it today. I also noticed this path as well when a host doesn't exist: https://github.com/tokio-rs/turmoil/blob/main/src/top.rs#L227
Here is the path for partition: https://github.com/tokio-rs/turmoil/blob/main/src/top.rs#L341, where we drop the SYN and the oneshot
inside drops, failing the connect: https://github.com/tokio-rs/turmoil/blob/main/src/net/tcp/stream.rs#L88
Both of these issues are relatively easy to fix. Do you have any interest in cutting a PR? If not, I can get to this next week.
It looks like the HostUnreachable
error is only supported in nightly builds. It might be best to wait to bring this in until rust-lang/rust#86442 is finalized.