madsim-rs / madsim

Magical Deterministic Simulator for distributed systems in Rust.

bug: the first RPC times out after kill-restart

wangrunji0408 opened this issue

https://github.com/huang-jl/madsim-bug/blob/main/src/lib.rs

[ WARN][0.114723s][0.0.0.0:0][madsim_test::tests] Server crash!
[ WARN][5.114723s][0.0.0.0:0][madsim_test::tests] Server restart!
[ INFO][10.114723s][10.0.0.2:1234][madsim_test] Send Incr request from 10.0.0.2:1234 to 10.0.0.1:8000
[ WARN][11.114723s][10.0.0.2:1234][madsim_test] Send incr fail: RPC timeout  <---
[ INFO][11.114723s][10.0.0.2:1234][madsim_test] Send Incr request from 10.0.0.2:1234 to 10.0.0.1:8000
[ INFO][11.124569s][10.0.0.1:8000][madsim_test] Receive increment request, local addr = 10.0.0.1:8000
[ INFO][11.128317s][10.0.0.2:1234][madsim_test] Send incr success

The cause is that we did not clear the network state on restart. As a result, the first recv operation after the restart immediately returned a stale message from before the kill, and the sender then timed out.
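To make the failure mode concrete, here is a minimal toy model of it. This is only a sketch, not madsim's actual implementation: `ToyNetwork`, its per-address inbox, the `clear_state` flag, and `first_recv_after_restart` are all hypothetical names introduced for illustration. It assumes the simulated network keeps pending messages per address and shows how a restart that keeps that state lets the first recv see an old message.

```rust
use std::collections::{HashMap, VecDeque};

/// Toy model of a simulated network: one inbox of pending messages per address.
/// Hypothetical structure for illustration only, not madsim's real one.
#[derive(Default)]
struct ToyNetwork {
    inboxes: HashMap<String, VecDeque<String>>,
}

impl ToyNetwork {
    /// Queue a message for `dst`.
    fn send(&mut self, dst: &str, msg: &str) {
        self.inboxes
            .entry(dst.to_string())
            .or_default()
            .push_back(msg.to_string());
    }

    /// Take the oldest pending message for `addr`, if any.
    fn recv(&mut self, addr: &str) -> Option<String> {
        self.inboxes.get_mut(addr)?.pop_front()
    }

    /// Restart the node at `addr`. With `clear_state == false`, messages
    /// queued before the kill survive the restart (the buggy behaviour).
    fn restart(&mut self, addr: &str, clear_state: bool) {
        if clear_state {
            self.inboxes.remove(addr);
        }
    }
}

/// Returns what the first recv after a restart observes.
fn first_recv_after_restart(clear_state: bool) -> Option<String> {
    let server = "10.0.0.1:8000";
    let mut net = ToyNetwork::default();
    net.send(server, "stale message from before kill");
    net.restart(server, clear_state);
    net.send(server, "fresh Incr request");
    net.recv(server)
}

fn main() {
    // Without clearing, the first recv sees the stale message.
    println!("{:?}", first_recv_after_restart(false));
    // With clearing, the first recv sees the fresh request.
    println!("{:?}", first_recv_after_restart(true));
}
```

In this model, dropping the node's network state on restart is enough for the first recv to see the fresh request. Whether the real fix takes the same shape depends on how madsim stores that state.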