shadow / shadow

Shadow is a discrete-event network simulator that directly executes real application code, enabling you to simulate distributed systems with thousands of network-connected processes in realistic and scalable private network experiments using your laptop, desktop, or server running Linux.

Home Page: https://shadow.github.io


Graceful shutdown of simulations

robgjansen opened this issue

It would be nice to have a way to end a Shadow simulation early, effectively changing the simulation stop_time from its configured value to the current simulation time (now).

Our current behavior on receiving SIGTERM or SIGINT (Ctrl-C) is to flush the log and exit. That isn't as graceful as it could be: Shadow doesn't get to log the things it normally logs at the end of a simulation (such as syscall counts and object usage), check processes' expected end states, etc.

This might be easy to add: at the end of each scheduler round, we could check whether a flag was set by a signal handler and, if so, stop the simulation early.
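A minimal sketch of that idea, in Python for brevity rather than in Shadow's own code. The `Simulation` class, its fields, and `install_handlers` are hypothetical names used only for illustration; nothing here reflects Shadow's actual scheduler.

```python
import signal


class Simulation:
    """Hypothetical stand-in for the scheduler loop; not Shadow's real code."""

    def __init__(self, stop_time):
        self.stop_time = stop_time  # configured end of the simulation
        self.now = 0                # current simulation time
        self.stop_requested = False

    def request_stop(self, signum=None, frame=None):
        # Keep the handler trivial: only record that a stop was requested.
        self.stop_requested = True

    def run_round(self):
        # Placeholder for one scheduler round; advances simulated time.
        self.now += 1

    def run(self):
        while self.now < self.stop_time:
            self.run_round()
            # End-of-round check: pull stop_time back to "now" so the loop
            # exits through the normal path and end-of-simulation work runs.
            if self.stop_requested:
                self.stop_time = self.now
        return self.now


def install_handlers(sim):
    # Assumption: both signals request the same graceful stop in this sketch.
    signal.signal(signal.SIGINT, sim.request_stop)
    signal.signal(signal.SIGTERM, sim.request_stop)
```

Because the handler only sets a flag, the current round always finishes before the loop exits, so any end-of-simulation logging placed after `run()` still happens.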

We're thinking we want something like the following:

  • The first SIGINT (Ctrl-C) puts you into the SIGINTx1 state, graceful shutdown mode, where the scheduler checks at the end of the next round and ends the simulation early, then does cleanup and end-of-simulation work as normal.
  • A second SIGINT (Ctrl-C) is for the impatient and puts you into the SIGINTx2 state, where we give up on whatever we're doing, flush the log, and exit.
  • Any SIGTERM puts you into the SIGINTx1 state, but will not move you out of the SIGINTx2 state if you're already there.
  • The end of the Shadow log should have a warning that Shadow exited early, naming the termination state that triggered the exit.
  • We probably want to return a non-zero exit code: for sure if we exited via SIGINTx2, and maybe also if we exited via SIGINTx1.
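The transitions in the list above can be sketched as a small state machine. This is an illustration in Python, not Shadow's implementation: the `Shutdown` enum, the function names, the exit code value `1`, the `graceful_is_error` switch, and the warning text are all assumptions, since the issue leaves those details open.

```python
import enum
import signal


class Shutdown(enum.IntEnum):
    RUNNING = 0     # no termination signal received yet
    SIGINT_X1 = 1   # graceful: stop at the end of the next scheduler round
    SIGINT_X2 = 2   # impatient: flush the log and exit immediately


def next_state(state, signum):
    """Return the shutdown state after receiving signal `signum`."""
    if signum == signal.SIGINT:
        # Each Ctrl-C escalates one step, capped at SIGINT_X2.
        return Shutdown(min(state + 1, Shutdown.SIGINT_X2))
    if signum == signal.SIGTERM:
        # SIGTERM requests a graceful stop but never leaves SIGINT_X2.
        return max(state, Shutdown.SIGINT_X1)
    return state


def exit_code(state, graceful_is_error=True):
    """Hypothetical exit codes; whether SIGINT_X1 is an error is undecided."""
    if state is Shutdown.SIGINT_X2:
        return 1
    if state is Shutdown.SIGINT_X1:
        return 1 if graceful_is_error else 0
    return 0


def final_warning(state):
    """Hypothetical last log line naming the state that ended the run."""
    if state is Shutdown.RUNNING:
        return None
    return f"shadow exited early due to termination state {state.name}"
```

For example, SIGINT followed by SIGTERM stays in `SIGINT_X1`, while SIGINT followed by another SIGINT reaches `SIGINT_X2`, and a later SIGTERM leaves it there.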