madsim-rs / madsim

Magical Deterministic Simulator for distributed systems in Rust.

Process termination.

yiyuanliu opened this issue

Processes may terminate in a number of ways, e.g. SIGTERM, SIGKILL, or machine failure. Currently, madsim does not simulate process termination very well.

Madsim provides a kill function to terminate a host, but the semantics of this function are unclear.

See this code:

struct Foo(String);

impl Foo {
    fn new(name: &str) -> Foo {
        println!("foo::new in {}", name);
        Foo(name.to_owned())
    }
}

impl Drop for Foo {
    fn drop(&mut self) {
        println!("foo::drop in {}!", self.0);
    }
}

#[madsim::main]
async fn main() {
    let handle = madsim::Handle::current().create_host("127.0.0.1:10086").unwrap();
    handle.spawn(async move {
        let _foo = Foo::new("madsim");
        loop {
            madsim::time::sleep(std::time::Duration::from_secs(1)).await;
        }
    }).detach();

    madsim::time::sleep(std::time::Duration::from_secs(1)).await;
    // Kill the host created above; the address must match the one passed to create_host.
    madsim::Handle::current().kill("127.0.0.1:10086".parse().unwrap());
    loop {
        madsim::time::sleep(std::time::Duration::from_secs(1)).await;
    }
}

Running this code with the sim feature enabled shows that foo::drop is called. This means that kill behaves more like SIGTERM than SIGKILL, so it cannot simulate unexpected events such as machine failure or running out of memory. We also have no way to know when all async tasks on that host have been dropped, so it is still hard to pick a suitable time to restart the host.

If we replace sleep in the spawned task with futures::future::pending::<()>().await, the drop function is not called, which looks like SIGKILL (but is not the same as SIGKILL).

It seems difficult to simulate SIGKILL or machine failure. If the drop function is not called, resources may leak (memory, file locks, etc.).
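The trade-off can be illustrated without madsim at all. The following is a minimal std-only sketch, not madsim's implementation: the names Guard, Never, suspended_task, kill_by_drop and kill_by_leak are hypothetical and exist only for this illustration. It polls a task once so that a local value is alive, then compares dropping the suspended task (destructors run, SIGTERM-like) with leaking it (destructors never run, closer to SIGKILL, at the cost of leaked resources).

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Hypothetical stand-in for `Foo`: counts how many times `drop` has run.
struct Guard(Arc<AtomicUsize>);

impl Drop for Guard {
    fn drop(&mut self) {
        self.0.fetch_add(1, Ordering::SeqCst);
    }
}

/// A waker that does nothing; enough to poll a future by hand.
struct NoopWake;

impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// A future that never completes, similar in spirit to `futures::future::pending`.
struct Never;

impl Future for Never {
    type Output = ();
    fn poll(self: Pin<&mut Self>, _cx: &mut Context<'_>) -> Poll<()> {
        Poll::Pending
    }
}

/// Creates a task that owns a `Guard`, polls it once so the guard is alive,
/// and returns the task while it is suspended at its await point.
fn suspended_task(drops: Arc<AtomicUsize>) -> Pin<Box<dyn Future<Output = ()>>> {
    let mut task: Pin<Box<dyn Future<Output = ()>>> = Box::pin(async move {
        let _guard = Guard(drops);
        Never.await;
    });
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    assert!(task.as_mut().poll(&mut cx).is_pending());
    task
}

/// SIGTERM-like: the suspended task is dropped, so its locals are destroyed.
fn kill_by_drop() -> usize {
    let drops = Arc::new(AtomicUsize::new(0));
    drop(suspended_task(drops.clone()));
    drops.load(Ordering::SeqCst)
}

/// SIGKILL-like: the suspended task is leaked, so no destructor ever runs.
fn kill_by_leak() -> usize {
    let drops = Arc::new(AtomicUsize::new(0));
    std::mem::forget(suspended_task(drops.clone()));
    drops.load(Ordering::SeqCst)
}
```

Calling kill_by_drop() reports one destructor call and kill_by_leak() reports none. The leaking variant never frees the task's memory, which is exactly the resource-leak concern raised above.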

Another problem is that the drop function is not called automatically when the process terminates. Consider this code:

struct Foo(String);

impl Foo {
    fn new(name: &str) -> Foo {
        println!("foo::new in {}", name);
        Foo(name.to_owned())
    }
}

impl Drop for Foo {
    fn drop(&mut self) {
        println!("foo::drop in {}!", self.0);
    }
}

#[madsim::main]
async fn main() {
    let handle = madsim::Handle::current().create_host("127.0.0.1:10086").unwrap();
    handle.spawn(async move {
        let _foo = Foo::new("madsim");
        loop {
            madsim::time::sleep(std::time::Duration::from_secs(1)).await;
        }
    }).detach();

    madsim::time::sleep(std::time::Duration::from_secs(1)).await;
}

Foo::drop is not called when this code runs with either the sim or the std feature. This makes it difficult for applications that use madsim to exit gracefully. It would seem better to provide an async fn as the entry function when a host is created, like spawn_blocking in tokio. When this entry function returns, the host is considered finished. A join handle should also be provided to wait for the host to finish.
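A rough sketch of what such an entry-function API might look like is given here, using only the standard library. Everything in it is an assumption made for illustration: create_host, block_on and run_host_to_completion are hypothetical names, and each "host" is modelled as an OS thread rather than a simulated madsim host. The point is only that when the entry future returns, its locals are dropped, and the caller can wait for that through a join handle.

```rust
use std::future::Future;
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};
use std::thread::{self, JoinHandle, Thread};

/// Wakes the host thread by unparking it.
struct ThreadWaker(Thread);

impl Wake for ThreadWaker {
    fn wake(self: Arc<Self>) {
        self.0.unpark();
    }
}

/// Minimal single-future executor: polls `fut` until it completes.
fn block_on<F: Future>(fut: F) -> F::Output {
    let mut fut = Box::pin(fut);
    let waker = Waker::from(Arc::new(ThreadWaker(thread::current())));
    let mut cx = Context::from_waker(&waker);
    loop {
        match fut.as_mut().poll(&mut cx) {
            Poll::Ready(out) => return out,
            Poll::Pending => thread::park(),
        }
    }
}

/// Hypothetical API: the host lives exactly as long as `entry` runs,
/// and the returned handle lets the caller wait for it to finish.
fn create_host<F>(entry: F) -> JoinHandle<F::Output>
where
    F: Future + Send + 'static,
    F::Output: Send + 'static,
{
    thread::spawn(move || block_on(entry))
}

/// Stand-in for `Foo` that records whether `drop` ran.
struct Foo(Arc<AtomicBool>);

impl Drop for Foo {
    fn drop(&mut self) {
        self.0.store(true, Ordering::SeqCst);
    }
}

/// Runs one host to completion; returns its result and whether `Foo` was dropped.
fn run_host_to_completion() -> (u32, bool) {
    let dropped = Arc::new(AtomicBool::new(false));
    let flag = dropped.clone();
    let host = create_host(async move {
        let _foo = Foo(flag);
        42u32 // the entry function returns, so `_foo` is dropped here
    });
    let out = host.join().unwrap();
    (out, dropped.load(Ordering::SeqCst))
}
```

In this sketch, joining the handle guarantees that the entry function has returned and its destructors have run, which gives the caller a well-defined point at which a host could be restarted.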

Note: We should add a hook function on kill.
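One possible shape for such a hook is sketched below, again with the standard library only. The Host type, its on_kill and kill methods, and the kill_with_hooks helper are all hypothetical and are not part of madsim's API; the sketch only shows callbacks being registered and then run, in registration order, when the host is killed.

```rust
use std::cell::RefCell;
use std::rc::Rc;

/// Hypothetical host handle that stores callbacks to run when it is killed.
struct Host {
    name: String,
    kill_hooks: Vec<Box<dyn FnOnce(&str)>>,
}

impl Host {
    fn new(name: &str) -> Self {
        Host {
            name: name.to_owned(),
            kill_hooks: Vec::new(),
        }
    }

    /// Registers a callback that receives the host name on kill.
    fn on_kill(&mut self, hook: impl FnOnce(&str) + 'static) {
        self.kill_hooks.push(Box::new(hook));
    }

    /// Consumes the host and runs every hook in registration order.
    fn kill(self) {
        for hook in self.kill_hooks {
            hook(self.name.as_str());
        }
    }
}

/// Registers two hooks, kills the host, and returns what the hooks recorded.
fn kill_with_hooks() -> Vec<String> {
    let log = Rc::new(RefCell::new(Vec::new()));
    let mut host = Host::new("127.0.0.1:10086");

    let l = log.clone();
    host.on_kill(move |name| l.borrow_mut().push(format!("release locks on {}", name)));
    let l = log.clone();
    host.on_kill(move |name| l.borrow_mut().push(format!("notify peers of {}", name)));

    host.kill();
    let events = log.borrow().clone();
    events
}
```

A hook like this would let a simulation release external resources (for example file locks) even when the killed tasks' destructors are deliberately skipped.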