async-rs / async-std

Async version of the Rust standard library

Home Page:https://async.rs

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

panic in async_std::fs::File::into_raw_fd

doy opened this issue · comments

this code causes a panic for me after running for ~30 seconds:

use async_std::io::WriteExt as _;
use std::os::unix::io::{FromRawFd as _, IntoRawFd as _};

async fn async_main() -> anyhow::Result<()> {
    let (r, w) = nix::unistd::pipe()?;
    match unsafe { nix::unistd::fork()? } {
        nix::unistd::ForkResult::Parent { .. } => {
            nix::unistd::close(r)?;
            loop {
                let mut fh = unsafe { async_std::fs::File::from_raw_fd(w) };
                fh.write_all(b"foobarbaz").await?;
                fh.flush().await?;
                let _ = fh.into_raw_fd();
            }
        }
        nix::unistd::ForkResult::Child => {
            nix::unistd::close(w).unwrap();
            loop {
                let mut data = [0; 4096];
                let n = nix::unistd::read(r, &mut data).unwrap();
                if n == 0 {
                    break;
                }
            }
        }
    }
    Ok(())
}

fn main() {
    match async_std::task::block_on(async_main()) {
        Ok(()) => {
            std::process::exit(0);
        }
        Err(e) => {
            eprintln!("{}", e);
            std::process::exit(1);
        }
    };
}

the panic it gives me is:

thread 'main' panicked at 'cannot acquire ownership of the file handle after drop: File { fd: 7, path: "pipe:[42778011]", read: false, write: true }', /home/doy/.cargo/registry/src/github.com-1ecc6299db9ec823/async-std-1.10.0/src/fs/file.rs:436:18
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

as far as i can tell, i'm not doing anything improper here - the fork shouldn't be relevant (all of the relevant code is running in the parent process, and my real code that i distilled this from didn't have a fork), and the docs for from_raw_fd and into_raw_fd say that the only thing to watch out for is that async_std::fs::File needs to have exclusive access to the file descriptor, which i believe this code is enforcing (each iteration through the loop passes ownership of the fd from the async_main function into the File object via from_raw_fd, and then passes it back out of the file object and into the main function via into_raw_fd, and nothing else ever touches that file descriptor).

it feels like there is maybe a race condition somewhere where the internal thread pools are keeping a reference to the file alive a bit longer than they should, because as far as i can tell, into_raw_fd is panicking because there is more than one live reference to the file object somewhere, but that somewhere is not in the code that i have written.

$ rustc -Vv
rustc 1.57.0 (f1edd0429 2021-11-29)
binary: rustc
commit-hash: f1edd0429582dd29cccacaf50fd134b05593bd9c
commit-date: 2021-11-29
host: x86_64-unknown-linux-gnu
release: 1.57.0
LLVM version: 13.0.0

and i am using async-std v1.10.0

actually, it turns out i can reduce this even further - this code also gives me the same panic (you may need to adjust the loop range on your system):

#[test]
fn test_into_raw_fd() {
    use std::os::unix::io::{FromRawFd as _, IntoRawFd as _};

    let std_fh = std::fs::File::open(std::env::current_exe().unwrap()).unwrap();
    let fd = std_fh.into_raw_fd();
    for _ in 0..2000000 {
        let fh = unsafe { async_std::fs::File::from_raw_fd(fd) };
        let _ = fh.into_raw_fd();
    }
}

which is confusing because it isn't actually even doing anything? not sure what's going on here

ah, it looks like if i comment out

let _ = futures_lite::future::block_on(self.flush());
the panic goes away - i guess that is what is sometimes holding onto the file for too long. this isn't a problem in most situations because typically when a File is dropped, there is no subsequent test to ensure that there are no remaining references to it. is there a reason that into_raw_fd needs this check?

I can reproduce this bug, and I can confirm that the issue appears to be a race between the flush and the into_raw_fd.

I think what may be going on is that the LockGuard unlocks and wakes up other tasks in its Drop impl, but there's a brief window in which the LockGuard's drop method has finished unlocking but the LockGuard's fields still exist and have not yet been dropped. That, in turn, means that the File can finish the flush and get fully dropped (including its Lock<State> field) before the LockGuard<State> finishes getting dropped.

That is indeed the problem. One possible fix could be to drop the Arc before waking up the tasks by turning the LockGuard inner value into an Option.

PR with this fix: #1056

Thank you very much for the fix!