use_file: Just use `libstd` on targets that we know have libstd, instead of pthreads + libc.

Question

use_file: Just use `libstd` on targets that we know have libstd, instead of pthreads + libc.

briansmith opened this issue 4 months ago · comments

I propose that on all targets where we know libstd is available, we use libstd instead of pthreads and libc for the use_file implementation, effectively reverting to the original implementation.

The current proposal for x86_64-unknown-linux-none in #424 is to limit support for that target to Linux kernel versions that have the getrandom syscall, so it won't use use_file.

Historically, we've tried to make it so getrandom avoids using libstd in part so that libstd itself could use getrandom as a dependency. However, I think it is clear that libstd is not going to use this crate as its needs are almost completely different; indeed its code has diverged considerably.

Another reason we've avoided libstd is to ensure that we are #[no_std] compatible. However, once we add x86_64-unknown-linux-none support, we'll have an actual no-std target that we can build (at least) for to ensure the core parts of this crate remain no_std compatible.

use_file is a significant chunk of the unsafe code, with the most problematic dependencies (libpthreads). It duplicates functionality that's in libstd, and the functionality it duplicates is exactly the kind of functionality (threading primitives) that we should avoid duplicating, as we're not getting any benefit from the QA and improvements in libstd.

Of all the targets that use the use_file implementation, only one doesn't have libstd support: 32-bit x86 QNX Neutrino. I believe @samkearney is working on it, based on their comment in rust-lang/rust#109173 (comment).

Sam Kearney · Answer 1 · Wed Jun 05 2024 02:11:56 GMT+0800 (China Standard Time)

Thanks for the mention @briansmith, just FYI, I have abandoned the work to bring std support to x86 QNX neutrino 7.0. QNX dropped support for 32-bit architectures in 7.1 (released 2020) and the cost-benefit for adding support to a 7-year-old release doesn't make sense anymore.

That being said, I also don't care if no_std x86 QNX 7.0 breaks, and I would wager pretty strongly that virtually nobody else is using it either (it is much more likely that they are using the more recent QNX 7.1 or the in-progress support for QNX 8).

Brian Smith · Answer 2 · Wed Jun 05 2024 02:36:12 GMT+0800 (China Standard Time)

I think we should switch 32-bit x86 QNX to be a target for which we support custom implementations, and drop the built-in implementation for it. That way we don't have to carry around the current implementation forever just for it. Anybody who needs the current implementation could adopt the current use_file code in their custom implementation.

Sam Kearney · Answer 3 · Wed Jun 05 2024 03:15:38 GMT+0800 (China Standard Time)

Works for me. 👍

Joe Richey · Answer 4 · Wed Jun 05 2024 03:57:44 GMT+0800 (China Standard Time)

Historically, we've tried to make it so getrandom avoids using libstd in part so that libstd itself could use getrandom as a dependency. However, I think it is clear that libstd is not going to use this crate as its needs are almost completely different; indeed its code has diverged considerably.

My long-term goal would still be to have Rust's standard library use getrandom, but that requires #365. Upstream seemed fairly open to it last time we tried, and it would be nice to have this crate be officially part of the Rust project. However, even if this crate were used by libstd:

The polling of /dev/random wouldn't be part of that
We would probably want to use the Mutex and File abstractions of libstd and not roll our own.

So I think this issue is still worth considering, even if we do try to get this into libstd one day.

I propose that on all targets where we know libstd is available, we use libstd instead of pthreads and libc for the use_file implementation, effectively reverting to the original implementation.

Do you know if there would be a way to poll the /dev/random file descriptor using std, or would we need to still use libc for that?

Brian Smith · Answer 5 · Wed Jun 05 2024 04:21:56 GMT+0800 (China Standard Time)

I agree with your "even if this crate were used by libstd...".

Do you know if there would be a way to poll the /dev/random file descriptor using std, or would we need to still use libc for that?

I think we'd still need to use AsFd::as_fd (1.63+) or as_raw_fd() to get the file descriptor and poll it with libc. But, only on Linux/Android.

Artyom Pavlov · Answer 6 · Thu Jun 06 2024 07:02:57 GMT+0800 (China Standard Time)

I am fine with using std for Mutex and File. There were proposals to make getrandom similar to the alloc crate, doing so would make use of std not possible. But it's probably too far in the future (if it ever happens in the first place), so we can ignore it.

John Nunley · Answer 7 · Tue Jun 11 2024 08:07:04 GMT+0800 (China Standard Time)

Note that this would be a breaking change, since we'd need to remove the std feature or make it useless.

Brian Smith · Answer 8 · Tue Jun 11 2024 08:11:17 GMT+0800 (China Standard Time)

The std feature would exist to control whether we use libstd on targets that aren't known to already support libstd, and to control whether we expose libstd in the public API, just like today.

I don't think this would be a breaking change for any target mentioned in https://doc.rust-lang.org/rustc/platform-support.html. It could perhaps be a breaking change for other targets, but I'm not sure getrandom supports any such targets (maybe accidentally).

Artyom Pavlov · Answer 9 · Fri Jun 14 2024 22:05:41 GMT+0800 (China Standard Time)

Maybe we could remove use of mutexes entirely and use sleep like we do for VxWorks? Contention of the file opening section is quite unlikely in practice, so even if the sleeping branch will be triggered, it's unlikely that impact will be noticeable by users.

Brian Smith · Answer 10 · Tue Jun 18 2024 00:29:57 GMT+0800 (China Standard Time)

@newpavlov mentioned that we basically implemented a spinlock for VxWorks. It seems like the Rust VxWorks targets support libstd, so I am hoping that we can replace that spinlock with whatever primitive we use in use_file, since VxWorks has the same structure as use_file on Linux, namely blocking on the operating system to report that the system PRNG has been seeded.

One of my goals is to remove as much of the custom synchronization logic in getrandom as we can, so I am hoping we can find a way to use libstd-provided primitives when possible. Unfortunately, the MSRV situation makes it tricky since OnceLock seems like the best choice, but it was only stablized in Rust 1.70.

Artyom Pavlov · Answer 11 · Tue Jun 18 2024 03:11:44 GMT+0800 (China Standard Time)

On second thought, I agree about libpthread mutex being problematic, but what are advantages of using std::fs::File instead of libc::open/read directly?

Brian Smith · Answer 12 · Wed Jun 19 2024 03:46:30 GMT+0800 (China Standard Time)

On second thought, I agree about libpthread mutex being problematic, but what are advantages of using std::fs::File instead of libc::open/read directly?

Except for some efficient tweaks, pretty much all of the recent changes I've submitted are part of a bigger project that consists of reducing the use of unsafe and some other dubious constructs (like casts and DIY synchronization primitives) in getrandom and some other crates.

The advantage will be bigger once the the Rust read_buf feature is stable and our MSRV is high enough to use it, because we'll be able to replace the use of libc::read with Read::.read_buf. Once that happens, the entire use_file module would not be free of unsafe code except for the polling of /dev/random. Similarly, once OnceLock::get_or_try_init is stable the advantage will be bigger again.

I've submitted PR #481 to make progress towards this.

Brian Smith · Answer 13 · Wed Jun 19 2024 06:50:28 GMT+0800 (China Standard Time)

OK, so to summarize this:

There are two main improvements that are available in libstd over the current code:

libstd uses futexes and we should really stop forcing a pthreads dependency. Unfortunately, std::sync::OnceLock doesn't really have the capabilities we need because OnceLock::get_or_try_init is still unstable. I think instead of trying to switch to libstd for synchroniziation primitives, we should instead switch to once_cell, as done in PR #481 and PR #483. Those two PRs, along with #484, completely eliminate use_file::Mutex and src/lazy.rs. I do think overall that is a good direction, because we'd be better off helping to maintain once_cell than we'd be if we kept maintaining our own synchronization primitives.

libstd will eventually have a better I/O API for use in use_file, so that we we usually won't have to depend on libc in use_file. Unfortunately, without the unstable read_buf feature, we can't safely write into a &mut [MaybeUninit<u8>], so (as @newpavlov pointed out), we still need to use libc::read (and/or rustix's better API if/when we want to do so). And on Android/Linux, even when we can use the read_buf feature, we would still need to use libc::poll (and/or rustix's better API if/when we want to do so). Rust 1.63 does offer std::os::fd::{OwnedFd/BorrowedFd} which slightly better models safe I/O compared to what we're presently doing, but it hardly seems worthwhile to add a libstd dependency just for that, So, for a while, we wouldn't be gaining much by using std::fs::File. So I'm thinking to change #481 to roll back the std::fs::File changes.

PR #485 shows how things would look when libstd's unstable features are stablized. Note that we'd still need OnceBool/LazyBool even still.

Artyom Pavlov · Answer 14 · Wed Jun 19 2024 07:26:31 GMT+0800 (China Standard Time)

As I wrote in the other PR, I think we can use futex directly on Linux targets. We already have the 32 bit FD atomic, which is perfectly suitable for it. I will try to play with it a bit later.

I am also not sure about once_cell. Yes, it's a widely used dependency, but it's still a 1k+ LoC dependency used to replace 10-20 easily reviewable LoCs.

Brian Smith · Answer 15 · Wed Jun 19 2024 07:57:18 GMT+0800 (China Standard Time)

I am also not sure about once_cell. Yes, it's a widely used dependency, but it's still a 1k+ LoC dependency

I agree that that is unfortunate but also I think many of us need to review it anyway for other projects so duplicating it in getrandom is extra work for a lot of people.

used to replace 10-20 easily reviewable LoCs.

LazyBool is only like 10-20 LoC, but that's because it's based on LazyUsize which isn't so trivial to review and which is over-featured compared to what we need. For example, it takes effort to verify that LazyUsize::UNINIT doesn't conflict with any real value, whereas once_cell::race::OnceNonZeroUsize has a clearer design.

It's not the end of the world, but IMO this isn't a great place to be making new synchronization primitives.

As I wrote in the other PR, I think we can use futex directly on Linux targets. We already have the 32 bit FD atomic, which is perfectly suitable for it. I will try to play with it a bit later.

I do think that this isn't a good place to put a futex-based mutex implementation either....

Joe Richey · Answer 16 · Fri Jun 21 2024 18:27:21 GMT+0800 (China Standard Time)

After looking through some of the proposed ideas:

I don't think we really need an external dependency our a custom futex implementation for this crate; we can simplify the code (and remove the libpthread dependency) without either of these things.

I think we should take one of two approaches:

Maximally use std to implement use_file

In this case we would use std to handle thread waiting and opening/closing files

The simplest way would be to have the following global state:

static LOCK: Mutex<()> = Mutex::new(());
static FD: AtomicI32 = AtomicI32::new(FD_UNINIT);

Then our implementation would look similar to what we currently have:

pub fn getrandom_inner(dest: &mut [MaybeUninit<u8>]) -> Result<(), Error> {
    let fd = get_fd().unwrap_or(init()?);
    // Call libc::read (we need to write to uninitialized memory)
}

fn get_fd() -> Option<RawFd> {
    match FD.load(Acquire) {
        FD_UNINIT => None,
        fd => Some(fd),
    }
}

#[cold]
fn init() -> Result<RawFd, Error> {
    let _guard = LOCK.lock();
    if let Some(fd) = get_fd() {
        return Ok(fd);
    }
    
    #[cfg(any(target_os = "android", target_os = "linux"))]
    wait_until_rng_ready()?;

    let fd = File::open("/dev/urandom")?.into_raw_fd();
    FD.store(fd, Relaxed);
    Ok(fd)
}

#[cold]
fn wait_until_rng_ready() -> Result<(), Error> {
    let file = File::open("/dev/random")?;
    let mut pfd = libc::pollfd {
        fd: file.as_raw_fd(),
        events: libc::POLLIN,
        revents: 0,
    };
    loop { /* Call libc::poll on the pfd */ }
}

Ideally, we would use std::sync::OnceCell::get_or_try_init, but that's currently nightly-only.
Advantages: Simple, only one file descriptor is ever open at a time, will use futexes on Linux.
Disadvantages: requires using std for use_file targets (which should be fine)

Use the libc::poll call itself to synchronize any threads which are waiting for /dev/random to become readable

Our implementation would look like:

pub fn getrandom_inner(dest: &mut [MaybeUninit<u8>]) -> Result<(), Error> {
    static FD: LazyFd = LazyFd::new();
    let fd = match FD.get() {
        Some(fd) => fd,
        None => {
            wait_until_rng_ready()?;
            FD.get_or_open("/dev/urandom")?
        }
    }
    // Call libc::read(fd)
}

#[cold]
fn wait_until_rng_ready() -> Result<(), Error> {
    static RANDOM_FD: LazyFd = LazyFd::new();
    let fd = RANDOM_FD.get_or_open("/dev/random")?;

    let mut pfd = libc::pollfd {
        fd,
        events: libc::POLLIN,
        revents: 0,
    };
    loop { /* Call libc::poll on the pfd */ }
}

Where the LazyFd type just racily opens a file, and closes the file if the thread loses the race.

struct LazyFd(AtomicI32);

impl LazyFd {
    const fn new() -> Self { Self(AtomicI32::new(FD_UNINIT)) }
    fn get(&self) -> Option<i32> {
        match self.0.load(Acquire) {
            FD_UNINIT => None,
            fd => Some(fd),
        }
    }
    fn get_or_open(&self, path: &str) -> Result<i32, Error> {
        if let Some(fd) = self.get() {
            return Ok(fd);
        }
        let new_fd = open_readonly(path)?;
        match self.0.compare_exchange(FD_UNINIT, new_fd, AcqRel, Acquire) {
            Ok(_) => Ok(new_fd),
            Err(old_fd) => {
                libc::close(new_fd);
                Ok(old_fd)
            }
        }
    }
}

Advantages: Avoids depending on std or libpthread. No locking/waiting on non-linux platforms.
Disadvantages: Complex, 2 FDs permanently used, theoretically can open more than two FDs (doesn't seem to happen in practice)

Looking at this, I would prefer the first approach of using std without an external dependancy.

Brian Smith · Answer 17 · Sat Jun 22 2024 02:59:54 GMT+0800 (China Standard Time)

I'm not sure why the "use once_cell" approach isn't considered. If once_cell is a problematic dependency for trying to define tricky synchronization primitives then getrandom would be a problematic dependency for the same reasons, except worse because nobody expects getrandom to be defining its own synchronization primitives. getrandom defining its own synchronization primitives creates a lot of work for people trying to use it.

I do think the promise that we at open at most N file descriptors where N is very small (1 or 2) should not be discounted. The LazyFd idea is clever but it's still unbounded in the number of file descriptors it opens concurrently as N threads could each open one.

I do think the std::sync::Mutex-based approach is the safest and simplest approach, so that's what I prefer.

Artyom Pavlov · Answer 18 · Sat Jun 22 2024 05:56:30 GMT+0800 (China Standard Time)

Personally, I don't think that our synchronization needs are tricky enough to warrant introduction of a new 2.2 kLoC dependency for tier-1 Linux targets. Even worse, this dependency will be present even with disabled file fallback.

I do think the std::sync::Mutex-based approach is the safest and simplest approach, so that's what I prefer.

What do you think about the sleep loop approach on non-Linux targets? AFAIK contention on the FD atomic should be extremely rare in practice in the first place and by surrounding time slice by sleeping we "solve" the potential priority inversion problem even in pathological scenarios.

Joe Richey · Answer 19 · Sat Jun 22 2024 10:20:22 GMT+0800 (China Standard Time)

I'm not sure why the "use once_cell" approach isn't considered. If once_cell is a problematic dependency for trying to define tricky synchronization primitives then getrandom would be a problematic dependency for the same reasons, except worse because nobody expects getrandom to be defining its own synchronization primitives. getrandom defining its own synchronization primitives creates a lot of work for people trying to use it.

Personally, I don't think that our synchronization needs are tricky enough to warrant introduction of a new 2.2 kLoC dependency for tier-1 Linux targets. Even worse, this dependency will be present even with disabled file fallback.

I would prefer to use once_cell if (and only if) we can replace all our synchronization needs with it. I wouldn't want someone doing a security review of this crate to have to read synchronization code in two different places. I initially thought rust-lang/cargo#1197 would prevent us from using the race feature on some targets and the std feature on others. However, in my experimentation, it seems like we could actually do this.

I'll try to prototype what it would look like to actually use once_cell for all the syncronization stuff (NetBSD Weak symbol, shared /dev/{urandom, random} fds, LazyBool, VxWorks init).

I do think the promise that we at open at most N file descriptors where N is very small (1 or 2) should not be discounted. The LazyFd idea is clever but it's still unbounded in the number of file descriptors it opens concurrently as N threads could each open one.

What do you think about the sleep loop approach on non-Linux targets? AFAIK contention on the FD atomic should be extremely rare in practice in the first place and by surrounding time slice by sleeping we "solve" the potential priority inversion problem even in pathological scenarios.

If we end up going with a "no locking" approach, I would be fine implementing something like LazyFd::get_or_open by either:

Sleeping if the threads race on libc::open
Opening (and immediately closing) an extra file if the threads race on libc::open

Given that the race is very unlikely (I haven't been able to even make it happen once in my testing), any reasonable approach seems fine.

Brian Smith · Answer 20 · Sun Jun 23 2024 06:11:23 GMT+0800 (China Standard Time)

I'll try to prototype what it would look like to actually use once_cell for all the syncronization stuff (NetBSD Weak symbol, shared /dev/{urandom, random} fds, LazyBool, VxWorks init).

NetBSD weak symbol support needs at least a new race::OnceNonNull or otherwise a full race::OnceWeak weak function binding API. race::OnceRef won't work for function pointers. I'm not sure OnceNonNull would be accepted by them because its an inherently less-safe API (dealing with pointers).