Document missing safe abstractions

Shnatsel opened this issue 5 years ago

Sergey "Shnatsel" Davidoff commented 5 years ago

Some crates, e.g. reqwest (see #5), clearly indicate the need for better safe abstractions, as their logic cannot be expressed in terms of the existing ones.

The worst offender by far is the Read trait, which requires an initialized slice to write to. Initializing a slice is costly, so people just throw uninitialized slices at it and hope for the best. This unsafety could be encapsulated by appending to a fixed-capacity, Vec-like structure.
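A minimal sketch of that idea, assuming a hypothetical `FixedBuf` type that is not taken from any crate mentioned in this thread. The uninitialized storage and the single `unsafe` block live inside the type, and safe callers can only append bytes or view the initialized prefix.

```rust
use std::mem::MaybeUninit;

/// Hypothetical fixed-capacity byte buffer; all unsafety is encapsulated here.
pub struct FixedBuf {
    storage: Box<[MaybeUninit<u8>]>,
    filled: usize, // invariant: storage[..filled] is initialized
}

impl FixedBuf {
    pub fn with_capacity(cap: usize) -> Self {
        // MaybeUninit::uninit() writes nothing meaningful: no memset of user data.
        let storage = vec![MaybeUninit::<u8>::uninit(); cap].into_boxed_slice();
        FixedBuf { storage, filled: 0 }
    }

    pub fn capacity(&self) -> usize {
        self.storage.len()
    }

    /// Appends as many bytes of `src` as fit and returns how many were copied.
    pub fn extend_from_slice(&mut self, src: &[u8]) -> usize {
        let spare = &mut self.storage[self.filled..];
        let n = src.len().min(spare.len());
        for (dst, &b) in spare[..n].iter_mut().zip(src) {
            *dst = MaybeUninit::new(b);
        }
        self.filled += n;
        n
    }

    /// Only the initialized prefix is ever exposed as `&[u8]`.
    pub fn filled(&self) -> &[u8] {
        // SAFETY: storage[..filled] was initialized by extend_from_slice,
        // and MaybeUninit<u8> has the same layout as u8.
        unsafe { std::slice::from_raw_parts(self.storage.as_ptr() as *const u8, self.filled) }
    }
}
```

A version usable with readers would also need a way to let them write straight into the spare capacity, which is where the Read trait problem comes back.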

Another such case is rust-lang/rfcs#2714.

I'm thinking of filing issues on https://github.com/rust-lang/rust/ and then linking to them from some markdown file or from this issue. Thoughts?

Lokathor · Answer 1 · Wed Sep 04 2019 05:04:03 GMT+0800 (China Standard Time)

I think part of this is that you're supposed to reuse a buffer, not just create it, read into it, and throw it away.

Perhaps their code could be made to reuse buffers more often and amortize the cost of the initial zero-initialization?
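A sketch of the buffer-reuse pattern, using a hypothetical `count_bytes` helper: the buffer is zeroed once, and every later `read` call overwrites it in place, so the zeroing cost is spread across all reads.

```rust
use std::io::{self, Read};

/// Hypothetical helper: reads `reader` to EOF through one reused buffer.
pub fn count_bytes<R: Read>(mut reader: R, buf_size: usize) -> io::Result<u64> {
    let mut buf = vec![0u8; buf_size]; // zero-initialized exactly once
    let mut total = 0u64;
    loop {
        match reader.read(&mut buf) {
            Ok(0) => return Ok(total),
            // Only buf[..n] holds fresh data; the rest is left over from earlier reads.
            Ok(n) => total += n as u64,
            Err(e) if e.kind() == io::ErrorKind::Interrupted => continue,
            Err(e) => return Err(e),
        }
    }
}
```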

Sergey "Shnatsel" Davidoff · Answer 2 · Wed Sep 04 2019 05:20:36 GMT+0800 (China Standard Time)

If initialization is done only once, it comes without any overhead, because the allocator can request already-zeroed memory from the OS. It's the subsequent initializations that require memset().
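For illustration, the standard library exposes zeroed allocation as its own entry point, `std::alloc::alloc_zeroed`. As far as I know, `vec![0u8; n]` currently goes through this path internally, though that is an implementation detail. Whether the zeroes come from fresh OS pages or from a memset over recycled memory is up to the allocator, which is the point the next two replies raise. `zeroed_sum` is a hypothetical helper.

```rust
use std::alloc::{alloc_zeroed, dealloc, handle_alloc_error, Layout};

/// Hypothetical helper: asks the global allocator for `n` zeroed bytes and
/// sums them (always 0), only to show the zeroed-allocation entry point.
pub fn zeroed_sum(n: usize) -> u64 {
    assert!(n > 0, "zero-sized layouts must not be passed to alloc_zeroed");
    let layout = Layout::array::<u8>(n).expect("layout too large");
    unsafe {
        // The allocator decides how the zeroes are produced: fresh OS pages
        // may already be zero, while recycled memory needs an explicit memset.
        let ptr = alloc_zeroed(layout);
        if ptr.is_null() {
            handle_alloc_error(layout);
        }
        let sum: u64 = std::slice::from_raw_parts(ptr, n)
            .iter()
            .map(|&b| u64::from(b))
            .sum();
        dealloc(ptr, layout);
        sum
    }
}
```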

Lokathor · Answer 3 · Wed Sep 04 2019 05:23:31 GMT+0800 (China Standard Time)

That only holds if no one else has allocated and deallocated before you create your vec. With how Box-happy a lot of Rust code sadly is, you cannot assume that.

Matt Taylor · Answer 4 · Wed Sep 04 2019 05:45:32 GMT+0800 (China Standard Time)

If initialization is done only once it comes without any overhead because the allocator can request already-zeroed memory from the OS. It's the subsequent initializations that require memset().

This is not always the case. Not every single fresh allocation comes directly from the OS (that’s exactly what allocators are supposed to avoid, as it’s very slow).

Tony Arcieri · Answer 5 · Wed Sep 04 2019 05:49:12 GMT+0800 (China Standard Time)

This unsafety could be encapsulated by appending to a Vec-like fixed-size structure.

There are several crates of this nature. heapless::Vec comes to mind for me:

https://docs.rs/heapless/latest/heapless/struct.Vec.html

Sergey "Shnatsel" Davidoff · Answer 6 · Wed Sep 04 2019 05:53:54 GMT+0800 (China Standard Time)

Oh, another one! There are also:

Sergey "Shnatsel" Davidoff · Answer 7 · Fri Nov 01 2019 23:40:21 GMT+0800 (China Standard Time)

Another use case for a fixed-capacity, Vec-like memory view: flate2 needs it to become 100% safe. See #32.

Sergey "Shnatsel" Davidoff · Answer 8 · Thu Nov 07 2019 20:59:06 GMT+0800 (China Standard Time)

Regarding the "fixed-capacity Vec" approach: tokio is working on an AsyncRead trait that writes to a Vec-like structure instead of a slice. This lets us sidestep the thorny issues around uninitialized memory, since it's all safely encapsulated. I'm not sure if the capacity is bounded, so I've left some comments on the PR to clarify that: tokio-rs/tokio#1744
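In std's synchronous `Read`, a comparable Vec-based, bounded read can be sketched with `Read::take` and `Read::read_to_end`. The caller hands over a `Vec<u8>`, `take` caps how many bytes may be appended, and any handling of the Vec's spare capacity stays inside the standard library. This is not tokio's API, and `append_up_to` is a hypothetical helper.

```rust
use std::io::{self, Read};

/// Hypothetical helper: appends at most `limit` bytes from `reader` to `out`.
/// The caller never touches uninitialized memory; whether std zeroes the
/// spare capacity internally is an implementation detail of read_to_end.
pub fn append_up_to<R: Read>(reader: R, out: &mut Vec<u8>, limit: u64) -> io::Result<usize> {
    reader.take(limit).read_to_end(out)
}
```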

Douman · Answer 9 · Thu Nov 07 2019 21:43:37 GMT+0800 (China Standard Time)

The worst offender by far is the Read trait which requires an initialized slice to write to, but initializing a slice is costly, so people just throw uninitialized slices at it and hope for the best.

Being able to write to uninitialized memory is a must-have for any performance-oriented code.
I do not believe there is any way to completely avoid it.
It should not be UB to write through an uninitialized &mut [T], which is de facto safe, and an uninitialized &[T] should be UB only when you read from it. (I believe right now it is UB?)
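For comparison, the pattern that is sound today writes through `&mut [MaybeUninit<T>]` instead of `&mut [T]`, and only asserts initialization once every element has been written. A minimal sketch with a hypothetical `squares` function:

```rust
use std::mem::MaybeUninit;

/// Hypothetical example: fills an uninitialized array, then assumes it initialized.
/// Writing through MaybeUninit slots is sound; creating a `&mut [u32]` over the
/// same uninitialized memory is not guaranteed to be sound under current rules.
pub fn squares<const N: usize>() -> [u32; N] {
    let mut buf = [MaybeUninit::<u32>::uninit(); N];
    for (i, slot) in buf.iter_mut().enumerate() {
        *slot = MaybeUninit::new((i * i) as u32);
    }
    // SAFETY: every element was written in the loop above.
    buf.map(|x| unsafe { x.assume_init() })
}
```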

Daniel Henry-Mantilla · Answer 10 · Thu Nov 07 2019 21:59:33 GMT+0800 (China Standard Time)

@DoumanAsh the problem is not that writing to a &mut [u8] is UB, but that the Read trait requires a &mut [u8] to write into.

Read ought to be reworked to have the API offered in https://docs.rs/uninit

Douman · Answer 11 · Thu Nov 07 2019 22:22:04 GMT+0800 (China Standard Time)

I do not see a particular problem with the current Read, and it would be hard to remove something that is 'broken' due to backward compatibility.

Well, you can extend Read with something like read_into_uninit, but it needs to have a default impl, and there is not much point to it anyway.
Read cannot break backward compatibility.
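One way to read that suggestion, sketched as a hypothetical extension trait `ReadIntoUninit` (not part of std). The default body has to work for every existing reader, so the most it can do is zero the buffer and call `read`, which is exactly the cost the method was meant to avoid.

```rust
use std::io::{self, Read};
use std::mem::MaybeUninit;

/// Hypothetical extension of Read with a conservative default body.
pub trait ReadIntoUninit: Read {
    fn read_into_uninit<'a>(&mut self, buf: &'a mut [MaybeUninit<u8>]) -> io::Result<&'a mut [u8]> {
        // The default cannot trust arbitrary readers, so it zeroes first.
        for slot in buf.iter_mut() {
            *slot = MaybeUninit::new(0);
        }
        // SAFETY: every element was initialized just above, and
        // MaybeUninit<u8> has the same layout as u8.
        let buf: &'a mut [u8] = unsafe { &mut *(buf as *mut [MaybeUninit<u8>] as *mut [u8]) };
        let n = self.read(buf)?;
        Ok(&mut buf[..n])
    }
}

// Blanket impl so every reader gets the default. As a provided method on Read
// itself, individual readers could override it; a blanket impl rules that out.
impl<R: Read + ?Sized> ReadIntoUninit for R {}
```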

Matt Taylor · Answer 12 · Fri Nov 08 2019 00:49:40 GMT+0800 (China Standard Time)

I do not see a particular problem with the current Read

The problem (IIUC) is that a caller of a generic Read implementation cannot pass uninitialised memory to be filled in, as none of the relevant methods are marked unsafe (nor do they accept a structure capable of preventing uninitialised memory accesses, like the one Shnatsel is describing).

So to be safe, everyone has to pass initialised memory, which wastes some cycles.
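To see why, note that nothing in `Read`'s contract stops a safe implementation from looking at the buffer before writing to it. A hypothetical `Inspector` reader that does so:

```rust
use std::io::{self, Read};

/// Hypothetical reader: perfectly safe, if odd, because it reads `buf` first.
pub struct Inspector;

impl Read for Inspector {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        // Inspecting the incoming contents is allowed in safe code.
        let nonzero = buf.iter().filter(|&&b| b != 0).count();
        if let Some(first) = buf.first_mut() {
            *first = nonzero as u8;
            return Ok(1);
        }
        Ok(0)
    }
}
```

If a caller passed uninitialized memory as `&mut [u8]` to a reader like this, reading `buf` would be undefined behavior, even though the reader's author did nothing wrong.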

Lokathor · Answer 13 · Fri Nov 08 2019 00:55:25 GMT+0800 (China Standard Time)

Yeah, what we want is an unsafe trait ReadToUninit that takes &mut [MaybeUninit<u8>] and guarantees that however many bytes it says it read are now safe to use as initialized.
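A minimal sketch of that design. `ReadToUninit`, its `read_uninit` method, and the `read_append` helper are all hypothetical. Marking the trait `unsafe` moves the proof obligation to implementors, which lets a safe caller grow a `Vec` into its spare capacity via `Vec::spare_capacity_mut` and `set_len` without zeroing it first.

```rust
use std::io;
use std::mem::MaybeUninit;

/// Hypothetical trait, not a std API.
///
/// # Safety
/// `read_uninit` may return `Ok(n)` only if `n <= buf.len()` and
/// `buf[..n]` has been fully initialized.
pub unsafe trait ReadToUninit {
    fn read_uninit(&mut self, buf: &mut [MaybeUninit<u8>]) -> io::Result<usize>;
}

// An in-memory reader that copies from a byte slice and advances it.
unsafe impl ReadToUninit for &[u8] {
    fn read_uninit(&mut self, buf: &mut [MaybeUninit<u8>]) -> io::Result<usize> {
        let src: &[u8] = *self;
        let n = src.len().min(buf.len());
        for (dst, &b) in buf[..n].iter_mut().zip(src) {
            *dst = MaybeUninit::new(b);
        }
        *self = &src[n..];
        Ok(n)
    }
}

/// Safe wrapper: appends up to `max` bytes into the Vec's spare capacity.
pub fn read_append<R: ReadToUninit + ?Sized>(r: &mut R, out: &mut Vec<u8>, max: usize) -> io::Result<usize> {
    out.reserve(max);
    let spare = &mut out.spare_capacity_mut()[..max];
    let n = r.read_uninit(spare)?;
    assert!(n <= max, "ReadToUninit contract violated");
    // SAFETY: the trait contract says spare[..n] is initialized,
    // and reserve(max) guaranteed room for it.
    unsafe { out.set_len(out.len() + n) };
    Ok(n)
}
```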

Sergey "Shnatsel" Davidoff · Answer 14 · Fri Nov 08 2019 00:57:00 GMT+0800 (China Standard Time)

As someone who has raised the topic of an improved Read trait before, I can already tell this is going to be a long discussion due to the sheer size of the design space. If you want to continue it, please do so in a different place.

8573 · Answer 15 · Sat Aug 01 2020 00:20:52 GMT+0800 (China Standard Time)

I'm posting here only under the "You do not have to be an unsafe expert to help out" clause, but: is the following a worthwhile "missing safe abstraction" candidate to track here?

This project uses a small amount of unsafe code to provide the same semantics of extend_from_slice but in a much faster way (over 4x faster). Not quite sure why it's much faster, but if you are uncomfortable with unsafe code, feel free to set SAFE_ONLY to true at the top of src/lib.rs. This will restrict this fuzzer to only generate safe code. I don't think this is necessary but who knows :)

— https://github.com/gamozolabs/fzero_fuzzer#unsafe-code

(I imagine too that, if one person does this who cares enough about safe code to warn about it and to provide a safe alternative, several more people do this who don't.)

Daniel Henry-Mantilla · Answer 16 · Sat Aug 01 2020 03:45:36 GMT+0800 (China Standard Time)

The line https://github.com/gamozolabs/fzero_fuzzer/blob/6fe91bcd87af1db71472f4b549e66ea273811576/src/main.rs#L302 can wrap on overflow, especially in release builds, where overflow-checks = false is the default. That means the .reserve() may not happen even when necessary, which makes the following copy_nonoverlapping() go out of bounds. I don't know the lengths of the slices involved well enough to say whether the "overflow usize" scenario can realistically happen, though, so it may be fine in practice...
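The fzero code itself isn't reproduced here. As a hypothetical sketch of the same general pattern (`reserve`, then `copy_nonoverlapping`, then `set_len`), checking the addition with `checked_add` means an overflow panics instead of silently skipping the `reserve()`:

```rust
/// Hypothetical fast append with the same effect as `dst.extend_from_slice(src)`.
pub fn fast_append(dst: &mut Vec<u8>, src: &[u8]) {
    // An unchecked `dst.len() + src.len()` can wrap when overflow checks are off;
    // checked_add turns that into a panic rather than a skipped reserve().
    let new_len = dst.len().checked_add(src.len()).expect("length overflow");
    if new_len > dst.capacity() {
        dst.reserve(src.len());
    }
    // SAFETY: capacity >= new_len now, and `src` cannot alias `dst`'s buffer
    // because `dst` is borrowed mutably for the whole call.
    unsafe {
        std::ptr::copy_nonoverlapping(src.as_ptr(), dst.as_mut_ptr().add(dst.len()), src.len());
        dst.set_len(new_len);
    }
}
```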

Lokathor · Answer 17 · Sat Aug 01 2020 04:22:50 GMT+0800 (China Standard Time)

The element count of a slice can't actually exceed isize::MAX in practice, because LLVM's IR operation for indexing uses signed values, though this is not well documented at the Rust level.
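The requirement documented today (see the safety section of `std::slice::from_raw_parts`) is on the slice's total size in bytes, which must not exceed `isize::MAX`; for byte slices that bounds the element count as well. A small sketch of what this means for the overflow question above, with hypothetical helper names: adding two byte-slice lengths can never wrap a `usize`, and `Vec` refuses to reserve past that bound.

```rust
/// Hypothetical helper: the largest possible sum of two byte-slice lengths.
/// 2 * isize::MAX == usize::MAX - 1, so the addition cannot wrap.
pub fn max_sum_of_two_byte_lengths() -> Option<usize> {
    let max_len = isize::MAX as usize;
    max_len.checked_add(max_len)
}

/// Hypothetical helper: Vec rejects capacities whose byte size exceeds isize::MAX.
pub fn can_reserve_past_isize_max() -> bool {
    let mut v: Vec<u8> = Vec::new();
    v.try_reserve(isize::MAX as usize + 1).is_ok()
}
```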

Brandon H. Gomes · Answer 18 · Tue Aug 03 2021 00:29:46 GMT+0800 (China Standard Time)

Regarding the "fixed-capacity Vec" approach: tokio is working on an AsyncRead trait that writes to a Vec-like structure instead of a slice. This lets us sidestep the thorny issues around uninitialized memory, since it's all safely encapsulated. I'm not sure if the capacity is bounded, so I've left some comments on the PR to clarify that: tokio-rs/tokio#1744

I don't know if this is still an issue, but arrayvec may be a useful crate here.