rust-secure-code / safety-dance

Auditing crates for unsafe code which can be safely replaced

Document missing safe abstractions

Shnatsel opened this issue

Some crates, e.g. reqwest (see #5), clearly indicate the need for better safe abstractions, as their logic cannot be expressed in terms of the existing ones.

The worst offender by far is the Read trait which requires an initialized slice to write to, but initializing a slice is costly, so people just throw uninitialized slices at it and hope for the best. This unsafety could be encapsulated by appending to a Vec-like fixed-size structure.

Another such case is rust-lang/rfcs#2714

I'm thinking of filing issues on https://github.com/rust-lang/rust/ and then linking to them from some markdown file or from this issue. Thoughts?

I think part of this is that you're supposed to reuse a buffer, not just make it, read to it, and throw it away.

Perhaps their code could be made to reuse buffers more often and average out the cost of the initial zero initialization?
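A rough sketch of the buffer-reuse idea, assuming a plain blocking reader. The function name `read_in_chunks` and the `process` callback are made up for illustration and do not come from any crate discussed here. The buffer is zeroed once, and every later `read` call just overwrites it:

```rust
use std::io::{self, Read};

/// Drains `src`, handing each chunk to `process` (a hypothetical callback
/// standing in for whatever consumes the data). Returns the total byte count.
fn read_in_chunks<R: Read>(mut src: R, mut process: impl FnMut(&[u8])) -> io::Result<usize> {
    // Zero-initialized exactly once; later reads overwrite the old contents,
    // so the initialization cost is not paid again per read.
    let mut buf = [0u8; 4096];
    let mut total = 0;
    loop {
        let n = match src.read(&mut buf) {
            Ok(0) => break, // end of input
            Ok(n) => n,
            Err(e) if e.kind() == io::ErrorKind::Interrupted => continue,
            Err(e) => return Err(e),
        };
        // Only the first `n` bytes were written by this read.
        process(&buf[..n]);
        total += n;
    }
    Ok(total)
}
```

This stays entirely in safe code. It only helps when the data can be consumed chunk by chunk. It does not cover the case where the bytes must land directly in a freshly allocated destination.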

If initialization is done only once it comes without any overhead because the allocator can request already-zeroed memory from the OS. It's the subsequent initializations that require memset().

That only holds if no one else has allocated and deallocated before you create your Vec. With how Box-happy a lot of Rust code sadly is, you cannot assume that.

If initialization is done only once it comes without any overhead because the allocator can request already-zeroed memory from the OS. It's the subsequent initializations that require memset().

This is not always the case. Not every single fresh allocation comes directly from the OS (that’s exactly what allocators are supposed to avoid, as it’s very slow).

This unsafety could be encapsulated by appending to a Vec-like fixed-size structure.

There are several crates of this nature. heapless::Vec comes to mind for me:

https://docs.rs/heapless/latest/heapless/struct.Vec.html

Another use case for fixed-capacity Vec-like memory view - flate2 needs it to become 100% safe. See #32

Regarding the "fixed-capacity Vec" approach: tokio is working on an AsyncRead trait that writes to a Vec-like structure instead of a slice. This lets us sidestep the thorny issues around uninitialized memory, since it's all safely encapsulated. I'm not sure if the capacity is bounded, so I've left some comments on the PR to clarify that: tokio-rs/tokio#1744
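To make the "fixed-capacity Vec-like view" idea concrete, here is a minimal sketch. The type `FixedBuf` and its methods are hypothetical and are not the API of tokio, heapless, or any other crate. The unsafety is confined to one method, and callers can only append through safe code:

```rust
use std::mem::MaybeUninit;

/// Hypothetical fixed-capacity, append-only view over possibly
/// uninitialized storage. Invariant: `storage[..len]` is initialized.
struct FixedBuf<'a> {
    storage: &'a mut [MaybeUninit<u8>],
    len: usize,
}

impl<'a> FixedBuf<'a> {
    fn new(storage: &'a mut [MaybeUninit<u8>]) -> Self {
        FixedBuf { storage, len: 0 }
    }

    /// Number of bytes that can still be appended.
    fn remaining(&self) -> usize {
        self.storage.len() - self.len
    }

    /// Appends as much of `data` as fits and returns how many bytes were copied.
    fn push_slice(&mut self, data: &[u8]) -> usize {
        let start = self.len;
        let n = data.len().min(self.remaining());
        for (slot, byte) in self.storage[start..start + n].iter_mut().zip(data) {
            *slot = MaybeUninit::new(*byte);
        }
        self.len += n;
        n
    }

    /// The initialized prefix, exposed as ordinary bytes.
    fn filled(&self) -> &[u8] {
        let init = &self.storage[..self.len];
        // SAFETY: the invariant guarantees these `len` bytes were written,
        // and MaybeUninit<u8> has the same layout as u8.
        unsafe { std::slice::from_raw_parts(init.as_ptr() as *const u8, init.len()) }
    }
}
```

A reader that is handed a `&mut FixedBuf` can never observe uninitialized bytes and can never grow the buffer past its capacity, which is the property the slice-based `Read::read` cannot express.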

The worst offender by far is the Read trait which requires an initialized slice to write to, but initializing a slice is costly, so people just throw uninitialized slices at it and hope for the best.

Being able to write to uninitialized memory is a must-have for any performance-oriented code.
I do not believe there is any way to completely avoid it.
It should not be UB to write to an uninitialized &mut [T], which is de facto safe, and an uninitialized &[T] should be UB only when you read from it. (I believe right now it is UB?)

@DoumanAsh the problem does not lie in writing to a &mut [u8] being UB, but in the Read trait requiring a &mut [u8] to write to it.

Read ought to be reworked to have the API offered in https://docs.rs/uninit

I do not see a particular problem with the current Read, and it would be hard to remove something that is 'broken' because of backward compatibility.

Well, you can extend Read to have something like read_into_uninit, but it needs to have a default impl and there is not much point to it anyway.
Read cannot break backward compatibility.

I do not see a particular problem with the current Read

The problem (IIUC) is that a caller of a generic Read implementation cannot pass uninitialised memory to be filled in, as all the relevant methods are not marked unsafe (or don’t accept a structure capable of preventing uninit mem accesses like Shnatsel is describing).

So, to be safe, everyone has to pass initialised memory, which wastes some cycles.

Yeah, what we want is an unsafe trait ReadToUninit that takes &mut [MaybeUninit<u8>] and then guarantees that however much it says it read is now safe to use as initialized.
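A sketch of what such a trait might look like. `ReadToUninit` does not exist in std; the method name `read_to_uninit`, the impl for `&[u8]`, and the helper `read_into_vec` are all assumptions made for illustration. The trait is `unsafe` to implement, so callers are allowed to rely on the returned count:

```rust
use std::io;
use std::mem::MaybeUninit;

/// Hypothetical trait, not part of std.
///
/// # Safety
/// On `Ok(n)`, implementors must guarantee that `n <= buf.len()` and that
/// the first `n` elements of `buf` have been initialized.
unsafe trait ReadToUninit {
    fn read_to_uninit(&mut self, buf: &mut [MaybeUninit<u8>]) -> io::Result<usize>;
}

// SAFETY: exactly `n` bytes are written to `buf[..n]`, and `n <= buf.len()`.
unsafe impl ReadToUninit for &[u8] {
    fn read_to_uninit(&mut self, buf: &mut [MaybeUninit<u8>]) -> io::Result<usize> {
        let data: &[u8] = *self;
        let n = data.len().min(buf.len());
        let (head, tail) = data.split_at(n);
        for (dst, src) in buf[..n].iter_mut().zip(head) {
            *dst = MaybeUninit::new(*src);
        }
        *self = tail; // advance past the bytes just handed out
        Ok(n)
    }
}

/// Safe wrapper: reads up to `max` bytes into the spare capacity of `out`
/// without zeroing it first.
fn read_into_vec<R: ReadToUninit>(src: &mut R, out: &mut Vec<u8>, max: usize) -> io::Result<usize> {
    out.reserve(max);
    let spare = out.spare_capacity_mut();
    let n = src.read_to_uninit(&mut spare[..max])?;
    debug_assert!(n <= max);
    // SAFETY: the trait contract promises the first `n` spare bytes are initialized.
    unsafe { out.set_len(out.len() + n) };
    Ok(n)
}
```

The design choice here is to push the proof obligation onto implementors (via `unsafe trait`) so that generic callers such as `read_into_vec` can expose a safe function.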

As someone who has raised the topic of an improved Read trait before, I can already tell this is going to be a long discussion due to the sheer size of the design space. If you want to continue it, please do so in a different place.

I'm posting here only under the "You do not have to be an unsafe expert to help out" clause, but: is the following a worthwhile "missing safe abstraction" candidate to track here?

This project uses a small amount of unsafe code to provide the same semantics of extend_from_slice but in a much faster way (over 4x faster). Not quite sure why it's much faster, but if you are uncomfortable with unsafe code, feel free to set SAFE_ONLY to true at the top of src/lib.rs. This will restrict this fuzzer to only generate safe code. I don't think this is necessary but who knows :)

https://github.com/gamozolabs/fzero_fuzzer#unsafe-code

(I imagine too that, if one person does this who cares enough about safe code to warn about it and to provide a safe alternative, several more people do this who don't.)

The line https://github.com/gamozolabs/fzero_fuzzer/blob/6fe91bcd87af1db71472f4b549e66ea273811576/src/main.rs#L302 can wrap on overflow, especially in release builds, where overflow-checks defaults to false. In that case the .reserve() may not happen even when it is necessary, and the following copy_nonoverlapping() would go out of bounds. I don't know the lengths of the slices involved, so I can't tell whether the "overflow usize" scenario can realistically happen; it may be fine in practice...
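For illustration, a hand-rolled append of this general shape can rule out the wrap-around with `checked_add`. This is a hypothetical stand-in (`append_bytes` is an invented name), not the fuzzer's actual code:

```rust
use std::ptr;

/// Hypothetical fast-path append; panics instead of wrapping if the
/// combined length does not fit in `usize`.
fn append_bytes(dst: &mut Vec<u8>, src: &[u8]) {
    let old_len = dst.len();
    // checked_add turns a silent wrap into a panic, in debug and release alike.
    let needed = old_len.checked_add(src.len()).expect("length overflow");
    if needed > dst.capacity() {
        // Ensures capacity >= old_len + src.len().
        dst.reserve(src.len());
    }
    // SAFETY: the capacity is at least `needed`, so the destination range is
    // in bounds; `src` cannot overlap `dst`'s buffer because `dst` is
    // exclusively borrowed.
    unsafe {
        ptr::copy_nonoverlapping(src.as_ptr(), dst.as_mut_ptr().add(old_len), src.len());
        dst.set_len(needed);
    }
}
```

With the checked addition, the capacity test can no longer be skipped because of a wrapped sum, so the copy stays in bounds regardless of the overflow-checks setting.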

The element count of a slice can't actually exceed isize::MAX in practice, because LLVM's IR operation for indexing uses signed values, though this is not well documented at the Rust level.

Regarding the "fixed-capacity Vec" approach: tokio is working on an AsyncRead trait that writes to a Vec-like structure instead of a slice. This lets us sidestep the thorny issues around uninitialized memory, since it's all safely encapsulated. I'm not sure if the capacity is bounded, so I've left some comments on the PR to clarify that: tokio-rs/tokio#1744

I don't know if this is still an issue, but arrayvec may be a useful crate here.