rust-lang / unsafe-code-guidelines

Forum for discussion about what unsafe code can and can't do

Home Page:https://rust-lang.github.io/unsafe-code-guidelines

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

Minimum guarantees regarding `UnsafeCell`

joshlf opened this issue · comments

As part of google/zerocopy#251, we're trying to teach zerocopy to reason in more precise ways about UnsafeCells. I'm making this issue as sort of an omnibus issue to clarify a number of questions regarding UnsafeCells. As always with zerocopy, I'm only interested in what can currently be promised, not what might be promised in the future.

In particular, my questions concern when it's sound for different code to disagree on whether a given memory region is covered by an UnsafeCell.

Constraints

Based on Miri's behavior and APIs that already exist in the standard library, we can observe some constraints on what code is sound/unsound. In particular:

  • Any time Miri treats code as unsound, I take that to mean that we cannot guarantee that it is sound even if it might be decided to be sound at some point in the future
  • Just because Miri treats code as sound, it may not be guaranteed to be sound
  • If a stable API exists which implies that code is sound, then it probably is guaranteed to be sound

It's unsound (under stacked borrows) to have multiple active references which "disagree" about whether a memory region is covered by an UnsafeCell

Status: can't assume sound

Prior art: #455, #303

As soon as there are multiple references which are "active"* or "not shadowed" or whatever the proper terminology is, and these references disagree about whether a given memory region is covered by an UnsafeCell, this is insta-UB according to stacked borrows even if the references are never used for anything. For example:

let u = 1usize;
let u_ref = &u;
let _c_ref = unsafe { &*(u_ref as *const usize as *const Cell<usize>) }; // Insta-UB

*Previously, this post used the term "live", but @CAD97 pointed out that "active" is the more accurate term.

Miri output
error: Undefined Behavior: trying to retag from <1716> for SharedReadWrite permission at alloc809[0x0], but that tag only grants SharedReadOnly permission for this location
 --> src/main.rs:6:27
  |
6 |     let _c_ref = unsafe { &*(u_ref as *const usize as *const Cell<usize>) };
  |                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  |                           |
  |                           trying to retag from <1716> for SharedReadWrite permission at alloc809[0x0], but that tag only grants SharedReadOnly permission for this location
  |                           this error occurs as part of retag at alloc809[0x0..0x8]
  |
  = help: this indicates a potential bug in the program: it performed an invalid operation, but the Stacked Borrows rules it violated are still experimental
  = help: see https://github.com/rust-lang/unsafe-code-guidelines/blob/master/wip/stacked-borrows.md for further information
help: <1716> was created by a SharedReadOnly retag at offsets [0x0..0x8]
 --> src/main.rs:6:30
  |
6 |     let _c_ref = unsafe { &*(u_ref as *const usize as *const Cell<usize>) };
  |                              ^^^^^
  = note: BACKTRACE (of the first span):
  = note: inside `main` at src/main.rs:6:27: 6:74

It's still unsound even if the overlap is only "partial"

Status: can't assume sound

In this example, only some of the bytes go from covered by an UnsafeCell to not covered.

#[repr(C)]
struct Foo(Cell<u32>, u32);

let u = 1u64;
let u_ref = &u;
let _c_ref = unsafe { &*(u_ref as *const u64 as *const Foo) }; // Insta-UB
Miri output
error: Undefined Behavior: trying to retag from <1716> for SharedReadWrite permission at alloc809[0x0], but that tag only grants SharedReadOnly permission for this location
 --> src/main.rs:9:27
  |
9 |     let _c_ref = unsafe { &*(u_ref as *const u64 as *const Foo) };
  |                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  |                           |
  |                           trying to retag from <1716> for SharedReadWrite permission at alloc809[0x0], but that tag only grants SharedReadOnly permission for this location
  |                           this error occurs as part of retag at alloc809[0x0..0x8]
  |
  = help: this indicates a potential bug in the program: it performed an invalid operation, but the Stacked Borrows rules it violated are still experimental
  = help: see https://github.com/rust-lang/unsafe-code-guidelines/blob/master/wip/stacked-borrows.md for further information
help: <1716> was created by a SharedReadOnly retag at offsets [0x0..0x8]
 --> src/main.rs:9:30
  |
9 |     let _c_ref = unsafe { &*(u_ref as *const u64 as *const Foo) };
  |                              ^^^^^
  = note: BACKTRACE (of the first span):
  = note: inside `main` at src/main.rs:9:27: 9:64

It's no longer unsound if the UnsafeCell region is of zero size

Status: might be able to guarantee sound

In other words, it's not the mere presence of UnsafeCell that matters, but rather the bytes that are "covered" by UnsafeCells. For this reason, ZST UnsafeCells have no effect; this code is accepted by Miri:

// `Cell<()>` is a ZST
#[repr(C)]
struct Foo(Cell<()>, u64);

let u = 1u64;
let u_ref = &u;
let _c_ref = unsafe { &*(u_ref as *const u64 as *const Foo) };

It is sound to have multiple references disagree about what type a memory region has so long as those references agree on UnsafeCells

Status: probably guaranteed sound

For example, in both of the following cases, while the types are not the same, the byte ranges covered by UnsafeCells are the same.

let u = 1usize;
let u_ref = &u;
// Convert from `&usize` to `&isize`. Neither type has `UnsafeCell`s.
let _i_ref = unsafe { &*(u_ref as *const usize as *const isize) };
    
let cu = Cell::new(1usize);
let cu_ref = &cu;
// Convert from `&Cell<usize>` to `&Cell<isize>`. Both types have
// `UnsafeCell`s covering the same byte ranges.
let _ci_ref = unsafe { &*(cu_ref as *const Cell<usize> as *const Cell<isize>) };

It is sound to have multiple references disagree about UnsafeCells so long as only one of them is "active"

Status: guaranteed sound

In this example, I believe that what's happening is that u_mut becomes inaccessible so long as _c_mut exists, so there's no way to exercise the UnsafeCell disagreement. Not only is this accepted by Miri in practice, it's also codified in the UnsafeCell methods get_mut (stable) and from_mut (unstable nightly feature).

let mut u = 1usize;
let u_mut = &mut u;
let _c_mut = unsafe { &mut *(u_mut as *mut usize as *mut Cell<usize>) };

It is sound to read a non-UnsafeCell location as an UnsafeCell via ptr::read

Status: unclear

I'm not sure how to think about why this is sound, and whether it's guaranteed to remain sound.

let u = 1usize;
let u_ref = &u;
let _c = unsafe { ptr::read(u_ref as *const usize as *const Cell<usize>) };

It is sound to read an UnsafeCell location as a non-UnsafeCell via ptr::read

Status: unclear

I'm even less sure how to think about this one - my intuition tells me that Miri should reject this, but it doesn't.

let c = Cell::new(1usize);
let c_ref = &c;
let _u = unsafe { ptr::read(c_ref as *const Cell<usize> as *const usize) };

Guarantees

I would like to be able to rely on the following guarantees. For each guarantee, my questions are:

  • Is this already documented somewhere as being guaranteed?
  • If not, would it be acceptable (at least to T-opsem) to add documentation to guarantee it?

Sound given UnsafeCell agreement

In particular: it is sound to create references to the same memory which disagree about type so long as they agree about which bytes are covered by UnsafeCell. It is sound to use such references to perform loads and stores (assuming those loads and stores do not violate the referent types' bit validity).

These analyses only consider active references

For the purposes of the previous soundness guarantee, only "active" (i.e., non-shadowed) references are considered.* This allows us to prove that code which reborrows a &mut T (with no UnsafeCells) as a &mut U (with UnsafeCells) or vice versa (e.g., UnsafeCell::get_mut) is sound.

I am aware that the formal definition of "active-ness" is up in the air, and subject to models like stacked borrows and tree borrows. I am assuming that there is some way we can formally define borrows so that we can get some minimum guarantees about this section despite not having completely agreed on the full model.

It is sound to read a non-UnsafeCell location as an UnsafeCell via ptr::read

Given &T where T doesn't contain any UnsafeCells, it is sound to convert &T to *const U (where U does contain UnsafeCells) and use ptr::read to read a U out of that memory location (assuming that this doesn't violate U's bit validity, read uninitialized bytes, read unaligned memory, etc etc).

Note: I think you're using "live" when you actually mean something closer to "active," which obfuscates the point you're trying to make. "Live" has a specific meaning used by the NLL RFC; specifically, that the reference is potentially read from read through in the future; that the borrow region/lifetime is live.

Thus in both the &T as &Cell<T> and &mut T as &Cell<T> cases, the parent borrow is still "live". &T remains "active" while derived borrows are "live" (dereferencing it does not end the liveness of derived borrows), whereas &mut T is not "active" while derived borrows are "live" (dereferencing it ends the liveness of derived borrows).


The unifying rule is that a byte may be borrowed as UnsafeCell only if the borrow of the byte is permitted to mutate the byte.

As for reads, I'm reasonably sure we've documented somewhere that transmute/transmute_copy/ptr.cast().read() only requires layout/value compatibility (thus UnsafeCell is completely irrelevant).

I do think all cases of ref casting used in std amount to #[repr(transparent)] compatibility (if we treat str as a transparent newtype around [u8], which it essentially is except for compile perf benefit). Ref casting between #[repr(transparent)] compatible types is essentially in the guaranteed sound category, as it's how zero copy conversions between &str/&CStr/&Path/&[u8] work. Other layout compatibility doesn't have a "soundness witness" for ref casting in the std API/documentation, as far as I'm aware.

Note: I think you're using "live" when you actually mean something closer to "active," which obfuscates the point you're trying to make. "Live" has a specific meaning used by the NLL RFC; specifically, that the reference is potentially read from read through in the future; that the borrow region/lifetime is live.

Thus in both the &T as &Cell<T> and &mut T as &Cell<T> cases, the parent borrow is still "live". &T remains "active" while derived borrows are "live" (dereferencing it does not end the liveness of derived borrows), whereas &mut T is not "active" while derived borrows are "live" (dereferencing it ends the liveness of derived borrows).

Yep, sounds like I mean "active"! I've updated the original text to use that term.

The unifying rule is that a byte may be borrowed as UnsafeCell only if the borrow of the byte is permitted to mutate the byte.

Do you know if a rule like this is documented anywhere normative (ie, not in not-yet-accepted proposals like stacked borrows/tree borrows)? IIUC, you're using the term "permitted" in the same sense that Miri uses it when tracking "permissions" on an allocation (e.g., treating &UnsafeCell<T> as having SharedReadWrite permission). I'm not aware of a term like that being used in official Rust documentation.

Also, is this an "if and only if" or merely an "only if"? In other words, is it the case that "I am permitted to mutate this byte" is sufficient to guarantee that "I am allowed to borrow this byte as UnsafeCell"?

Finally, one concern with this rule: Doesn't this imply that going from &UnsafeCell<T> to &T is sound since the initial &UnsafeCell<T> is permitted to mutate its referent (no problems so far), and the subsequent re-borrow as &T doesn't have any UnsafeCells, and so your rule doesn't have anything to say about it? This seems problematic - it would mean that I could, for example, synthesize a &usize that refers to the inner contents of an AtomicUsize and then access it while another thread performs atomic store operations on it.

As for reads, I'm reasonably sure we've documented somewhere that transmute/transmute_copy/ptr.cast().read() only requires layout/value compatibility (thus UnsafeCell is completely irrelevant).

My concern with this case is that it seems possible to make this unsound. For example, imagine that I ptr::read from the UnsafeCell inside an AtomicUsize while another thread is performing atomic store operations on it. Isn't that UB? Should I read your statement as implying that this shouldn't be UB?

you're using the term "permitted" ... I'm not aware of a term like that being used in official Rust documentation.

Here I'm use the term mostly informally to include both mutable references and shared references to what the UnsafeCell docs call "types that allow internal mutability." I'm also trying to be careful enough that I don't mis-state UnsafeCell-adjacency as one way or the other, since that's still formally undecided.

Also, is this an "if and only if" or merely an "only if"? ... Doesn't this imply that going from &UnsafeCell to &T is sound

I carefully worded my rule to only apply to as &UnsafeCell<T> cases, and only to define a set where it is unambiguously unsound, i.e. the only logical implication I'm asserting as demonstrably true is $\text{(byte would be UB to mutate)} \implies \text{(shared reborrowing as \texttt{UnsafeCell} is unsound)}$.

The informal implication is only that &UnsafeCell<T> -> &T can be sound, and RefCell is an API witness to that being true. (Strictly speaking, Mutex isn't, since it could utilize allocated indirection instead of UnsafeCell.) If external synchronization of an UnsafeCell isn't consistent, that's obviously still unsound.

For example, imagine that I ptr::read from the UnsafeCell inside an AtomicUsize while another thread is performing atomic store operations

That's defined as UB, because it causes a data race, and data races are defined as UB. The exact same as if there were never a reference created, one thread did ptr::read, and another did ptr::write. That the only way to stably do atomic writes is via &AtomicX is irrelevant, as all that matters for this to be UB is that it's a data race.

you're using the term "permitted" ... I'm not aware of a term like that being used in official Rust documentation.

Here I'm use the term mostly informally to include both mutable references and shared references to what the UnsafeCell docs call "types that allow internal mutability." I'm also trying to be careful enough that I don't mis-state UnsafeCell-adjacency as one way or the other, since that's still formally undecided.

Also, is this an "if and only if" or merely an "only if"? ... Doesn't this imply that going from &UnsafeCell to &T is sound

I carefully worded my rule to only apply to as &UnsafeCell<T> cases, and only to define a set where it is unambiguously unsound, i.e. the only logical implication I'm asserting as demonstrably true is $\text{(byte would be UB to mutate)} \implies \text{(shared reborrowing as \texttt{UnsafeCell} is unsound)}$.

Ah I see. Unfortunately what we need is some set of behavior which is guaranteed sound (so that we can use it as the precondition of a safety proof).

To make this more concrete: If someone could demonstrate a proof that get_mut and from_mut are sound, that would be very helpful. Our use case is basically just doing the same thing that those methods do, except with raw pointers (and so we can't literally use those methods). Given a proof of soundness for those methods, we could nearly copy-paste it into our code.

Alternatively, if it's not possible to write such a proof (because the necessary preconditions aren't guaranteed by the Rust Reference or by the standard library documentation), we'd like to know what the gaps are so we can work on adding them as guarantees to the language.

Note that the std docs explicitly call out that it is considered unsound to project from shared UnsafeCell<T> to unique T by pointer cast in user code, and the use of UnsafeCell::raw_get is required.

That could imply that the same holds for from_mut.

More generally speaking, std API can only be witness to that something is possible, but std could always have magical instructions not replicable by user code.

Note that the std docs explicitly call out that it is considered unsound to project from shared UnsafeCell<T> to unique T by pointer cast in user code, and the use of UnsafeCell::raw_get is required.

That could imply that the same holds for from_mut.

More generally speaking, std API can only be witness to that something is possible, but std could always have magical instructions not replicable by user code.

Definitely agreed. But do you believe that it's not possible to prove sound given any of the other guarantees provided by the language? If so, would opsem be comfortable adding the necessary guarantees to make it possible to complete such a proof?

cc @RalfJung ; I assume you may have thoughts here

Yeah I may; I also have quite a big backlog of Rust things so I can't say when I'll have time to take a look at this thread.

an omnibus issue

Note sure if that was a good idea, it's usually better to keep one-topic-per-thread/issue...

It's unsound (under stacked borrows) to have multiple active references which "disagree" about whether a memory region is covered by an UnsafeCell

I consider that a Stacked Borrows bug that must be fixed for the official aliasing model. Mixing &T and &UnsafeCell<T> references to the same memory should be entirely fine, as long as there is no mutation while any &T is live/active (or T itself has interior mutability).

It is sound to have multiple references disagree about what type a memory region has so long as those references agree on UnsafeCells

Rust does not do C-style TBAA. So yes I consider that pretty much settled. (Not just for UnsafeCell but for all references.)

It is sound to have multiple references disagree about UnsafeCells so long as only one of them is "active"

Note on that example: _c_mut is derived from u_mut. That matters a lot. Having a mutable reference and any other reference active at the same time is unsound unless they are in a parent-child relationship. (In Tree Borrows terms: mutable references get invalidated on foreign writes, but not on child writes.)

It is sound to read a non-UnsafeCell location as an UnsafeCell via ptr::read

You're only using the UnsafeCell by-value here, so T and UnsafeCell<T> are entirely equivalent.

It is sound to read an UnsafeCell location as a non-UnsafeCell via ptr::read

Same as above.


Sound given UnsafeCell agreement

That's basically "Rust has no TBAA except for its reference guarantees". I don't know if this is officially documented somewhere. I don't even know how to document this. A list of "all the UB Rust does not have" could become quite long. ;)

Note that on top of what you said here, it is obviously also crucial to not mutate any of the parts that an active shared reference points to outside of an UnsafeCell! And you cannot violate uniqueness of &mut references. And data race rules apply as well. Your accesses also still have to be in-bounds and the allocation cannot be freed -- all the usual. (Some of your later questions here seemed related to a lack of stating all these requirements here).

These analyses only consider active references

I am confused by your / @CAD97's use of the term "active" here. I would say a mutable reference that has been reborrowed is still active. It just has a child now. It becomes inactive when it gets invalidated y a conflicting access.

It is sound to read a non-UnsafeCell location as an UnsafeCell via ptr::read

I would say that falls out of the fact that UnsafeCell is repr(transparent).


To make this more concrete: If someone could demonstrate a proof that get_mut and from_mut are sound, that would be very helpful.

They are trivially sound because UnsafeCell only makes a difference under &. So &mut UnsafeCell<T> and &mut T are identical types -- same safety invariant, same validity invariant. (They are only identical in owned position though, not in shared position.)

On "active:" terms are hard, and there aren't enough words. NLL uses "live" for roughly "potential access later in the CFG" and TB uses "active" for roughly "hasn't been invalidated, would be DB to write through." There isn't a good word for the stronger property of "leaf active," roughly "spurious operations are acceptable and don't change the tag state."

In any case, while the disclaimer stating that UnsafeCell::raw_get is required remains, std is saying that you mustn't "project into" *const UnsafeCell<T> except by calling the UnsafeCell API. The other direction of &mut T to &mut UnsafeCell<T> is explicitly allowed by that same documentation.

There isn't a good word for the stronger property of "leaf active," roughly "spurious operations are acceptable and don't change the tag state."

I also didn't really need such a word before, so this never seemed like a problem to me.

For the "disagree about where the UnsafeCell are" question, I think "active (including possibly with children)" is the notion we want, isn't it?

Though I think I'd put it more precisely as: it is UB to derive &U from &T when U has UnsafeCell anywhere that T does not. That better reflects the fact that this is asymmetric -- deriving &T from &UnsafeCell<T> is fine (though only via UnsafeCell::get/get_mut), but the opposite direction is not. IOW, when discussing these aliasing questions we must also always consider the parent/child relationships of the pointers involved. (Tree Borrows makes this very explicit, but Stacked Borrows largely works on that principle as well.)

In any case, while the disclaimer stating that UnsafeCell::raw_get is required remains, std is saying that you mustn't "project into" *const UnsafeCell except by calling the UnsafeCell API.

Also worth noting that this is a form of library UB -- it currently seems very likely that there will be no language UB from side-stepping get/raw_get.

Amazing, thank you for this! It'll take me some time to wrap my head around everything, but this is very helpful.

One thing that I would like to get clarity on, since it seems like the answer is "this is sound", but I'm not entirely sure that we all agree on why it's sound, or on how to prove that it's sound: See the following code:

/// This is a function exposed in a crate's public API.
///
/// # Safety
///
/// The caller promises that `T` and `U` have the same layout and bit validity.
pub unsafe fn as_mut<T, U>(u: &mut UnsafeCell<T>) -> &mut U {
    let cell_ptr: *mut UnsafeCell<T> = u;
    // SAFETY: TODO
    unsafe { &mut *cell_ptr.cast::<U>() }
}

Three questions:

  • Is this code guaranteed to be sound today? Or does it just happen to be sound based on the way that Rust currently works?
  • Is it possible to fill in the // SAFETY: TODO safety proof in such a way that it only relies on axioms documented by the Reference or standard library documentation?
  • If not, what axioms would we need to add such that it would become possible to fill it in?

At the end of the day, what we're really after is the text of that safety proof. If we can't write it, then we can't commit the relevant code to zerocopy.

The caller promises that T and U have the same layout and bit validity.

This is sufficient to avoid immediate UB, but not sufficient for soundness. Types with the same layout and bit validity can still have different safety invariants (e.g., String and Vec<u8>, or [u8] and str).

But other than that, isn't your function a straight-forward composition of UnsafeCell::get_mut and a "transmute between &mut T and &mut U when their types are equivalent"? The only tricky part is how to express when that transmute is allowed, but that question seems entirely unrelated to UnsafeCell.

You don't have to justify the soundness of UnsafeCell::get_mut, it is given to you as an axiom from the standard library. The justification isn't complicated, though -- UnsafeCell only makes a difference below &, but there's no & here, therefore the presence of UnsafeCell is entirely moot.

The caller promises that T and U have the same layout and bit validity.

This is sufficient to avoid immediate UB, but not sufficient for soundness. Types with the same layout and bit validity can still have different safety invariants (e.g., String and Vec<u8>, or [u8] and str).

Yeah, I'm ignoring safety invariants here to keep the question simple.

But other than that, isn't your function a straight-forward composition of UnsafeCell::get_mut and a "transmute between &mut T and &mut U when their types are equivalent"? The only tricky part is how to express when that transmute is allowed, but that question seems entirely unrelated to UnsafeCell.

You don't have to justify the soundness of UnsafeCell::get_mut, it is given to you as an axiom from the standard library. The justification isn't complicated, though -- UnsafeCell only makes a difference below &, but there's no & here, therefore the presence of UnsafeCell is entirely moot.

I agree that this is in principle straightforward, but:

  • The standard library is allowed to do things that rely on compiler internals or behavior which are not guaranteed to public consumers (e.g., lots of standard library code assumes the layout of fat pointers despite them not being stable - if this code were copy+pasted to an external crate, it would not be guaranteed to remain sound on future compiler versions), so the fact that UnsafeCell::get_mut exists isn't a guarantee that we're allowed to do the same thing.
  • Where is it documented that "UnsafeCell only makes a difference below &"?

I know I'm being incredibly pedantic, but my point with this exercise is that we're trying to write a soundness proof that is correct only in virtue of what's documented to the public, not in virtue of what is intended to be documented in the future, what's commonly understood to be true, etc. Often, when we go through these exercises, we discover that things that are "obviously true" are not actually documented anywhere.

If you were writing that safety comment, what would you write? My hope in asking is to discover whether sufficient axioms are already provided by the documentation, or whether we need to add things.

e.g., lots of standard library code assumes the layout of fat pointers despite them not being stable - if this code were copy+pasted to an external crate, it would not be guaranteed to remain sound on future compiler versions

It's not lots, and this is being fixed rust-lang/rust#124251. Considering how often this example is trotted out, I actually wonder if people will have another similar example after that PR is merged.

There's reason to think we will eliminate as many of these as possible over time. The sort of implicit reliance makes it hard to detect when third-party code is accidentally relying on such details.

Note that before fat pointer layout, the example everyone used was uninitialized arrays of u8, and that was also fixed.

Yeah, I'm ignoring safety invariants here to keep the question simple.
You cannot talk precisely about soundness without talking about safety invariants. That is literally the core concept that all of soundness rests on.

Since the function is unsafe, presumably it's on the caller to not violate safety invariants after calling the function? IIUC we can still talk coherently about the soundness of the function itself without needing to think about the safety invariants of T and U.