jonhoo / flurry

A port of Java's ConcurrentHashMap to Rust

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

Finish the safety argument for `BinEntry::Moved`

jonhoo opened this issue · comments

The safety argument for why dereferencing the pointer in BinEntry::Moved has not been completed. It currently stands with a FIXME in case three here

flurry/src/lib.rs

Lines 474 to 492 in d3dae04

// safety: table is a valid pointer.
//
// we are in one of three cases:
//
// 1. if table is the one we read before the loop, then we read it while holding the
// guard, so it won't be dropped until after we drop that guard b/c the drop logic
// only queues a drop for the next epoch after removing the table.
//
// 2. if table is read by init_table, then either we did a load, and the argument is
// as for point 1. or, we allocated a table, in which case the earliest it can be
// deallocated is in the next epoch. we are holding up the epoch by holding the
// guard, so this deref is safe.
//
// 3. if table is set by a Moved node (below) through help_transfer, it will _either_
// keep using `table` (which is fine by 1. and 2.), or use the `next_table` raw
// pointer from inside the Moved. how do we know that that is safe?
//
// we must demonstrate that if a Moved(t) is _read_, then t must still be valid.
// FIXME

and case two here

flurry/src/lib.rs

Lines 1060 to 1072 in 6d56c58

// safety: table is a valid pointer.
//
// we are in one of two cases:
//
// 1. if table is the one we read before the loop, then we read it while holding the
// guard, so it won't be dropped until after we drop that guard b/c the drop logic
// only queues a drop for the next epoch after removing the table.
//
// 2. if table is set by a Moved node (below) through help_transfer, it will use the
// `next_table` raw pointer from inside the Moved. how do we know that that is safe?
//
// we must demonstrate that if a Moved(t) is _read_, then t must still be valid.
// FIXME cf. put

commented

I'm thinking as follows: If we perform an operation that sees a Moved(t), we must have gotten there by loading self.table with a guard g. This guard g will pin an epoch before the resize is finished with respect to the loaded table, otherwise we would have loaded the next table. This argument also exists here:

flurry/src/map.rs

Lines 719 to 749 in 051ca79

if finishing {
// this branch is only taken for one thread partaking in the resize!
self.next_table.store(Shared::null(), Ordering::SeqCst);
let now_garbage = self.table.swap(next_table, Ordering::SeqCst, guard);
// safety: need to guarantee that now_garbage is no longer reachable. more
// specifically, no thread that executes _after_ this line can ever get a
// reference to now_garbage.
//
// first, we need to argue that there is no _other_ way to get to now_garbage.
//
// - it is _not_ accessible through self.table any more
// - it is _not_ accessible through self.next_table any more
// - what about forwarding nodes (BinEntry::Moved)?
// the only BinEntry::Moved that point to now_garbage, are the ones in
// _previous_ tables. to get to those previous tables, one must ultimately
// have arrived through self.table (because that's where all operations
// start their search). since self.table has now changed, only "old" threads
// can still be accessing them. no new thread can get to past tables, and
// therefore they also cannot get to ::Moved that point to now_garbage, so
// we're fine.
//
// this means that no _future_ thread (i.e., in a later epoch where the value
// may be freed) can get a reference to now_garbage.
//
// next, let's talk about threads with _existing_ references to now_garbage.
// such a thread must have gotten that reference before the call to swap.
// because of this, that thread must be pinned to an epoch <= the epoch of our
// guard (since our guard is pinning the epoch). since the garbage is placed in
// our epoch, it won't be freed until the _next_ epoch, at which point, that
// thread must have dropped its guard, and with it, any reference to the value.
unsafe { guard.defer_destroy(now_garbage) };

  • During the resize, t points to next_table and is thus valid.
  • After the resize, t == self.table, thus t is valid.
  • This holds until a subsequent resize ends, at which point self.table != t and t is defer_destroyed (see the code above). At this point, t is not referenced by the map anymore. However, the defer_destroy happens while g is still pinning the epoch. Thus, t remains valid for at least the lifetime of g.
  • After releasing g, performing operations on the map with a new guard g' cannot access the table anymore (also see above).

Since this traces the entire lifetime of both the guard and the table and in particular, the above code is the only place that defer_destroys tables, I think the argument is sound. What do you think? Does this make sense to you?

Yes, that seems correct to me as well! Could you submit a PR with that argument included? I think maybe we want to make that argument as a comment on the Moved variant of BinEntry (rather than in everywhere that uses it), and then reference that argument from the two or three places that rely on it.

commented

Nice! I just created #35 for this. I was also already thinking about reducing redundancy with these comments, but wasn't sure where to put them yet. The definition of Moved seems like a good place!

There is also some other PR i saw which had copied the argument, I'll go find it and tell them about the PR so that we don't introduce more copies of the FIXME again 😉