servo / rust-smallvec

"Small vector" optimization for Rust: store up to a small number of items on the stack

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

RFC: rewrite based on min_const_generics for 2.0

SimonSapin opened this issue · comments

There is now a serious proposal and plan to stabilize #![feature(min_const_generics)] in Rust 1.50: rust-lang/rust#79135. Using it here and removing the Array trait would a welcome simplification but obviously a breaking API change, so it would need to be part of a version 2.0 of the crate.

I’m wondering whether it would make sense to try a complete rewrite of smallvec instead of adapting the existing code. Some ideas:

  • The current capacity field has a misleading name and non-obvious meaning, it can represent either the capacity or the length of the vector. I feel this makes the code harder to understand and keep correct than necessary. Instead we could store a length in the 0..isize::MAX range, and pack a tag in the remaining bit.

  • The dual representation with conditional compilation can be skipped/removed entirely, as unions with ManuallyDrop<T> fields with T: !Copy are now stable: rust-lang/rust#77547

  • The "core" of the crate can be rather small (and in a separate module and file), with everything else mostly copied from vec.rs and raw_vec.rs from the standard library

  • Copying from the standard library again gets us more recent improvements, such as making some code less generic to reduce monomorphization bloat.

  • This would be a good time to look at unsafe code

  • 2.0 can be an opportunity to remove some APIs. For example, IntoIterator for SmallVec implies an expensive move. We could omit it and document to use drain instead.

I’d consider not necessarily including everything in #183, as specialization and custom allocators are likely not gonna be stable before or soon after min_const_generics is. Adding those might not require breaking API changes, though?

I’ve started playing with this a little. Here is the core data structure:

pub struct SmallVec<Item, const INLINE_CAPACITY: usize> {
    // Safety invariants:
    // * The active union field of `storage` must be consistent with the tag of `tagged_len`.
    // * The length in `tagged_len` must not be greater than storage capacity
    // * The first `length` items of the storage must be initialized.
    tagged_len: TaggedLen,
    storage: Storage<MaybeUninit<Item>, INLINE_CAPACITY>,
}

/// Upper bit: 0 for inline storage, 1 for heap storage
/// The rest of the bits: SmallVec length = number of items initialized, at the start of storage.
pub(crate) struct TaggedLen(usize);

/// An array, either inline (of statically-fixed size) or heap-allocated (of dynamic size).
/// The tag of [`SmallVec::tagged_len`] indicates which.
/// The size of this array is the `SmallVec`’s capacity.
union Storage<T, const INLINE_SIZE: usize> {
    inline: ManuallyDrop<[T; INLINE_SIZE]>,
    heap: ManuallyDrop<Box<[T]>>,
}

The union and tagged integer wrangling as well as heap (re/de)allocations could be in a private module, with the rest of the functionality (push, drain, try_reserve, …) built on top of an abstraction such as:

impl<Item, const INLINE_CAPACITY: usize> SmallVec<Item, INLINE_CAPACITY> {
    pub fn new() -> Self {}
    pub fn with_capacity(capacity: usize) -> Self {}
    pub fn len(&self) -> usize {}
    pub unsafe fn set_len(&mut self, new_len: usize) {}
    pub(crate) fn storage(&self) -> &[MaybeUninit<Item>] {}
    pub(crate) fn storage_mut(&mut self) -> &mut [MaybeUninit<Item>] {}
    pub(crate) fn grow_to(&mut self, new_capacity: usize) -> Result<(), TryReserveError> {}
    pub(crate) fn shrink_to(&mut self, new_capacity: usize) -> Result<(), TryReserveError> {}
}

I like this plan.

I looked into some possible SmallVec optimisations yesterday. They weren't fruitful, but I thought I'd describe my findings here in case you find them useful for the 2.0 rewrite.

I was motivated by a few comments here, which described a large branch prediction penalty for spilled SmallVecs. I'd seen similar complaints before, so I wanted to investigate possible fixes.

Some data here suggests large performance improvements when using a (32-bit) cmov in place of an unpredictable branch, with minimal or zero performance loss when the branch is completely predictable.

It occurred to me that, with the SmallVec implementation you described above, the length is unchanged whether or not the array is spilled. SmallVec::deref is only branching on which value to assign to a single pointer, which is a good candidate for use of cmov.

I experimented with code which reduced all branching in SmallVec::deref to a trivial let p: *const T = if a { b } else { c }. Unfortunately, I noticed that LLVM completely refused to emit a cmov instruction when it could potentially cause an unnecessary memory access. It turns out this is a known LLVM bug.

Unless you're willing to use inline assembly behind a feature flag, this idea is probably at a dead end. You could experiment with bit-masking as a replacement for cmov, but I think that would cost at least three CPU cycles (xor, and, xor), while a mispredicted branch might cost ten or less.

its been about a year now - any movement here? min_const_generics is fully stable

@TheButlah Nope, after the burst of inspiration that led to writing up this plan I ended up spending exactly zero additional time on it. Time since is irrelevant. I still like this plan but I can’t say when or if I’ll work on it. Anyone should feel free to take over.

I ended up spending exactly zero additional time on it

Wait, that’s not quite true. It looks like I did write some proof-of-concept code the same day as this issue, and got bored at drawing the rest of the owl. It’s not even in a git repository. Here it in case anyone wants to take a look