RoaringBitmap / roaring-rs

A better compressed bitset in Rust

Home Page:https://docs.rs/roaring/

`MultiOps` trait example could be improved

FrankReh opened this issue

Thanks for this crate. Just wanted to spare others from wondering about the example given for the MultiOps trait.

roaring-rs/src/lib.rs

Lines 62 to 79 in 946bbfd

/// # Examples
/// ```
/// use roaring::{MultiOps, RoaringBitmap};
///
/// let bitmaps = [
///     RoaringBitmap::from_iter(0..10),
///     RoaringBitmap::from_iter(10..20),
///     RoaringBitmap::from_iter(20..30),
/// ];
///
/// // Stop doing this
/// let naive = bitmaps.clone().into_iter().reduce(|a, b| a | b).unwrap_or_default();
///
/// // And start doing this instead, it will be much faster!
/// let iter = bitmaps.union();
///
/// assert_eq!(naive, iter);
/// ```

The functional "naive" way to combine RoaringBitmaps is indeed slow, but that's because the reduce closure forces the bitmaps to be cloned. There is a straightforward way to use fold that accomplishes essentially the same thing without cloning and, at least for the example data given, is faster than the MultiOps union on my machines.

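use roaring::RoaringBitmap;

// Union by reference: the accumulator is owned and each input bitmap is only borrowed,
// so nothing has to be cloned up front.
fn naive_union(bitmaps: &[RoaringBitmap]) -> RoaringBitmap {
    bitmaps
        .iter()
        .fold(RoaringBitmap::new(), |acc: RoaringBitmap, bitmap: &RoaringBitmap| acc | bitmap)
}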

This also doesn't consume the array of bitmaps, which may matter to some callers.

There are still great reasons to prefer the MultiOps version at times. For one: when the slice elements can be consumed and one of the bitmaps is very large to begin with, that large bitmap can be used as the base for the result. That is a clear win over a fold, where the very large bitmap would be rebuilt from scratch only to have the original dropped afterwards. (Changing the 2nd bitmap from a size of 10 to a size of 100,000 made this very clear to me.)
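To make that trade-off concrete, here is a minimal sketch of the two strategies. It deliberately uses `std::collections::BTreeSet<u32>` as a stand-in for a bitmap so it runs without any dependencies; it is not roaring's implementation, and the function names `fold_union` and `largest_base_union` are made up for illustration. The second function only assumes what is described above: when the inputs can be consumed, the largest one can be reused as the base of the result.

```rust
use std::collections::BTreeSet;

// Stand-in for a bitmap: std's BTreeSet<u32>, NOT roaring's RoaringBitmap.

/// Borrowing fold: starts from an empty set, so every element of every
/// input (including a very large one) is copied into the accumulator.
fn fold_union(sets: &[BTreeSet<u32>]) -> BTreeSet<u32> {
    sets.iter().fold(BTreeSet::new(), |mut acc, s| {
        acc.extend(s.iter().copied());
        acc
    })
}

/// Consuming union (hypothetical helper): takes ownership of the inputs,
/// keeps the largest set as the base, and merges only the smaller ones into it.
fn largest_base_union(mut sets: Vec<BTreeSet<u32>>) -> BTreeSet<u32> {
    // Index of the largest input, or None if there are no inputs.
    let largest = sets
        .iter()
        .enumerate()
        .max_by_key(|(_, s)| s.len())
        .map(|(i, _)| i);

    match largest {
        None => BTreeSet::new(),
        Some(idx) => {
            // Move the largest set out without copying its elements.
            let mut base = sets.swap_remove(idx);
            for s in sets {
                base.extend(s);
            }
            base
        }
    }
}

fn main() {
    // Same shape as the experiment above: the 2nd input has 100,000 elements.
    let sets: Vec<BTreeSet<u32>> = vec![
        (0..10).collect(),
        (10..100_010).collect(),
        (20..30).collect(),
    ];

    let by_fold = fold_union(&sets); // `sets` is still usable afterwards
    let by_base = largest_base_union(sets); // `sets` is consumed here

    assert_eq!(by_fold, by_base);
}
```

Both functions return the same set. The difference is in how much work is done and who keeps the inputs: the fold leaves the caller's data intact but re-inserts all 100,000 elements of the large input, while the consuming version inserts only the 20 elements of the two small inputs.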