Actyx / banyan


Performance regression with filtering

rklaehn opened this issue

Old:

git checkout a2e73a7cbc3afb07958abd74adb625cb4f91c632
cargo run bench --count 1000000
266667
6667
create 16.362254
collect 3.809701
filter_common 3.790137
filter_rare 0.510667

New:

git checkout master
cargo run bench --count 1000000
266667
6667
create 1.121043
collect 0.286476
filter_common 2.714396
filter_rare 2.592429

Something weird is going on here. collect seems suspiciously fast, filter_common should not be slower than collect, and filter_rare should be much faster than filter_common.

This needs looking into, since filter_rare is essentially the whole point of the library. I am sure it is nothing fundamental. Maybe the bench is broken?
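
For context on why filter_rare should be much cheaper: branches carry a summary of their children, and a query that can reject a summary lets the traversal skip the whole subtree without loading any of its leaves. A minimal sketch of this pruning idea, with toy types rather than the banyan API:

use std::ops::Range;

enum Node {
    Branch { min: u64, max: u64, children: Vec<Node> },
    Leaf { offsets: Vec<u64> },
}

fn filtered(node: &Node, query: &Range<u64>, out: &mut Vec<u64>, leaf_loads: &mut u64) {
    match node {
        Node::Branch { min, max, children } => {
            // prune: if the query cannot match this branch's summary,
            // skip the whole subtree without loading any of its leaves
            if *max < query.start || *min >= query.end {
                return;
            }
            for child in children {
                filtered(child, query, out, leaf_loads);
            }
        }
        Node::Leaf { offsets } => {
            *leaf_loads += 1; // corresponds to one store read
            out.extend(offsets.iter().copied().filter(|o| query.contains(o)));
        }
    }
}

fn main() {
    let tree = Node::Branch {
        min: 0,
        max: 7,
        children: vec![
            Node::Branch {
                min: 0,
                max: 3,
                children: vec![Node::Leaf { offsets: vec![0, 1, 2, 3] }],
            },
            Node::Branch {
                min: 4,
                max: 7,
                children: vec![Node::Leaf { offsets: vec![4, 5, 6, 7] }],
            },
        ],
    };
    let (mut out, mut loads) = (Vec::new(), 0);
    filtered(&tree, &(0..3), &mut out, &mut loads);
    assert_eq!(out, vec![0, 1, 2]);
    assert_eq!(loads, 1); // the 4..=7 subtree was pruned, its leaf never read
}

With the 0..3 query only one leaf is read; the cost of a rare filter should scale with the number of matching blocks, not with the size of the tree.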

It seems that this is somehow related to the branch cache.

I wrote a small test in banyan and did not see anything out of the ordinary. filter_rare was faster than filter_common, as expected. Then I moved the test to banyan-utils to work with a more realistic key type. Now I see filter_rare being very slow, same as filter_common:

#[test]
fn ops_count_1() -> anyhow::Result<()> {
    let n = 1000000;
    // branch cache capacity; 0 here, so effectively no caching
    let capacity = 0;
    // build n (key, value) pairs and fill a fresh tree through a
    // read-counting wrapper around an in-memory store
    let xs = (0..n)
        .map(|i| (Key::single(i, i, TagSet::empty()), i))
        .collect::<Vec<_>>();
    let store = MemStore::new(usize::max_value(), Sha256Digest::digest);
    let store = OpsCountingStore::new(store);
    let branch_cache = BranchCache::<TT>::new(capacity);
    let txn = Transaction::new(
        Forest::new(store.clone(), branch_cache.clone()),
        store.clone(),
    );
    let mut builder = StreamBuilder::new(Config::debug_fast(), Secrets::default());
    txn.extend(&mut builder, xs)?;
    let tree = builder.snapshot();

    // 1) collect the whole tree
    let t0 = Instant::now();
    let r0 = store.reads();
    let xs1 = txn.collect(&tree)?;
    let r_collect = store.reads() - r0;
    let t_collect = t0.elapsed();

    // 2) filtered iteration with a query that matches everything
    let t0 = Instant::now();
    let r0 = store.reads();
    let xs2: Vec<_> = txn.iter_filtered(&builder.snapshot(), AllQuery).collect();
    let r_iter = store.reads() - r0;
    let t_iter = t0.elapsed();

    // 3) filtered iteration with a query that matches only the first 10%
    let t0 = Instant::now();
    let r0 = store.reads();
    let xs3: Vec<_> = txn
        .iter_filtered(&builder.snapshot(), OffsetRangeQuery::from(0..n / 10))
        .collect();
    let r_iter_10 = store.reads() - r0;
    let t_iter_10 = t0.elapsed();

    assert!(xs1.len() as u64 == n);
    assert!(xs2.len() as u64 == n);
    assert!(xs3.len() as u64 == n / 10);

    // first line: store reads per phase; second line: elapsed milliseconds
    println!("{} {} {}", r_collect, r_iter, r_iter_10);
    println!(
        "{} {} {}",
        t_collect.as_millis(),
        t_iter.as_millis(),
        t_iter_10.as_millis()
    );
    Ok(())
}
65 126 96
241 3614 3413

So filtering everything is roughly 15x slower than collect (3614 ms vs. 241 ms), even though both produce every element. There is something wrong here...
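
As an aside, the read counts in the first output line come from the OpsCountingStore wrapper. A minimal sketch of such a read-counting store, assuming a toy get-by-key trait rather than banyan's actual store interface:

use std::collections::HashMap;
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Arc;

// toy read interface; the real store is keyed by content hash
trait Store {
    fn get(&self, key: u64) -> Option<Vec<u8>>;
}

impl Store for HashMap<u64, Vec<u8>> {
    fn get(&self, key: u64) -> Option<Vec<u8>> {
        HashMap::get(self, &key).cloned()
    }
}

struct OpsCounting<S> {
    inner: S,
    reads: Arc<AtomicU64>,
}

impl<S> OpsCounting<S> {
    fn new(inner: S) -> Self {
        Self { inner, reads: Arc::new(AtomicU64::new(0)) }
    }
    fn reads(&self) -> u64 {
        self.reads.load(Ordering::SeqCst)
    }
}

impl<S: Store> Store for OpsCounting<S> {
    fn get(&self, key: u64) -> Option<Vec<u8>> {
        // every read that reaches the underlying store is counted;
        // cache hits never get here, so the count reflects cache efficiency
        self.reads.fetch_add(1, Ordering::SeqCst);
        self.inner.get(key)
    }
}

fn main() {
    let mut map = HashMap::new();
    map.insert(1u64, vec![42u8]);
    let store = OpsCounting::new(map);
    let _ = store.get(1);
    let _ = store.get(1);
    assert_eq!(store.reads(), 2);
}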

Partial fix: #84

There was definitely something wrong. A big part of the problem was that the default implementation of estimated_size for CompactSeq returned some very large values, so you needed a rather big branch cache for the cache to work at all.
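
A sketch of why overestimating estimated_size cripples a size-bounded cache (hypothetical types, not the actual BranchCache): entries that claim more than the whole budget are never admitted, so every branch access falls through to the store.

use std::collections::HashMap;

struct SizeBoundedCache {
    capacity: usize, // total budget in (estimated) bytes
    used: usize,
    entries: HashMap<u64, (Vec<u8>, usize)>, // key -> (value, estimated size)
}

impl SizeBoundedCache {
    fn new(capacity: usize) -> Self {
        Self { capacity, used: 0, entries: HashMap::new() }
    }

    fn put(&mut self, key: u64, value: Vec<u8>, estimated_size: usize) {
        // an entry whose claimed size exceeds the whole budget can never
        // be admitted, no matter how small it really is
        if estimated_size > self.capacity {
            return;
        }
        // naive eviction: drop arbitrary entries until the new one fits
        while self.used + estimated_size > self.capacity {
            let k = match self.entries.keys().next() {
                Some(&k) => k,
                None => break,
            };
            let (_, size) = self.entries.remove(&k).unwrap();
            self.used -= size;
        }
        self.used += estimated_size;
        self.entries.insert(key, (value, estimated_size));
    }

    fn get(&self, key: u64) -> Option<&Vec<u8>> {
        self.entries.get(&key).map(|(v, _)| v)
    }
}

fn main() {
    let mut cache = SizeBoundedCache::new(1 << 20); // 1 MiB budget
    // a branch that is really ~1 KiB but whose estimated size claims
    // 100 MiB is never admitted, so the cache is useless at this capacity
    cache.put(1, vec![0u8; 1024], 100 << 20);
    assert!(cache.get(1).is_none());
    // with a realistic estimate the same entry fits and can be hit later
    cache.put(1, vec![0u8; 1024], 1024);
    assert!(cache.get(1).is_some());
}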

What else do you think was wrong?

Sometimes leaves were being loaded even though they were not needed. NodeInfo is convenient, but it also eagerly loads the "payload" part even when it is not needed. That is fixed now: the iterator no longer uses NodeInfo but loads the branch or leaf at the last possible moment.
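
The shape of that fix, as a sketch with hypothetical names rather than the actual banyan types: keep only the index summary in hand and fetch the leaf payload at the last possible moment, after the query has had a chance to rule the leaf out.

use std::collections::HashMap;
use std::ops::Range;

// index entry for a leaf: the summary needed for filtering plus a link
// (here just a map key) to the payload bytes in the store
struct LeafIndex {
    min: u64,
    max: u64,
    link: u64,
}

fn visit_leaf(
    index: &LeafIndex,
    query: &Range<u64>,
    store: &HashMap<u64, Vec<u8>>,
    reads: &mut u64,
) -> Vec<u8> {
    // lazy: consult the summary first; an eager NodeInfo-style approach
    // would have loaded the payload before this check
    if index.max < query.start || index.min >= query.end {
        return Vec::new();
    }
    *reads += 1; // only now does the store get touched
    store.get(&index.link).cloned().unwrap_or_default()
}

fn main() {
    let mut store = HashMap::new();
    store.insert(7u64, b"payload".to_vec());
    let index = LeafIndex { min: 100, max: 200, link: 7 };
    let mut reads = 0;
    // the query 0..10 cannot match keys 100..=200, so no read happens
    assert!(visit_leaf(&index, &(0..10), &store, &mut reads).is_empty());
    assert_eq!(reads, 0);
}

The eager variant pays one store read per leaf regardless of the query; deferring the load means a filtered traversal reads only what the query actually needs.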