meilisearch / arroy

Spotify/Annoy-inspired Approximate Nearest Neighbors in Rust, based on LMDB and optimized for memory usage :boom:

Home Page: https://docs.rs/arroy

Measure and improve the constant numbers used when building the tree

Kerollmops opened this issue

We must take three parameters into account:

  1. Time to build the tree
  2. Relevancy of the searches
  3. Time to search in the tree

Fun fact: the lower you are in the tree, the less impact a dummy plane has on the search cost.
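One way to picture why depth matters is a rough sketch, assuming every split halves the items and that a dummy plane forces a search to scan the whole subtree below it. Neither assumption comes from arroy itself, and `subtree_items` is a hypothetical helper, not part of the crate.

```rust
/// Hypothetical helper: approximate number of items below a node at `depth`,
/// assuming (for illustration only) that every split halves the items.
fn subtree_items(total_items: usize, depth: u32) -> usize {
    // `checked_shr` returns None when the shift is at least the bit width,
    // in which case nothing is left below the node.
    total_items.checked_shr(depth).unwrap_or(0)
}

fn main() {
    let total_items = 1_000_000;
    for depth in [0, 5, 10, 15] {
        // Under the assumptions above, this is roughly the number of items a
        // search would have to consider if the plane at this depth were a dummy.
        println!(
            "depth {:>2}: about {} items below the node",
            depth,
            subtree_items(total_items, depth)
        );
    }
}
```

Under these assumptions a dummy plane at the root exposes every item, while one ten levels down exposes only about a thousandth of them.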


arroy/src/writer.rs

Lines 248 to 259 in 7fc6031

    if split_imbalance(children_left.len(), children_right.len()) < 0.95
        || remaining_attempts == 0
    {
        break normal;
    }
    remaining_attempts -= 1;
};
// If we didn't find a hyperplane, just randomize sides as a last option
// and set the split plane to zero as a dummy plane.
while split_imbalance(children_left.len(), children_right.len()) > 0.99 {

fn split_imbalance(left_indices_len: usize, right_indices_len: usize) -> f64 {
    let ls = left_indices_len as f64;
    let rs = right_indices_len as f64;
    let f = ls / (ls + rs + f64::EPSILON); // Avoid 0/0
    f.max(1.0 - f)
}

fn main() {
    dbg!(split_imbalance(29464, 18394));
    dbg!(split_imbalance(30000, 30000));
    dbg!(split_imbalance(30000, 1580));
}
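To get a feel for how the constants interact, here is a minimal sketch that checks a few splits against several candidate thresholds. `split_imbalance` is copied from the snippet above. `is_accepted`, the extra `(30000, 300)` split, and the thresholds other than 0.95 and 0.99 are assumptions added for illustration.

```rust
/// Same formula as above: the share of items that land on the larger side.
fn split_imbalance(left_indices_len: usize, right_indices_len: usize) -> f64 {
    let ls = left_indices_len as f64;
    let rs = right_indices_len as f64;
    let f = ls / (ls + rs + f64::EPSILON); // Avoid 0/0
    f.max(1.0 - f)
}

/// Hypothetical helper: would the split pass a `< threshold` check like the
/// `< 0.95` test in the excerpt from writer.rs?
fn is_accepted(left: usize, right: usize, threshold: f64) -> bool {
    split_imbalance(left, right) < threshold
}

fn main() {
    // The first three splits come from the snippet above; the last one is made up.
    let splits = [(29464, 18394), (30000, 30000), (30000, 1580), (30000, 300)];
    for threshold in [0.80, 0.90, 0.95, 0.99] {
        for (left, right) in splits {
            println!(
                "threshold {:.2}: split ({}, {}) has imbalance {:.4}, accepted: {}",
                threshold,
                left,
                right,
                split_imbalance(left, right),
                is_accepted(left, right, threshold)
            );
        }
    }
}
```

The `(30000, 1580)` split sits just below 0.95, so it passes the current check but would be rejected by a stricter threshold such as 0.90.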