statrs-dev / statrs

Statistical computation library for Rust

Home Page: https://docs.rs/statrs/latest/statrs/

Multinomial sampling is very slow

jamaltas opened this issue · comments

Looking into a program I wrote I found the limiting factor was the multinomial sampling:

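use rand::{distributions::Distribution, rngs::SmallRng, SeedableRng};
use statrs::distribution::Multinomial;

let mut rng = SmallRng::from_entropy();
let result = Multinomial::new(&weights, 100000).unwrap();
let counts = result.sample(&mut rng);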

Specifically, the slow part is the sample call in the code above; the cost of the SmallRng call is small in comparison.

I rewrote this portion of my code in Python using numpy.random.multinomial and saw a ~400x speedup. It appears the C code that numpy calls uses an implementation that chains many binomial draws together, whereas the statrs implementation uses a CDF.

I wonder if there are any plans to change this?

Yep. It appears the compiled C code that numpy uses employs an algorithm known as BTPE, which is a significantly faster binomial sampler when p*n > 30. That is my use case.

Is there any interest in an implementation of this algorithm?

I think this could be useful, but I don't have the background for it.

I did notice that rand_distr::Binomial uses the BTPE algorithm. Would you know how to extend BTPE for multinomial from an implementation for binomial?

More broadly, perhaps we should expose the rand_distr versions of sample when available, for performance. It relies only on num_traits, and it's already in our dependency tree via nalgebra.
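On the question of building a multinomial sampler from a binomial one: the chaining of binomial calls mentioned above can be sketched roughly as follows. This is a minimal illustration, not the statrs or numpy implementation. The names multinomial_via_binomials, XorShift64 and naive_binomial are hypothetical, and the binomial sampler is passed in as a closure so that a BTPE-based sampler such as rand_distr::Binomial could be plugged in. The naive sampler here exists only to keep the example free of external dependencies.

```rust
/// Draws multinomial counts by chaining conditional binomial draws.
/// `binomial(n, p)` is a caller-supplied binomial sampler (hypothetical hook,
/// e.g. a wrapper around a BTPE-based implementation).
fn multinomial_via_binomials<F>(weights: &[f64], n: u64, mut binomial: F) -> Vec<u64>
where
    F: FnMut(u64, f64) -> u64,
{
    let mut counts = vec![0u64; weights.len()];
    let mut remaining_n = n;
    let mut remaining_w: f64 = weights.iter().sum();
    let last = weights.len().saturating_sub(1);

    for (i, &w) in weights.iter().enumerate() {
        if remaining_n == 0 {
            break;
        }
        if i == last || remaining_w <= 0.0 {
            // Last category (or numerical guard): it takes whatever is left.
            counts[i] = remaining_n;
            break;
        }
        // Probability of category i given that earlier categories were not chosen.
        let p = (w / remaining_w).clamp(0.0, 1.0);
        let x = binomial(remaining_n, p).min(remaining_n);
        counts[i] = x;
        remaining_n -= x;
        remaining_w -= w;
    }
    counts
}

/// Tiny xorshift64 generator, used only so the example needs no crates.
/// The seed must be non-zero.
struct XorShift64(u64);

impl XorShift64 {
    fn next_f64(&mut self) -> f64 {
        self.0 ^= self.0 << 13;
        self.0 ^= self.0 >> 7;
        self.0 ^= self.0 << 17;
        // Use the top 53 bits to get a uniform value in [0, 1).
        (self.0 >> 11) as f64 / (1u64 << 53) as f64
    }
}

/// Naive O(n) binomial sampler (sum of Bernoulli trials); a stand-in for BTPE.
fn naive_binomial(rng: &mut XorShift64, n: u64, p: f64) -> u64 {
    (0..n).filter(|_| rng.next_f64() < p).count() as u64
}

fn main() {
    let mut rng = XorShift64(0x9E37_79B9_7F4A_7C15);
    let weights = [0.2, 0.3, 0.5];
    let counts =
        multinomial_via_binomials(&weights, 100_000, |n, p| naive_binomial(&mut rng, n, p));
    println!("{:?}", counts);
}
```

With this shape, the cost is one binomial draw per category, so the overall speed depends on the binomial sampler that is plugged in. Assuming rand_distr is available, the closure could presumably be something like |n, p| Binomial::new(n, p).unwrap().sample(&mut rng); that wiring is not tested here.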