statrs-dev / statrs

Statistical computation library for Rust

Home Page:https://docs.rs/statrs/latest/statrs/

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

Median has inconsistent behaviour with NaN (sometimes panic, sometimes wrong result)

theHausdorffMetric opened this issue · comments

Two examples of f64 Data with NaN. Depending on position of NaN, median either panics or delivers a wrong result.

use statrs::statistics::{Data, Median};
fn main() {
    let x = [f64::NAN, -3.0, 0.0, 3.0, -2.0];
    let x = Data::new(x);
    dbg!(x.clone());
    dbg!(x.median());
    println!("The median should panic/return NaN or behave as if NaNs are dropped. In which case the result should be -1.0 instead of -2.0");
    let x = [0.0, f64::NAN, 3.0, -2.0];
    let x = Data::new(x);
    dbg!(x.clone());
    println!("If the NaN is in the second postion, median panics.");
    dbg!(x.median());
}

We should probably return NaN instead of panicking or returning a wrong result.

I think this might happen with all order statistics (at least lower_quartile and upper_quartile in my usage).

numpy has functions np.nan[statistic], one to ignore NaN and the other to emit NaN.

We can follow that and do similar for quartiles and will emit Option::None instead of NaN when data is empty.

Making the change breaks API (but makes it match the docs), so the other option would be to implement a StatisticsNan that emits Option with the old trait panicing if there's NaN or match what we have for <_ as statistics::Median>::median -> Option<T>

Think I'll make it Option. Also considering the value of associated type since it's a returned value.