Median has inconsistent behaviour with NaN (sometimes panic, sometimes wrong result)

theHausdorffMetric opened this issue 2 years ago

theHausdorffMetric commented 2 years ago

Here are two examples of f64 Data containing NaN. Depending on the position of the NaN, median either panics or returns a wrong result.

use statrs::statistics::{Data, Median};
fn main() {
    let x = [f64::NAN, -3.0, 0.0, 3.0, -2.0];
    let x = Data::new(x);
    dbg!(x.clone());
    dbg!(x.median());
    println!("The median should panic, return NaN, or behave as if NaNs are dropped, in which case the result should be -1.0 instead of -2.0.");
    let x = [0.0, f64::NAN, 3.0, -2.0];
    let x = Data::new(x);
    dbg!(x.clone());
    println!("If the NaN is in the second position, median panics.");
    dbg!(x.median());
}
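
For background, here is a small sketch of how NaN behaves under comparison in Rust. It does not claim to show the actual cause inside statrs (that is not confirmed in this thread); it only illustrates why any comparison-based sort or selection can misbehave once a NaN is present. The `comparisons` helper is hypothetical and exists only for this illustration.

```rust
use std::cmp::Ordering;

/// Hypothetical helper: returns (a < b, a == b, a > b).
fn comparisons(a: f64, b: f64) -> (bool, bool, bool) {
    (a < b, a == b, a > b)
}

fn main() {
    // For ordinary numbers exactly one of the three comparisons holds.
    assert_eq!(comparisons(-3.0, 0.0), (true, false, false));

    // With NaN on either side, none of them holds.
    assert_eq!(comparisons(f64::NAN, 0.0), (false, false, false));
    assert_eq!(comparisons(0.0, f64::NAN), (false, false, false));

    // partial_cmp reports the missing ordering as None.
    assert_eq!(f64::NAN.partial_cmp(&0.0), None);
    assert_eq!(1.0_f64.partial_cmp(&2.0), Some(Ordering::Less));
}
```

An algorithm that assumes every pair of elements is ordered can end up in a different state depending on where the NaN sits, which would be consistent with the position-dependent behaviour reported here.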

Vinzent Steinberg · Answer 1 · Sun Jan 02 2022 21:01:28 GMT+0800 (China Standard Time)

We should probably return NaN instead of panicking or returning a wrong result.
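
A minimal sketch of what "return NaN" could look like, written as a standalone function rather than as the statrs implementation. The name `median_propagating_nan` is hypothetical, and returning NaN for empty input is an assumption made here, not something decided in this thread.

```rust
/// Hypothetical helper (not part of statrs): returns NaN if any input is NaN.
/// Assumption: an empty slice also yields NaN.
fn median_propagating_nan(data: &[f64]) -> f64 {
    if data.is_empty() || data.iter().any(|x| x.is_nan()) {
        return f64::NAN;
    }
    let mut sorted = data.to_vec();
    // No NaN remains, so total_cmp agrees with the usual numeric order
    // (apart from placing -0.0 before 0.0, which does not affect the median).
    sorted.sort_by(|a, b| a.total_cmp(b));
    let mid = sorted.len() / 2;
    if sorted.len() % 2 == 0 {
        // Even length: average the two middle values.
        (sorted[mid - 1] + sorted[mid]) / 2.0
    } else {
        sorted[mid]
    }
}

fn main() {
    // Both inputs from the report now give NaN, regardless of NaN position.
    assert!(median_propagating_nan(&[f64::NAN, -3.0, 0.0, 3.0, -2.0]).is_nan());
    assert!(median_propagating_nan(&[0.0, f64::NAN, 3.0, -2.0]).is_nan());
    // Without the NaN, the first input has median -1.0.
    assert_eq!(median_propagating_nan(&[-3.0, 0.0, 3.0, -2.0]), -1.0);
}
```

Checking for NaN up front costs one extra pass over the data, but it makes the result independent of where the NaN appears.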

James McKinney · Answer 2 · Thu Mar 02 2023 07:20:51 GMT+0800 (China Standard Time)

I think this might happen with all order statistics (at least lower_quartile and upper_quartile in my usage).

Orion Yeung · Answer 3 · Mon May 06 2024 06:51:16 GMT+0800 (China Standard Time)

numpy pairs each np.[statistic] with an np.nan[statistic]: the nan-prefixed function ignores NaN, and the plain one emits NaN.

We can follow that and do something similar for quartiles, emitting Option::None instead of NaN when the data is empty.

Making the change breaks the API (but makes it match the docs). The other option would be to implement a StatisticsNan trait that emits Option, with the old trait panicking if there's a NaN, or to match what we have for <_ as statistics::Median>::median -> Option<T>.

I think I'll make it Option. I'm also considering whether an associated type is worthwhile, since it's a returned value.
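
To make the numpy-style split concrete, here is a rough sketch of the two behaviours as free functions returning Option. The names `median` and `nan_median` are hypothetical and are not the statrs API. Returning None when every value is NaN is an assumption of this sketch.

```rust
/// Hypothetical: median that ignores NaN values.
/// Assumption: returns None when no non-NaN value is left.
fn nan_median(data: &[f64]) -> Option<f64> {
    let mut kept: Vec<f64> = data.iter().copied().filter(|x| !x.is_nan()).collect();
    if kept.is_empty() {
        return None;
    }
    // NaN has been filtered out, so total_cmp gives the usual numeric order.
    kept.sort_by(|a, b| a.total_cmp(b));
    let mid = kept.len() / 2;
    Some(if kept.len() % 2 == 0 {
        (kept[mid - 1] + kept[mid]) / 2.0
    } else {
        kept[mid]
    })
}

/// Hypothetical: median that emits NaN if any value is NaN,
/// and None only when the data is empty.
fn median(data: &[f64]) -> Option<f64> {
    if data.is_empty() {
        return None;
    }
    if data.iter().any(|x| x.is_nan()) {
        return Some(f64::NAN);
    }
    nan_median(data)
}

fn main() {
    let x = [f64::NAN, -3.0, 0.0, 3.0, -2.0];
    // Ignoring NaN gives the -1.0 expected in the original report.
    assert_eq!(nan_median(&x), Some(-1.0));
    // Propagating NaN gives Some(NaN).
    assert!(median(&x).unwrap().is_nan());
    // Empty data gives None in both variants.
    assert_eq!(median(&[]), None);
    assert_eq!(nan_median(&[]), None);
}
```

With this shape, None is reserved for "no data" and NaN for "data contained NaN", so callers can tell the two cases apart.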