Median has inconsistent behaviour with NaN (sometimes panic, sometimes wrong result)
theHausdorffMetric opened this issue · comments
Two examples of f64 Data with NaN. Depending on position of NaN, median either panics or delivers a wrong result.
use statrs::statistics::{Data, Median};
fn main() {
    let x = [f64::NAN, -3.0, 0.0, 3.0, -2.0];
    let x = Data::new(x);
    dbg!(x.clone());
    dbg!(x.median());
    println!("The median should panic/return NaN or behave as if NaNs are dropped. In which case the result should be -1.0 instead of -2.0");
    let x = [0.0, f64::NAN, 3.0, -2.0];
    let x = Data::new(x);
    dbg!(x.clone());
    println!("If the NaN is in the second position, median panics.");
    dbg!(x.median());
}
We should probably return NaN instead of panicking or returning a wrong result.
I think this might happen with all order statistics (at least lower_quartile and upper_quartile in my usage).
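One plausible reason the outcome depends on where the NaN sits is that NaN is unordered with respect to every float, so any sort or selection step built on `partial_cmp` or `<` can either fail or silently misplace elements. Whether statrs hits exactly this path is an assumption and has not been checked against its source. The helper names below are hypothetical; the sketch only shows the comparison behaviour and an early NaN check that a fix could use.

```rust
use std::cmp::Ordering;

/// Compare two floats as a naive comparator would; `None` means "unordered".
fn naive_cmp(a: f64, b: f64) -> Option<Ordering> {
    a.partial_cmp(&b)
}

/// Returns true if any element is NaN, so a caller can bail out early
/// (for example by returning NaN) before sorting or selecting.
fn contains_nan(data: &[f64]) -> bool {
    data.iter().any(|x| x.is_nan())
}
```

Calling `.unwrap()` on the result of `naive_cmp` would panic for a NaN operand, while code that relies on `<` alone sees every comparison against NaN as false and keeps going with a wrong ordering.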
numpy has paired functions np.[statistic] and np.nan[statistic] (for example np.median and np.nanmedian): the nan-prefixed one ignores NaN and the other emits NaN.
We can follow that pattern, do the same for quartiles, and emit Option::None instead of NaN when the data is empty.
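A minimal sketch of that numpy-style split, using only the standard library. The function names are hypothetical and are not part of statrs. Returning `None` when nothing is left after dropping NaN is an assumption of this sketch and differs from numpy, which returns NaN in that case.

```rust
/// Helper: median of a non-empty, NaN-free vector.
fn median_of_clean(mut v: Vec<f64>) -> f64 {
    // total_cmp gives a total order on f64, so the sort cannot panic.
    v.sort_by(|a, b| a.total_cmp(b));
    let n = v.len();
    if n % 2 == 1 {
        v[n / 2]
    } else {
        (v[n / 2 - 1] + v[n / 2]) / 2.0
    }
}

/// Median that propagates NaN: `None` for empty input,
/// `Some(NaN)` if any element is NaN.
fn median_propagate(data: &[f64]) -> Option<f64> {
    if data.is_empty() {
        return None;
    }
    if data.iter().any(|x| x.is_nan()) {
        return Some(f64::NAN);
    }
    Some(median_of_clean(data.to_vec()))
}

/// Median that ignores NaN: `None` if no values remain after dropping NaN.
fn median_ignore_nan(data: &[f64]) -> Option<f64> {
    let clean: Vec<f64> = data.iter().copied().filter(|x| !x.is_nan()).collect();
    if clean.is_empty() {
        return None;
    }
    Some(median_of_clean(clean))
}
```

With the first input from the report, `[NaN, -3.0, 0.0, 3.0, -2.0]`, `median_ignore_nan` yields `Some(-1.0)` and `median_propagate` yields `Some(NaN)`, regardless of where the NaN sits.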
Making the change breaks the API (but makes it match the docs), so the other option would be to implement a StatisticsNan trait that emits Option, with the old trait either panicking if there's a NaN or matching what we have for <_ as statistics::Median>::median -> Option<T>.
I think I'll make it Option. I'm also considering whether an associated type is worthwhile, since it's a returned value.
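For illustration, here is one shape an Option-returning trait with an associated output type could take. Only the name `StatisticsNan` comes from the discussion above; the trait body, the `Samples` container, and the choice to return `Some(NaN)` when a NaN is present are all assumptions of this sketch, not the statrs API.

```rust
/// Hypothetical trait: the method signature is a guess, not the statrs API.
trait StatisticsNan {
    /// Associated type: each implementor fixes the type of the returned value.
    type Output;

    /// `None` for empty data; otherwise the median.
    fn median(&self) -> Option<Self::Output>;
}

/// Stand-in container for this sketch; not the statrs `Data` type.
struct Samples(Vec<f64>);

impl StatisticsNan for Samples {
    type Output = f64;

    fn median(&self) -> Option<f64> {
        if self.0.is_empty() {
            return None;
        }
        // Assumption: propagate NaN rather than panic or return None.
        if self.0.iter().any(|x| x.is_nan()) {
            return Some(f64::NAN);
        }
        let mut v = self.0.clone();
        v.sort_by(|a, b| a.total_cmp(b));
        let n = v.len();
        Some(if n % 2 == 1 {
            v[n / 2]
        } else {
            (v[n / 2 - 1] + v[n / 2]) / 2.0
        })
    }
}
```

An associated type means each implementor has exactly one output type, whereas a generic parameter on the trait would let one container implement it for several output types. Which of the two is preferable here is left open, as in the comment above.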