Histogram - ability to take NaNs into account in norm
ItamarShDev opened this issue · comments
hi @ItamarShDev,
this is indeed not a nice behaviour. The question is how one should design a solution. One could simply:
- raise an exception if the dataframe contains NaNs, because the data is ill-defined for such an analysis
- just ignore the NaNs in the analysis
What do you think about this?
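For discussion, the two options above could be sketched roughly as follows. This is only an illustration in plain NumPy, not the library's code; the function name `clean_for_histogram` and the `on_nan` parameter are hypothetical.

```python
import numpy as np

def clean_for_histogram(values, on_nan="raise"):
    """Hypothetical helper: apply one of the two NaN policies discussed above."""
    if on_nan not in ("raise", "ignore"):
        raise ValueError(f"unknown on_nan policy: {on_nan!r}")
    arr = np.asarray(values, dtype=float)
    nan_mask = np.isnan(arr)
    if nan_mask.any() and on_nan == "raise":
        # Option 1: refuse to build a histogram from data containing NaNs.
        raise ValueError(f"{int(nan_mask.sum())} NaN value(s) in histogram input")
    # Option 2: silently drop the NaNs before binning.
    return arr[~nan_mask]

cleaned = clean_for_histogram([0, 0, np.nan, 1, 1, 2], on_nan="ignore")
```

Neither policy keeps the NaN rows in the denominator of a normalized histogram.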
We have a third case, where NaNs are a legitimate part of the dataset and thus need to be taken into account in the normalization.
So if we have [0,0,NaN,1,1,2] in the data, the normalized histogram would show:
Value | 0 | 1 | 2 |
---|---|---|---|
Percentage | 1/3 | 1/3 | 1/6 |
whereas the current behaviour completely ignores the NaNs and shows:
Value | 0 | 1 | 2 |
---|---|---|---|
Percentage | 2/5 | 2/5 | 1/5 |
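To make the difference between the two normalizations concrete, here is a minimal sketch in plain NumPy. It is not the library's implementation; `normalized_counts` and its `include_nan` flag are hypothetical names used only for illustration.

```python
import numpy as np

def normalized_counts(values, include_nan=True):
    """Return {value: fraction} for the non-NaN values.

    include_nan=True  -> denominator is the total number of rows (NaNs count).
    include_nan=False -> NaNs are dropped before normalizing.
    """
    arr = np.asarray(values, dtype=float)
    valid = arr[~np.isnan(arr)]
    uniques, counts = np.unique(valid, return_counts=True)
    denom = arr.size if include_nan else valid.size
    return {float(u): int(c) / denom for u, c in zip(uniques, counts)}

data = [0, 0, np.nan, 1, 1, 2]
with_nan = normalized_counts(data, include_nan=True)      # fractions over 6 rows
without_nan = normalized_counts(data, include_nan=False)  # fractions over 5 rows
```

With `include_nan=True` the fractions no longer sum to 1; the remainder is the share of NaN rows.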
Why this is a real-life case for me:
We show a histogram of the data and shade it with a second histogram computed under a condition; every value for which the condition evaluates to False becomes NaN.
Example:
Data: [0,0,NaN,1,1,2]
Rule: x > 0
Data histogram
Value | 0 | 1 | 2 |
---|---|---|---|
Percentage | 1/3 | 1/3 | 1/6 |
Rule shading histogram
Value | 0 | 1 | 2 |
---|---|---|---|
Percentage | 0/6 | 1/3 | 1/6 |
If I go with the current implementation, I will get:
Data histogram
Value | 0 | 1 | 2 |
---|---|---|---|
Percentage | 2/5 | 2/5 | 1/5 |
Rule shading histogram
Value | 0 | 1 | 2 |
---|---|---|---|
Percentage | 0/3 | 2/3 | 1/3 |
As you can see, the proportions change, which creates the false impression that ignoring data improved the result.
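The shading scenario can be reproduced with a short NumPy sketch. This is an illustration under my own assumptions, not the library's code: `bin_fractions` is a hypothetical helper, and the bin edges are chosen so that each of the values 0, 1 and 2 falls into its own bin.

```python
import numpy as np

def bin_fractions(values, edges, total_rows):
    """Count non-NaN values per bin and divide by a caller-chosen row total."""
    arr = np.asarray(values, dtype=float)
    valid = arr[~np.isnan(arr)]
    counts, _ = np.histogram(valid, bins=edges)
    return counts / total_rows

data = np.array([0, 0, np.nan, 1, 1, 2], dtype=float)
edges = [-0.5, 0.5, 1.5, 2.5]  # one bin each for 0, 1 and 2

# Rule x > 0: every row that fails the rule (including NaN) becomes NaN.
with np.errstate(invalid="ignore"):
    shaded = np.where(data > 0, data, np.nan)

# Both histograms share the same denominator: all 6 rows.
base = bin_fractions(data, edges, total_rows=data.size)
shade = bin_fractions(shaded, edges, total_rows=data.size)

# Dropping NaNs before normalizing changes the denominator to 3 rows.
shade_dropna = bin_fractions(shaded, edges, total_rows=int((~np.isnan(shaded)).sum()))
```

When both histograms are normalized by the same total row count, the shaded bars can never exceed the data bars, which is what makes the overlay readable.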