Histogram - ability to take NaNs into account in norm

ItamarShDev opened this issue 3 years ago.

When using Histogram on a df that contains NaNs, norm does not seem to take the NaNs into account when computing the percentages.

In my screenshot, the bin 40.75-41.05 is shown as 2.403315%.
If the NaNs were taken into account, it would be 1.74%.
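Assuming norm divides each bin count by the number of non-NaN rows, the two percentages differ only by the share of non-NaN rows: p_all = p_valid × n_valid / n_total. A minimal sketch of that conversion (both function names are hypothetical, not part of any library):

```python
def pct_including_nan(pct_excluding_nan: float, n_valid: int, n_total: int) -> float:
    """Rescale a bin percentage computed over the non-NaN rows so that the
    denominator becomes all rows (NaN rows included)."""
    if not 0 < n_valid <= n_total:
        raise ValueError("need 0 < n_valid <= n_total")
    return pct_excluding_nan * n_valid / n_total


def implied_valid_share(pct_excluding_nan: float, pct_including_nan: float) -> float:
    """Fraction of rows that are not NaN, implied by the two percentages
    reported for the same bin."""
    return pct_including_nan / pct_excluding_nan


# The two figures quoted for the 40.75-41.05 bin.
share = implied_valid_share(2.403315, 1.74)
print(f"non-NaN share = {share:.3f}, NaN share = {1 - share:.3f}")
```

With the quoted figures this implies that roughly 72% of the rows are non-NaN (about 28% NaN), keeping in mind that 1.74% is itself rounded.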

Patrik Hlobil · Answer 1 · Wed Dec 08 2021 21:08:13 GMT+0800 (China Standard Time)

Hi @ItamarShDev,

This is indeed not nice behaviour. The question is how a solution should be designed. One could simply:

1. raise an exception if the dataframe contains NaNs, because the data is ill-defined for such an analysis, or
2. just ignore the NaNs in the analysis.
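A minimal sketch of how these two options could be exposed as a single switch. The `nan_policy` name and its "raise"/"omit" values are loosely borrowed from SciPy's statistics functions; `normalized_bin_counts` and its binning rules (half-open bins, last bin closed) are assumptions for illustration, not the library's actual API:

```python
import bisect
import math
from typing import Sequence


def normalized_bin_counts(
    values: Sequence[float],
    edges: Sequence[float],
    nan_policy: str = "omit",
) -> list[float]:
    """Fraction of values in each bin [edges[i], edges[i+1]); last bin is closed.

    nan_policy:
      "raise" -> refuse to build a histogram from data containing NaN
      "omit"  -> drop NaN before counting, so fractions are relative to the
                 non-NaN values only (the behaviour reported in this issue)
    """
    n_nan = sum(1 for v in values if math.isnan(v))
    if nan_policy == "raise":
        if n_nan:
            raise ValueError(f"data contains {n_nan} NaN value(s)")
    elif nan_policy != "omit":
        raise ValueError(f"unknown nan_policy: {nan_policy!r}")

    valid = [v for v in values if not math.isnan(v)]
    counts = [0] * (len(edges) - 1)
    for v in valid:
        if v == edges[-1]:
            counts[-1] += 1  # the right-most edge belongs to the last bin
        elif edges[0] <= v < edges[-1]:
            counts[bisect.bisect_right(edges, v) - 1] += 1
        # values outside [edges[0], edges[-1]] fall in no bin

    if not valid:
        return [0.0] * len(counts)
    return [c / len(valid) for c in counts]
```

For `[0, 0, NaN, 1, 1, 2]` with edges `[0, 1, 2, 3]`, "omit" returns `[0.4, 0.4, 0.2]`, while "raise" raises `ValueError`.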

What do you think about this?

itamar sharify · Answer 2 · Sun Dec 19 2021 20:30:56 GMT+0800 (China Standard Time)

We have a third case, where NaNs are a legitimate part of the dataset and therefore need to be taken into account.
So if the data is [0,0,NaN,1,1,2], the normalized histogram should show:

Value	0	1	2
Percentage	1/3	1/3	1/6

whereas the current implementation ignores the NaNs completely and shows:

Value	0	1	2
Percentage	2/5	2/5	1/5
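Both tables can be reproduced with a short standard-library sketch. `value_fractions` is a hypothetical helper; the only difference between the two modes is the denominator:

```python
import math
from collections import Counter
from fractions import Fraction


def value_fractions(values, nan_in_denominator: bool) -> dict:
    """Share of each distinct non-NaN value.

    nan_in_denominator=True  -> divide by len(values) (NaN rows count)
    nan_in_denominator=False -> divide by the number of non-NaN rows
    """
    valid = [v for v in values if not math.isnan(v)]
    counts = Counter(valid)
    total = len(values) if nan_in_denominator else len(valid)
    return {v: Fraction(c, total) for v, c in sorted(counts.items())}


data = [0, 0, float("nan"), 1, 1, 2]
print(value_fractions(data, nan_in_denominator=True))   # proposed behaviour
print(value_fractions(data, nan_in_denominator=False))  # current behaviour
```

For comparison, pandas' `Series.value_counts(normalize=True, dropna=False)` also divides by all rows, NaN included, and lists the NaN share as its own entry.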

itamar sharify · Answer 3 · Sun Dec 19 2021 20:38:59 GMT+0800 (China Standard Time)

Why this is a real-life case for me:

We show a histogram of the data and shade it with a second histogram of the values that satisfy a condition; every value for which the condition evaluates to False is set to NaN.

Example:

Data: [0,0,NaN,1,1,2]
Rule: x > 0

Data histogram

Value	0	1	2
Percentage	1/3	1/3	1/6

Rule shading histogram

Value	0	1	2
Percentage	0/6	1/3	1/6

With the current implementation, I get:

Data histogram

Value	0	1	2
Percentage	2/5	2/5	1/5

Rule shading histogram

Value	0	1	2
Percentage	0/3	2/3	1/3

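As you can see, the proportions change: the shaded bars for 1 and 2 (2/3 and 1/3) become taller than the data bars they are supposed to sit inside (2/5 and 1/5), which creates the false impression that ignoring data improved the result.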
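This consistency argument can be checked mechanically. A minimal sketch, assuming the shading histogram is built by replacing every value that fails the rule with NaN (the helper `shares` and the `bins` list are illustrative names): with an all-rows denominator every shaded bar fits inside its data bar, while per-histogram non-NaN denominators push the shaded bars above the data bars.

```python
import math
from collections import Counter
from fractions import Fraction

NAN = float("nan")


def shares(values, denominator):
    """Share of each distinct non-NaN value, divided by `denominator`."""
    counts = Counter(v for v in values if not math.isnan(v))
    return {v: Fraction(c, denominator) for v, c in counts.items()}


data = [0, 0, NAN, 1, 1, 2]
# Rule x > 0: keep matching values, turn everything else into NaN.
# NaN > 0 is False, so the original NaN stays NaN.
shaded = [x if x > 0 else NAN for x in data]
bins = [0, 1, 2]

# Option A: both histograms divide by all rows (NaNs included).
data_a, shade_a = shares(data, len(data)), shares(shaded, len(shaded))
# Option B (current behaviour): each histogram divides by its own non-NaN count.
data_b = shares(data, sum(not math.isnan(v) for v in data))
shade_b = shares(shaded, sum(not math.isnan(v) for v in shaded))

for name, d, s in [("all rows", data_a, shade_a), ("non-NaN only", data_b, shade_b)]:
    fits = all(s.get(b, 0) <= d.get(b, 0) for b in bins)
    print(f"{name:>12}: shading fits inside data bars -> {fits}")
```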