PatrikHlobil / Pandas-Bokeh

Bokeh Plotting Backend for Pandas and GeoPandas

Histogram - ability to take NaNs into account in norm

ItamarShDev opened this issue

When using Histogram on a DataFrame that contains NaNs, it seems that norm does not take the NaNs into account when computing the percentages.

In the following image you can see that the bin 40.75-41.05 is at 2.403315%.
If the NaNs were taken into account, it would be at 1.74%.

image

hi @ItamarShDev,

this is indeed not a nice behaviour. The question is how one should design a solution. One could simply:

  1. raise an exception if NaNs are contained in the dataframe because the data is ill-defined for such an analysis
  2. just ignore the NaNs in the analysis

What do you think about this?
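
The two options above can be sketched roughly as follows. This is only an illustration in plain Python, not Pandas-Bokeh code: the function `prepare_values` and its `on_nan` argument are hypothetical names made up for this example.

```python
import math

def prepare_values(values, on_nan="raise"):
    """Hypothetical pre-processing step applied before binning.

    on_nan="raise"  -> option 1: refuse data that contains NaNs
    on_nan="ignore" -> option 2: silently drop the NaNs
    """
    if on_nan not in ("raise", "ignore"):
        raise ValueError("on_nan must be 'raise' or 'ignore'")
    has_nan = any(math.isnan(v) for v in values)
    if has_nan and on_nan == "raise":
        raise ValueError("data contains NaNs")
    # Drop NaNs (a no-op when there are none).
    return [v for v in values if not math.isnan(v)]

cleaned = prepare_values([0, 0, math.nan, 1, 1, 2], on_nan="ignore")
print(cleaned)
```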

We have a third case, where NaNs are part of the dataset and thus need to be taken into account.
So if we have [0,0,NaN,1,1,2] in the data, the normalized histogram would show:

Value 0 1 2
Percentage 1/3 1/3 1/6

whereas it currently ignores the NaNs completely and shows:

Value 0 1 2
Percentage 2/5 2/5 1/5
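
To make the difference between the two normalizations concrete, here is a small sketch in plain Python. It does not use the Pandas-Bokeh API; `normalized_histogram` and its `count_nans` flag are hypothetical names used only for illustration, and each distinct value is treated as its own bin.

```python
import math
from collections import Counter
from fractions import Fraction

def normalized_histogram(values, count_nans):
    """Hypothetical helper: exact share of each distinct non-NaN value.

    count_nans=True  divides by the total number of rows (NaNs included).
    count_nans=False divides by the number of non-NaN rows only.
    """
    valid = [v for v in values if not math.isnan(v)]
    denominator = len(values) if count_nans else len(valid)
    counts = Counter(valid)
    return {v: Fraction(c, denominator) for v, c in sorted(counts.items())}

data = [0, 0, math.nan, 1, 1, 2]
nan_aware = normalized_histogram(data, count_nans=True)      # 1/3, 1/3, 1/6
nan_ignoring = normalized_histogram(data, count_nans=False)  # 2/5, 2/5, 1/5
print(nan_aware)
print(nan_ignoring)
```

With NaNs counted in the denominator, the bars no longer sum to 1; the missing 1/6 is the share of NaN rows.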

Why this is a real-life case for me:

We show a histogram of the data and shade it with a second histogram filtered by a condition, where every value for which the condition evaluates to False becomes NaN.

Example:

Data: [0,0,NaN,1,1,2]
Rule: x > 0

Data histogram

Value 0 1 2
Percentage 1/3 1/3 1/6

Rule shading histogram

Value 0 1 2
Percentage 0/6 1/3 1/6

If I go with the current implementation, I get:

Data histogram

Value 0 1 2
Percentage 2/5 2/5 1/5

Rule shading histogram

Value 0 1 2
Percentage 0/3 2/3 1/3

As you can see, the proportions change, which creates the false impression that ignoring data improved the result.
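
The shading example can be reproduced with a short plain-Python sketch. The `share` helper and its `count_nans` flag are hypothetical and not part of Pandas-Bokeh; in pandas, the masking step would roughly correspond to `Series.where`.

```python
import math
from fractions import Fraction

data = [0, 0, math.nan, 1, 1, 2]
# Rule x > 0: values that fail the rule become NaN.
# NaN > 0 is False, so the existing NaN stays NaN.
shaded = [v if v > 0 else math.nan for v in data]

def share(values, target, count_nans):
    """Hypothetical helper: fraction of rows equal to `target`."""
    valid = [v for v in values if not math.isnan(v)]
    denominator = len(values) if count_nans else len(valid)
    return Fraction(valid.count(target), denominator)

for count_nans in (True, False):
    data_bar = share(data, 1, count_nans)
    rule_bar = share(shaded, 1, count_nans)
    # A shaded bar should never be taller than the data bar it overlays.
    print(count_nans, data_bar, rule_bar, rule_bar <= data_bar)
```

When NaNs are counted, the shaded bar for value 1 equals the data bar (1/3 and 1/3). When NaNs are ignored, the shaded bar (2/3) is taller than the data bar (2/5), which is the misleading effect described above.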