datadesk / census-data-aggregator

Combine U.S. census data responsibly

Correct handling of jam values in median approximation

sastoudt opened this issue

Thanks to some clarification from our Census friends:

The jam value represents a result from a median calculation when the median can't actually be calculated because it lies in the lowest or highest bin. The jam value is not used in the median calculation itself as a lower or upper bound for the end bins.

This doesn't change the results of the examples we have now, where we treated the jam value as a bound. But we need to update the median function to handle the scenario where the lowest and highest bins don't have concrete bounds, and add examples of this scenario.

We may want to include an optional input, jam_value, to return when the median falls in the highest or lowest bin.
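To make the idea concrete, here is a minimal sketch of how such an optional input might behave. It assumes each bin is a dict with `min`, `max`, and `n` keys, that an open-ended bin uses `None` for its missing bound, and that the median is otherwise linearly interpolated within its bin. The name `approximate_median_sketch` and the `jam_values` pair are hypothetical, not the library's actual API.

```python
def approximate_median_sketch(bins, jam_values=None):
    """Approximate a median from binned counts (illustrative only).

    bins: list of dicts with "min", "max" and "n", sorted by range.
        An open-ended bin uses None for its missing bound.
    jam_values: optional (low, high) pair (hypothetical parameter) returned
        when the median falls in the open-ended lowest or highest bin.
    """
    total = sum(b["n"] for b in bins)
    if total == 0:
        raise ValueError("bins contain no observations")
    midpoint = total / 2.0

    # Walk the bins until the cumulative count reaches the midpoint.
    cumulative = 0
    for b in bins:
        if cumulative + b["n"] >= midpoint:
            break
        cumulative += b["n"]

    # The median bin has a missing bound, so interpolation is impossible.
    # The jam value is returned as the result, never used as a bound.
    if b["min"] is None or b["max"] is None:
        if jam_values is None:
            raise ValueError("median falls in an open-ended bin; pass jam_values")
        low, high = jam_values
        return low if b["min"] is None else high

    # Otherwise interpolate linearly inside the median bin.
    fraction = (midpoint - cumulative) / b["n"]
    return b["min"] + (b["max"] - b["min"]) * fraction


# Made-up counts: most observations sit in the open-ended lowest bin.
example_bins = [
    {"min": None, "max": 9999, "n": 50},
    {"min": 10000, "max": 19999, "n": 10},
    {"min": 20000, "max": None, "n": 5},
]
print(approximate_median_sketch(example_bins, jam_values=(2499, 250001)))
```

With these invented counts the median lands in the lowest bin, so the function returns the low jam value instead of interpolating against a bound that does not exist.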

proof of concept here

I think that goes right here

Might be good to throw a warning when this happens too.
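One way the warning could look, as a rough sketch using Python's standard `warnings` module. The helper name `jam_with_warning` and its arguments are hypothetical and only illustrate the idea:

```python
import warnings


def jam_with_warning(jam_value, which_end):
    """Return the jam value and warn that it is not an interpolated estimate.

    which_end: "lowest" or "highest", used only in the message.
    """
    warnings.warn(
        f"Median falls in the {which_end} bin; returning the jam value "
        f"{jam_value} instead of an interpolated estimate.",
        UserWarning,
        stacklevel=2,  # attribute the warning to the caller's line
    )
    return jam_value
```

A warning rather than an exception keeps batch aggregations running while still flagging the results that are jam values.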

dealt with in #20