Feature request/idea: option to plot highest-density intervals
elnjensen opened this issue · comments
Now that arviz is an optional dependency, it would be nice to have an option to easily plot highest-density intervals (from `arviz.hdi`) on the histograms rather than percentiles. For asymmetric distributions I find these more informative than the current quantiles behavior (though in the limit of a symmetric distribution they would be the same).
If you think this is a useful addition, I'd be happy to write up a PR. Before I do that, though, I thought I'd ask:
- Do you think this is useful?
- If so, I'd imagine it behaving similarly to quantiles by drawing lines on the 1-D histograms, though each probability value would give two lines, for the upper and lower bounds of that interval, rather than a single line.
- It could take an iterable of probabilities to draw, or just a single probability. If multiple probabilities were given, would it make sense to have different line styles by default?
Here's a quick example showing the difference for symmetric vs. asymmetric distributions: the 50% HDI is in green, vs. the 25% and 75% quantiles.
    import corner
    import numpy as np
    import arviz as az
    import matplotlib.pyplot as plt

    ndim, nsamples = 2, 20000
    np.random.seed(42)
    data = np.random.randn(ndim * nsamples).reshape([nsamples, ndim])

    for samples in (data, np.cos(data)):
        figure = corner.corner(samples, quantiles=[0.25, 0.75])
        axes = np.array(figure.axes).reshape((ndim, ndim))
        # Loop over the diagonal panels and mark the HDI bounds
        for i in range(ndim):
            ax = axes[i, i]
            for bound in az.hdi(samples[:, i], hdi_prob=0.50):
                ax.axvline(bound, color="g")
This would be great! Do you know if the arviz functions handle weighted samples properly? That would be crucial for this to be included. I'm probably not going to be able to get to this too soon, but I'd be happy to collaborate on a PR.
The `arviz.hdi` function does not have an option for including weights, so if that's crucial to include here, it would involve new code (though possibly with some borrowing of ideas and/or code from arviz). Looking at the arviz code and the algorithm for how it calculates HDI, it's not obvious to me that it lends itself easily to bringing weights in. The algorithm looks like this:
- Sort the input array of `n` samples;
- Calculate the number of samples `m` that represents the desired probability fraction, `m = hdi_prob * n`;
- Look at all the sub-arrays of length `m` in the sorted array and find the one with the smallest difference between first and last element; that range of values is the highest-density interval.
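For reference, the unweighted search above can be sketched in a few lines (a simplified sketch of the algorithm, not the actual `arviz.hdi` implementation):

```python
import numpy as np

def hdi_unweighted(samples, hdi_prob=0.5):
    """Highest-density interval via the sorted-window search described
    above (a sketch, not the actual arviz.hdi code)."""
    x = np.sort(np.asarray(samples))
    n = len(x)
    m = int(np.floor(hdi_prob * n))  # samples needed to cover hdi_prob
    # Width of every length-m window in the sorted array; the narrowest
    # window is the highest-density interval.
    widths = x[m:] - x[: n - m]
    i = int(np.argmin(widths))
    return x[i], x[i + m]
```

For a symmetric distribution this reproduces the central interval; for a skewed one the window slides toward the mode.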
To do this with weights, you can't follow quite the same logic because the intervals with the same weighted fractional probability wouldn't all have the same length. But maybe you could do something along these lines:
1. Sort the input array of `n` samples;
2. Sort the array of weights into the same order (e.g. using `np.argsort`);
3. Calculate the cumulative sum of the normalized array of weights;
4. Step through that cumulative-sum array (up until the element with a value of `1 - hdi_prob`), at each point using `np.searchsorted` to find the endpoint of the subarray that would encompass the desired probability span from the starting point;
5. Find the subarray with the smallest difference between start and end values.
For step 4, I don't see any obvious way to vectorize, so it could be slow for large arrays (but the larger the probability interval is, the fewer subarrays there are to consider).
Does this logic seem reasonable? I guess the upside here is that if it's new code, it is not dependent on having arviz installed.
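To make the proposal concrete, the steps above could be sketched like this (a hypothetical helper, not existing corner or arviz code):

```python
import numpy as np

def hdi_weighted(samples, weights, hdi_prob=0.5):
    """Weighted HDI via the cumulative-weight search sketched above.
    Hypothetical helper; not part of corner or arviz."""
    order = np.argsort(samples)
    x = np.asarray(samples)[order]
    w = np.asarray(weights, dtype=float)[order]
    cdf = np.cumsum(w)
    cdf /= cdf[-1]
    # Only starting points with at least hdi_prob of mass to their
    # right can begin a valid interval.
    n_start = int(np.searchsorted(cdf, 1.0 - hdi_prob, side="right")) + 1
    best_width, best = np.inf, (x[0], x[-1])
    for i in range(n_start):  # step 4: not vectorized, so O(n) searches
        start_mass = cdf[i - 1] if i > 0 else 0.0
        j = int(np.searchsorted(cdf, start_mass + hdi_prob))
        if j >= len(x):
            break
        if x[j] - x[i] < best_width:  # step 5: keep the narrowest span
            best_width, best = x[j] - x[i], (x[i], x[j])
    return best
```

With uniform weights this reduces to the same window search as the unweighted algorithm, which makes it easy to sanity-check.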
This does sound like the right algorithm to me, and I'd be open to including something like this. We already have a custom `quantile` function (line 388 at c0c6264) because the numpy `percentile` and `quantile` functions don't support weighted samples, and you'll see that the logic is similar to what you're describing (without the hard part :D). The computational scaling does sound potentially pretty bad, but it might be worth benchmarking.
An alternative would be to add a tutorial to the documentation that demonstrates this procedure using ArviZ directly (it might be worth using the `overplot_lines` function) so that people could easily copy and paste, rather than adding more options to corner itself. In that tutorial, we could resample the weighted points to get uniformly weighted points and then use the same procedure.
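The resampling step for such a tutorial could look like this (a sketch with made-up weights; the resampled array can then be passed straight to `arviz.hdi`):

```python
import numpy as np

rng = np.random.default_rng(0)
samples = rng.normal(size=20000)
weights = np.exp(-0.5 * samples**2)  # made-up example weights

# Draw with probability proportional to the weights to get an equally
# weighted sample that unweighted tools (e.g. arviz.hdi) can digest.
p = weights / weights.sum()
resampled = rng.choice(samples, size=len(samples), replace=True, p=p)
```

Resampling with replacement duplicates points and so reduces the effective sample size somewhat, but for a tutorial-level demonstration it keeps the code simple.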