mwaskom / seaborn

Statistical data visualization in Python

Home Page:https://seaborn.pydata.org

Uncertainties vs. weights & averaging over columns / coordinates

doronbehar opened this issue

Hello. You have no idea how much I enjoy using your package. It fits exactly my usage, and I can't believe that only at this stage of my project I came to know it!

I'm working with a large xarray.Dataset with N>4 coordinates, which I convert with .to_dataframe() in order to plot it with seaborn.lineplot. It became confusing when I wanted seaborn to show the uncertainty of my calculations. At first, I wasn't sure even how to save that uncertainty, until I realized that you don't call it "uncertainty", but rather the weights of the data variables for the estimation, and that they should simply be saved in a separate data variable.

When I need to perform estimation, it works pretty well, I suppose. However, I found that choice of terminology a bit peculiar, because weights are only meaningful relative to each other, whereas uncertainties also have a meaning when the data is not averaged. The formulas below are the ones I'm familiar with for this (the inverse-variance weighted mean and its uncertainty). Note how $\mu = x_0$ and $\sigma_\mu = \sigma_0$ are obtained if the summation is over a single element:

$$ \mu = \frac{\sum_i (x_i/\sigma_i^2)}{\sum_i \sigma_i^{-2}}$$

$$ \sigma_\mu = 1/\sqrt{\sum_i \sigma_i^{-2}} $$
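As a minimal sketch of the two formulas above, in plain Python rather than anything seaborn provides (the function name `inverse_variance_mean` is made up for illustration), the following shows both points: a single measurement returns its own value and uncertainty, and rescaling all uncertainties by a common factor leaves $\mu$ unchanged while $\sigma_\mu$ scales with them.

```python
import math


def inverse_variance_mean(values, sigmas):
    """Return (mu, sigma_mu) for measurements x_i with 1-sigma uncertainties sigma_i.

    Hypothetical helper for illustration only; it is not part of seaborn.
    """
    if len(values) != len(sigmas) or len(values) == 0:
        raise ValueError("values and sigmas must be non-empty and of equal length")
    # Each measurement is weighted by the inverse of its variance.
    weights = [1.0 / s**2 for s in sigmas]
    total = sum(weights)
    mu = sum(w * x for w, x in zip(weights, values)) / total
    sigma_mu = 1.0 / math.sqrt(total)
    return mu, sigma_mu


# A single measurement: the result is just (x_0, sigma_0).
mu_single, sigma_single = inverse_variance_mean([3.0], [0.5])

# Rescaling every sigma by the same factor acts like rescaling the weights:
# the mean is unchanged, but the uncertainty of the mean is not.
x = [1.0, 2.0, 4.0]
sigma = [0.1, 0.2, 0.4]
mu_a, sigma_a = inverse_variance_mean(x, sigma)
mu_b, sigma_b = inverse_variance_mean(x, [10 * s for s in sigma])
```

This is the sense in which weights carry only relative information, while uncertainties also carry an absolute scale.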

I also noticed that if I give seaborn.lineplot a dataset.to_dataframe() with only 1 coordinate, the weights aren't taken into account at all. I understand that I can supply a custom function to the errorbar argument. But I think it would be much more consistent if, instead of the weights argument, there were an uncertainties argument, and the uncertainties were drawn as error bars even when no estimation is required (because there is a single y per x).

> At first, I wasn't sure even how to save that uncertainty, until I realized that you don't call it "uncertainty", but rather the weights of the data variables for the estimation, and that they should simply be saved in a separate data variable.

Hi, I think you're thinking about this slightly wrong: the weights parameter exists so that you can compute a weighted mean, not to provide a measure of uncertainty.

> > At first, I wasn't sure even how to save that uncertainty, until I realized that you don't call it "uncertainty", but rather the weights of the data variables for the estimation, and that they should simply be saved in a separate data variable.
>
> Hi, I think you're thinking about this slightly wrong: the weights parameter exists so that you can compute a weighted mean, not to provide a measure of uncertainty.

I understood that correctly in the first place, but the way I phrased the sentence indeed implied otherwise. What I meant to say was that the closest thing related to uncertainties in seaborn is the weights parameter.

I wonder what you think about adding an uncertainties parameter that would act as I suggested? Do you think it'd be beneficial? (Please reopen 🙏)

Sorry, there's been plenty of discussion of related topics before. I'm not open to adding this.

> Sorry, there's been plenty of discussion of related topics before. I'm not open to adding this.

Could you link me to those discussions? I'd like to know what the arguments for and against were. These search results don't show discussions about the simplest case of a seaborn.lineplot...