py-why / dowhy

DoWhy is a Python library for causal inference that supports explicit modeling and testing of causal assumptions. DoWhy is based on a unified language for causal inference, combining causal graphical models and potential outcomes frameworks.

Home Page: https://www.pywhy.org/dowhy

CausalEstimator reporting a 90% instead of 95% confidence interval for bootstrapping?

YichenTang97 opened this issue

Problem description
The current implementation of the _estimate_confidence_intervals_with_bootstrap method of the CausalEstimator class appears to report a 90% confidence interval (CI) instead of a 95% CI under the default setting confidence_level=0.95.

The current implementation for obtaining the CI seems to follow the pivotal interval method (see Section 8.3 of [1] and Section 6 of the reading material referred to in the code comment [2]). Let $x_1, x_2, \ldots, x_n$ be the observed sample of size $n$ drawn from a distribution $F$, and let $\bar{x}$ be the observed sample mean. Denote by $x_1^*, x_2^*, \ldots, x_n^*$ a resample of the data of the same size $n$, and by $\bar{x}^*$ the mean of this resample. The CI at significance level $\alpha$ (usually 0.05), i.e. with confidence level $1 - \alpha$, can then be estimated as:

$$CI = (\bar{x} - \delta^*_{1-\alpha/2}, \bar{x} - \delta^*_{\alpha/2}),$$

where $\delta^* = \bar{x}^* - \bar{x}$ is the difference between a resample mean and the observed sample mean, whose distribution is approximated over the bootstrap resamples, and $\delta^*_i$ denotes the $100 \cdot i$-th percentile of that distribution.

For a significance level $\alpha=0.05$ (i.e., a 95% CI), we should use the 2.5th and 97.5th percentiles, so that $CI = (\bar{x} - \delta^*_{0.975}, \bar{x} - \delta^*_{0.025})$. However, in the current implementation, the _estimate_confidence_intervals_with_bootstrap method returns $CI = (\bar{x} - \delta^*_{0.95}, \bar{x} - \delta^*_{0.05})$ for confidence_level=0.95, which is in fact the 90% CI.
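To make the difference concrete, here is a minimal NumPy sketch of the pivotal interval for a sample mean. It is not DoWhy's code: the helper names `bootstrap_deltas` and `pivotal_ci`, the synthetic data, and the number of resamples are all assumptions made for illustration. It computes the interval once with the $\alpha/2$ and $1-\alpha/2$ percentiles and once with the $1-\text{confidence\_level}$ and $\text{confidence\_level}$ percentiles described above.

```python
import numpy as np


def bootstrap_deltas(sample, num_simulations=5000, seed=0):
    """Return delta* = mean(resample) - mean(sample) for each bootstrap resample."""
    rng = np.random.default_rng(seed)
    sample = np.asarray(sample, dtype=float)
    # Draw indices with replacement; each row is one resample of the original size.
    idx = rng.integers(0, sample.size, size=(num_simulations, sample.size))
    return sample[idx].mean(axis=1) - sample.mean()


def pivotal_ci(x_bar, deltas, lower_q, upper_q):
    """Pivotal interval (x_bar - delta*_{upper_q}, x_bar - delta*_{lower_q})."""
    d_lo, d_hi = np.quantile(deltas, [lower_q, upper_q])
    return x_bar - d_hi, x_bar - d_lo


# Synthetic data, used only to illustrate the two percentile choices.
rng = np.random.default_rng(42)
sample = rng.normal(loc=10.0, scale=2.0, size=200)
x_bar = sample.mean()
deltas = bootstrap_deltas(sample)

confidence_level = 0.95
alpha = 1.0 - confidence_level

# 95% CI: 2.5th and 97.5th percentiles of delta*.
ci_95 = pivotal_ci(x_bar, deltas, alpha / 2, 1 - alpha / 2)
# Percentiles described in this issue (5th and 95th), which give a 90% CI.
ci_as_described = pivotal_ci(x_bar, deltas, 1 - confidence_level, confidence_level)

print("alpha/2 percentiles:    ", ci_95)
print("percentiles as reported:", ci_as_described)
```

The second interval is nested inside the first, which is what one would expect if the reported interval is really a 90% CI.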

Could someone look into this and make changes if necessary? It would also be helpful to have an option for choosing the method used to compute the CI (e.g. pivotal, percentile, normal, etc.).
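As a rough sketch of what such an option could look like, the function below takes a point estimate and an array of bootstrap estimates and switches between the three methods. The name `bootstrap_ci` and its `method` parameter are hypothetical and not part of DoWhy's API; the normal method here assumes the bootstrap standard deviation is an adequate standard error.

```python
from statistics import NormalDist

import numpy as np


def bootstrap_ci(estimate, bootstrap_estimates, confidence_level=0.95, method="pivotal"):
    """Two-sided bootstrap CI for `estimate` (hypothetical helper, not DoWhy's API)."""
    boot = np.asarray(bootstrap_estimates, dtype=float)
    alpha = 1.0 - confidence_level
    # alpha/2 and 1 - alpha/2 quantiles of the bootstrap estimates.
    q_lo, q_hi = np.quantile(boot, [alpha / 2, 1 - alpha / 2])

    if method == "percentile":
        # Use the bootstrap quantiles directly.
        return float(q_lo), float(q_hi)
    if method == "pivotal":
        # estimate - delta*_{1-alpha/2} = 2 * estimate - q_hi, and likewise for q_lo.
        return float(2 * estimate - q_hi), float(2 * estimate - q_lo)
    if method == "normal":
        # Normal approximation with the bootstrap standard deviation as standard error.
        z = NormalDist().inv_cdf(1 - alpha / 2)
        se = boot.std(ddof=1)
        return float(estimate - z * se), float(estimate + z * se)
    raise ValueError(f"Unknown method: {method!r}")
```

For a symmetric bootstrap distribution centred on the estimate, the pivotal and percentile intervals coincide; they differ when the distribution is skewed or shifted relative to the estimate.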

Version information

  • DoWhy v0.10.1

References
[1] L. Wasserman, All of Statistics: A Concise Course in Statistical Inference, vol. 26. Springer, 2004.
[2] Reading 24: Bootstrap Confidence Intervals, MIT OpenCourseWare 18.05, Spring 2014 (https://ocw.mit.edu/courses/mathematics/18-05-introduction-to-probability-and-statistics-spring-2014/readings/MIT18_05S14_Reading24.pdf)

This issue is stale because it has been open for 14 days with no activity.

This issue was closed because it has been inactive for 7 days since being marked as stale.

Thanks for raising this, @YichenTang97. Will take a look.
