pbstark / Strat

Conservative and exact inference about stratified binary populations

Efficient Exact Stratified Inference: EESI

Exact tests and confidence bounds for the population total of a binary population from a stratified simple random sample.

Wendell and Schmee (1996, https://www.tandfonline.com/doi/abs/10.1080/01621459.1996.10476950) proposed testing hypotheses about, and finding confidence bounds for, the population total by maximizing the $P$-value over a set of nuisance parameters: the individual stratum totals. They find the $P$-value by ordering possible outcomes based on the estimated population total. Their approach is combinatorially complex: Feller's classic "stars and bars" argument shows that there are $\binom{G+S-1}{S-1}$ ways to allocate $G$ objects among $S$ strata. (Some of those can be ruled out if $G$ exceeds the size of any stratum.) Wendell and Schmee also provided R scripts for searching for the maximum over the allocations; the scripts became computationally impractical for more than three strata.
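To get a feel for how quickly the number of allocations grows, the count and the enumeration can be sketched as follows. This is a minimal illustration, not code from this repository: the names `n_allocations` and `feasible_allocations` are hypothetical, and `strata_sizes` is assumed to be a list of stratum sizes $N_1, \ldots, N_S$.

```python
from math import comb


def n_allocations(G, S):
    """Stars-and-bars count: ways to split G ones among S strata,
    ignoring the constraint that a stratum cannot hold more ones than items."""
    return comb(G + S - 1, S - 1)


def feasible_allocations(G, strata_sizes):
    """Yield every tuple (G_1, ..., G_S) with sum G and 0 <= G_s <= N_s.

    Hypothetical helper for illustration; it enumerates recursively,
    so it is only practical for small problems.
    """
    sizes = tuple(strata_sizes)
    if len(sizes) == 1:
        # Last stratum must absorb whatever remains, if it fits.
        if 0 <= G <= sizes[0]:
            yield (G,)
        return
    for g in range(min(G, sizes[0]) + 1):
        for rest in feasible_allocations(G - g, sizes[1:]):
            yield (g,) + rest


if __name__ == "__main__":
    print(n_allocations(5, 3))                          # unconstrained count
    print(len(list(feasible_allocations(5, [2, 3, 4]))))  # count respecting stratum sizes
```

With $G = 5$ and $S = 3$ the unconstrained count is $\binom{7}{2} = 21$; capping each stratum at its size (here 2, 3, and 4 items) leaves 11 feasible allocations. The count grows polynomially in $G$ with degree $S - 1$, which is why enumeration stops being practical as strata are added.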

This document introduces a different strategy, also based on maximizing the $P$-value over the nuisance parameters, but with the $P$-value based on the "raw" multivariate hypergeometric counts rather than on the estimated population total. A naive maximization of this $P$-value would also involve a search over a combinatorial number of possible allocations. However, no combinatorial search is necessary: the allocation that gives the largest $P$-value and the corresponding confidence bounds can be constructed in order $N \log N$ operations, where $N$ is the number of items in the population. The number $S$ of strata is immaterial.

The code herein implements both the brute-force approach (enumerate all ways of allocating a given number of ones across the strata, find the $P$-value for each, and find the maximum across all allocations) and the more efficient approach, which exploits special structure of the problem.
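The brute-force approach can be sketched in a few lines. This is an illustration under stated assumptions, not the repository's implementation: the function names are hypothetical, and the $P$-value for a given allocation is assumed here to be the probability that every stratum's sample count is at most its observed count (a product of hypergeometric lower-tail probabilities, since strata are sampled independently). The statistic actually used by the code in this repository may differ.

```python
from math import comb


def hypergeom_cdf(x, N, G, n):
    """P(X <= x) when a simple random sample of n is drawn from N items, G of them ones."""
    lo = max(0, n - (N - G))          # smallest possible count of ones in the sample
    hi = min(x, G, n)                 # largest count included in the lower tail
    return sum(comb(G, k) * comb(N - G, n - k) for k in range(lo, hi + 1)) / comb(N, n)


def allocations(G, sizes):
    """Yield all (G_1, ..., G_S) with sum G and 0 <= G_s <= N_s."""
    if len(sizes) == 1:
        if 0 <= G <= sizes[0]:
            yield (G,)
        return
    for g in range(min(G, sizes[0]) + 1):
        for rest in allocations(G - g, sizes[1:]):
            yield (g,) + rest


def brute_force_p(G, sizes, samples, found):
    """Maximize the assumed P-value over every allocation of G ones.

    sizes:   stratum sizes N_s
    samples: sample sizes n_s
    found:   observed numbers of ones x_s in each stratum sample
    """
    best = 0.0
    for alloc in allocations(G, tuple(sizes)):
        p = 1.0
        for N_s, n_s, x_s, G_s in zip(sizes, samples, found, alloc):
            p *= hypergeom_cdf(x_s, N_s, G_s, n_s)
        best = max(best, p)
    return best


if __name__ == "__main__":
    # Two strata of 10 items, samples of 4, one "1" seen in each sample.
    print(brute_force_p(8, [10, 10], [4, 4], [1, 1]))
```

The loop visits every feasible allocation, so its cost grows with the combinatorial count of allocations; it is useful mainly as a check on the efficient construction for small problems.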

Languages

Jupyter Notebook 63.7%, Python 20.5%, R 15.8%