MobileTeleSystems / Ambrosia

Ambrosia is a Python library for A/B tests design, split and result measurement

Approximation-based Designs for Binary Data

memoryfull opened this issue

Currently, Ambrosia supports only simulation-based power calculations for experiments with binary outcomes (see design_binary_size, which ultimately references __helper_calc_empirical_power).

One could instead rely on approximations to obtain an analytical expression for power. First, consider the variance-stabilising (arcsine) transformation of the proportions in the control group ( $p_1$ ) and the treated group ( $p_2$ ), and express the power of a two-sided two-sample test for proportions as:

$$(1-\beta) = \Phi \left( \Phi^{-1}\left( \frac{\alpha}{2} \right) - 2 \left( \arcsin \sqrt{p_1} - \arcsin \sqrt{p_2} \right) \sqrt{\frac{n}{2}}\right) + \left(1 - \Phi \left( \Phi^{-1}\left( 1 - \frac{\alpha}{2} \right) - 2 \left( \arcsin \sqrt{p_1} - \arcsin \sqrt{p_2} \right) \sqrt{\frac{n}{2}}\right)\right)$$

and solve for any one of $(\beta, p_1, p_2, \alpha, n)$, holding the other four fixed, by finding the root at which the difference between the two sides of the equation is zero.
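As a rough illustration of this root search, here is a minimal Python sketch that solves the arcsine expression for $n$ (units per group). It mirrors the p.asin expression in the R code of this issue rather than anything in Ambrosia; the function names `power_arcsine` and `solve_n_arcsine` are hypothetical, and plain bisection is used in place of R's uniroot as an assumption of convenience.

```python
from math import asin, sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # cdf is Phi, inv_cdf is Phi^{-1}


def power_arcsine(n, p1, p2, alpha=0.05):
    """Two-sided power under the arcsine approximation, n units per group."""
    shift = 2.0 * (asin(sqrt(p1)) - asin(sqrt(p2))) * sqrt(n / 2.0)
    lower_tail = _STD_NORMAL.cdf(_STD_NORMAL.inv_cdf(alpha / 2.0) - shift)
    upper_tail = 1.0 - _STD_NORMAL.cdf(_STD_NORMAL.inv_cdf(1.0 - alpha / 2.0) - shift)
    return lower_tail + upper_tail


def solve_n_arcsine(p1, p2, alpha=0.05, power=0.8, lo=2.0, hi=1e9, tol=1e-6):
    """Bisection on n (hypothetical helper).

    Two-sided power increases with n when p1 != p2, so the target power
    is crossed exactly once inside the bracket [lo, hi].
    """
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if power_arcsine(mid, p1, p2, alpha) < power:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# The design problem from this issue: 5% baseline, 5% relative uplift.
n_arcsine = solve_n_arcsine(p1=0.05, p2=0.05 * 1.05)
print(round(n_arcsine, 1))
```

The result should land close to the per-group size of 122106.8 that the R code in this issue reports for the same inputs.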

Second, when $n$ is large enough, one could rely on the Normal approximation to the binomial distribution and express the power of the two-sided test as

$$(1-\beta) = \Phi \left( \frac{ \sqrt{n} \left| p_1 - p_2 \right| + \Phi^{-1} \left( \frac{\alpha}{2} \right) \sqrt{ \left( p_1 + p_2 \right) \left( 1 - \frac{p_1 + p_2}{2} \right)} } { \sqrt{p_1 \left( 1 - p_1 \right) + p_2 \left( 1 - p_2 \right)} }\right)$$

and perform the same search.
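For solving for $n$ specifically, this Normal-approximation expression can be inverted in closed form, so no numerical search is needed. The sketch below follows the p.normal expression in the R code of this issue, where the term under the first square root is $(p_1 + p_2)(1 - (p_1 + p_2)/2)$. The names `power_normal` and `solve_n_normal` are hypothetical and not part of Ambrosia.

```python
from math import sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # cdf is Phi, inv_cdf is Phi^{-1}


def _scales(p1, p2):
    """Return the two standard-deviation-like terms of the expression."""
    null_scale = sqrt((p1 + p2) * (1.0 - (p1 + p2) / 2.0))
    alt_scale = sqrt(p1 * (1.0 - p1) + p2 * (1.0 - p2))
    return null_scale, alt_scale


def power_normal(n, p1, p2, alpha=0.05):
    """Power under the Normal approximation, n units per group."""
    z_alpha = _STD_NORMAL.inv_cdf(1.0 - alpha / 2.0)
    null_scale, alt_scale = _scales(p1, p2)
    z = (sqrt(n) * abs(p1 - p2) - z_alpha * null_scale) / alt_scale
    return _STD_NORMAL.cdf(z)


def solve_n_normal(p1, p2, alpha=0.05, power=0.8):
    """Invert power_normal for n: set the argument of Phi to Phi^{-1}(power)."""
    z_alpha = _STD_NORMAL.inv_cdf(1.0 - alpha / 2.0)
    z_power = _STD_NORMAL.inv_cdf(power)
    null_scale, alt_scale = _scales(p1, p2)
    return ((z_alpha * null_scale + z_power * alt_scale) / abs(p1 - p2)) ** 2


n_normal = solve_n_normal(p1=0.05, p2=0.05 * 1.05)
print(round(n_normal, 1))
print(power_normal(n_normal, 0.05, 0.05 * 1.05))  # round trip back to the target power
```

Solving for one of the other quantities, such as $p_2$, has no equally simple inversion here, so a numerical search would still be the natural choice in that case.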

Let us analytically solve a problem from your 4_usage_example_binary_design.ipynb: find $n$ such that we are able to detect a 5% relative increase in the experimental group proportion over a control group proportion of 5% (that is, $p_1 = 0.05$ and $p_2 = 0.0525$), with a type-I error of 5% and a type-II error of 20%. In R parlance the solution is:

effect <- 1.05

p1 <- 0.05
p2 <- 0.05*effect
sig.level <- 0.05
power <- 0.8
tol <- .Machine$double.eps^0.25

# Variance-stabilising transformation
h <- 2 * asin(sqrt(p1)) - 2 * asin(sqrt(p2))

p.asin <- quote({pnorm(qnorm(sig.level/2, lower.tail = FALSE) - h * sqrt(n/2), lower.tail = FALSE) + pnorm(qnorm(sig.level/2, lower.tail = TRUE) - h * sqrt(n/2), lower.tail = TRUE)})

# Normal approximation of the binomial distribution
p.normal <- quote(pnorm((sqrt(n) * abs(p1 - p2) - (qnorm(sig.level/2, lower.tail = FALSE) * sqrt((p1 + p2) * (1 - (p1 + p2)/2))))/sqrt(p1 * (1 - p1) + p2 * (1 - p2))))

# Solve for n
n.asin <- stats::uniroot(function(n) eval(p.asin) - power, c(2 + 1e-10, 1e+09))$root

n.normal <- stats::uniroot(function(n) eval(p.normal) - power, c(2 + 1e-10, 1e+09))$root

# What is n to achieve the MDE of interest under two approximations?
n.asin # 122106.8
n.normal # 122123.5

This is a self-contained solution that could easily be translated into Python. The expressions are taken from existing R routines, which return the same values:

# Variance stabilising transformation-based
pwr::pwr.2p.test(h = pwr::ES.h(0.05, 0.05*effect), power = 0.8, sig.level = 0.05)
#     Difference of proportion power calculation for binomial distribution (arcsine transformation) 
#
#              h = 0.01133831
#              n = 122106.8
#      sig.level = 0.05
#          power = 0.8
#    alternative = two.sided
#
#NOTE: same sample sizes

# Normal approximation-based
stats::power.prop.test(n = NULL, p1 = 0.05, p2 = 0.05*effect, power = 0.8, sig.level = 0.05) 
#     Two-sample comparison of proportions power calculation 
#
#              n = 122123.5
#             p1 = 0.05
#             p2 = 0.0525
#      sig.level = 0.05
#          power = 0.8
#    alternative = two.sided
#
#NOTE: n is number in *each* group
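
The arcsine expression can also be read in the other direction: given $n$ per group, which treated-group proportion is detectable? The following Python sketch is one way to do that, under a simplifying assumption that is not in the original post: the far rejection tail is dropped, which leaves the closed form $|h| = (z_{1-\alpha/2} + z_{1-\beta})\sqrt{2/n}$. The function name `detectable_p2_arcsine` is hypothetical, and only an increase over $p_1$ is considered.

```python
from math import asin, pi, sin, sqrt
from statistics import NormalDist


def detectable_p2_arcsine(n, p1, alpha=0.05, power=0.8):
    """Smallest p2 > p1 detectable with n units per group (arcsine approximation).

    Assumption: the negligible far tail of the two-sided test is ignored,
    so the effect size h has a closed form.
    """
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1.0 - alpha / 2.0)
    z_power = std_normal.inv_cdf(power)
    h = (z_alpha + z_power) * sqrt(2.0 / n)
    # h = 2 * (asin(sqrt(p2)) - asin(sqrt(p1))), so undo the transformation.
    angle = asin(sqrt(p1)) + h / 2.0
    if angle > pi / 2.0:
        raise ValueError("n is too small: no p2 <= 1 reaches the requested power")
    return sin(angle) ** 2


# Feed back the per-group size reported above for the arcsine method.
p2_detectable = detectable_p2_arcsine(n=122106.8, p1=0.05)
print(p2_detectable)
```

With the per-group size reported by pwr.2p.test, the recovered proportion should sit very close to the 0.0525 used as input in the design problem.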

I think offering analytical methods for binary designs based on the above approximations would be a valuable alternative to your simulation-based power calculations, since such methods are commonplace in statistics.

Yes, it's a good alternative to empirical evaluation. We will try to add this approach to the source code. Thanks for the issue.

Analytical methods for binary metrics were added in #30.
The theoretical design now supports the asin and normal approximations.