WanjohiWanjohi / AB-Test-Early-Stopping

Evaluation of early stopping algorithms in A/B testing

Evaluating Early Stopping Algorithm for A/B Testing

This repository contains simulation scripts that evaluate different statistical approaches to optional stopping in A/B tests. Run them with Python 3.

Frequentist

  • Wald's sequential probability ratio test (SPRT), which tests one point hypothesis against another point hypothesis. A possible workaround for this restriction is the 2-SPRT:

Gary Lorden. 2-SPRT's and the modified Kiefer-Weiss problem of minimizing an expected sample size. The Annals of Statistics, 4:281–291, 1976.

  • Group sequential tests, based on the z-score and a spending function that distributes the false positive rate across the interim analyses
    • Pocock
    • O'Brien-Fleming (more conservative)
    • alpha spending function ftp://maia-2.biostat.wisc.edu/pub/chappell/641/papers/paper35.pdf

http://www.aarondefazio.com/tangentially/?p=83
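The spending-function idea from the list above can be sketched as follows. This is a minimal illustration that assumes the Lan-DeMets forms of the Pocock-type and O'Brien-Fleming-type spending functions; the function names are hypothetical and are not taken from this repository's scripts. It needs Python 3.8 or later for `statistics.NormalDist`.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal, used for cdf and quantiles


def pocock_spending(t, alpha=0.05):
    """Cumulative alpha spent at information fraction t (Lan-DeMets Pocock-type)."""
    if not 0.0 < t <= 1.0:
        raise ValueError("information fraction t must lie in (0, 1]")
    return alpha * math.log(1.0 + (math.e - 1.0) * t)


def obrien_fleming_spending(t, alpha=0.05):
    """Cumulative alpha spent at information fraction t (Lan-DeMets O'Brien-Fleming-type)."""
    if not 0.0 < t <= 1.0:
        raise ValueError("information fraction t must lie in (0, 1]")
    z = _STD_NORMAL.inv_cdf(1.0 - alpha / 2.0)
    return 2.0 * (1.0 - _STD_NORMAL.cdf(z / math.sqrt(t)))


if __name__ == "__main__":
    # Four equally spaced looks; both functions spend the full alpha at t = 1.
    for t in (0.25, 0.5, 0.75, 1.0):
        print(t, pocock_spending(t), obrien_fleming_spending(t))
```

Both functions reach the full alpha at t = 1. The O'Brien-Fleming-type function spends far less of it at early looks, which is why it is the more conservative choice for stopping early.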

Review: Sebille et al. Sequential methods and group sequential designs for comparative clinical trials. 2003
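Wald's SPRT from the first item above can be sketched for a single stream of binary outcomes (for example conversions). This is a simplified illustration, not the repository's implementation: the function name, the Bernoulli model, and the default error rates are assumptions, and a real A/B test compares two arms rather than one stream against fixed rates.

```python
import math


def sprt_bernoulli(observations, p0, p1, alpha=0.05, beta=0.2):
    """Wald's SPRT for H0: p = p0 against H1: p = p1 on 0/1 observations.

    Returns (decision, number_of_observations_used).
    """
    # Wald's approximate stopping thresholds on the log-likelihood ratio.
    upper = math.log((1.0 - beta) / alpha)
    lower = math.log(beta / (1.0 - alpha))

    # Per-observation increments of the log-likelihood ratio.
    step_success = math.log(p1 / p0)
    step_failure = math.log((1.0 - p1) / (1.0 - p0))

    llr = 0.0
    n = 0
    for n, x in enumerate(observations, start=1):
        llr += step_success if x else step_failure
        if llr >= upper:
            return "accept_h1", n
        if llr <= lower:
            return "accept_h0", n
    return "undecided", n


if __name__ == "__main__":
    print(sprt_bernoulli([1, 0, 1, 1, 1, 0, 1], p0=0.1, p1=0.3))
```

The test only discriminates between the two chosen point values, which is the restriction the 2-SPRT reference addresses.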

Bayesian

Open questions

  • While controlling the false positive rate, one should also check the power of alternative methods.
  • Compare with the outcome of the BayesFactor package.
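A check of both false positive rate and power could look like the following Monte Carlo sketch. It is not the repository's `simulate.py`. It assumes normally distributed KPIs with unit variance, five equally spaced looks, and a two-sided z-test. All names and defaults are hypothetical. The value 2.413 is the tabulated Pocock boundary for five looks at a two-sided alpha of 0.05.

```python
import math
import random


def rejection_rate(effect, z_crit, n_looks=5, n_per_look=100, n_sims=20000, seed=0):
    """Fraction of simulated experiments that reject H0 at any of the looks.

    effect = 0 gives the false positive rate; effect != 0 gives the power.
    """
    rng = random.Random(seed)
    batch_sd = math.sqrt(n_per_look)
    rejections = 0
    for _ in range(n_sims):
        sum_a = sum_b = 0.0
        n = 0
        for _ in range(n_looks):
            # The sum of n_per_look draws from N(mu, 1) is N(n_per_look * mu, n_per_look),
            # so each batch is sampled in one call instead of observation by observation.
            sum_a += rng.gauss(0.0, batch_sd)
            sum_b += rng.gauss(n_per_look * effect, batch_sd)
            n += n_per_look
            z = (sum_b - sum_a) / math.sqrt(2.0 * n)
            if abs(z) > z_crit:
                rejections += 1
                break
    return rejections / n_sims


if __name__ == "__main__":
    for label, z_crit in (("naive peeking", 1.96), ("Pocock", 2.413)):
        fpr = rejection_rate(effect=0.0, z_crit=z_crit)
        power = rejection_rate(effect=0.2, z_crit=z_crit)
        print(label, "false positive rate:", fpr, "power:", power)
```

Reusing the fixed-horizon threshold of 1.96 at every look inflates the false positive rate well above the nominal 5%. The stricter Pocock boundary restores it, and the second call shows how much power that costs.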

Literature

  • blog post

Defazio. How to do A/B testing with early stopping correctly. 2016

  • Savage–Dickey (SD) density ratio

Wagenmakers et al. Bayesian hypothesis testing for psychologists: A tutorial on the Savage–Dickey method. Cog Psych 2010

  • How to choose the prior; JZS prior (Cauchy on effect size, Jeffreys on variance); analytical solution for the Bayes factor (BF)

Rouder et al. Bayesian t tests for accepting and rejecting the null hypothesis. Psychonomic Bulletin & Review 2009

  • overview of NHST/Bayesian parameter estimation/Bayes factor

Kruschke Bayesian assessment of null values via parameter estimation and model comparison. Perspectives on Psych Sci 2011
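The Savage–Dickey density ratio listed above computes the Bayes factor for a point null as the posterior density at the null value divided by the prior density at the same value. The sketch below shows this in the simplest conjugate case. It assumes a normal model with known standard deviation and a normal prior on the effect, chosen only because the posterior is then available in closed form. It does not use the JZS prior from Rouder et al., and the function names are hypothetical.

```python
import math


def normal_pdf(x, mean, sd):
    """Density of N(mean, sd^2) at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def savage_dickey_bf01(data, sigma=1.0, prior_sd=1.0):
    """BF01 for H0: delta = 0 via the Savage-Dickey density ratio.

    Assumed model: data[i] ~ N(delta, sigma^2) with prior delta ~ N(0, prior_sd^2).
    """
    n = len(data)
    # Conjugate update: precisions add, and the posterior mean is the
    # precision-weighted combination of the prior mean (0) and the data.
    post_precision = 1.0 / prior_sd ** 2 + n / sigma ** 2
    post_mean = (sum(data) / sigma ** 2) / post_precision
    post_sd = 1.0 / math.sqrt(post_precision)
    return normal_pdf(0.0, post_mean, post_sd) / normal_pdf(0.0, 0.0, prior_sd)


if __name__ == "__main__":
    differences = [0.3, -0.1, 0.4, 0.2, 0.1, 0.5, -0.2, 0.3]
    print("BF01 =", savage_dickey_bf01(differences))
```

Values above 1 favor the null and values below 1 favor the alternative. The result depends on `prior_sd`, which is why the choice of prior is an item of its own in the list above.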

Good reads

  • Make libraries in the virtualenv available in the jupyter notebook

http://help.pythonanywhere.com/pages/IPythonNotebookVirtualenvs

  • Bayesian A/B testing tutorial using Stan

https://rpubs.com/rasmusab/exercise_2_bayesian_ab_testing

http://m-clark.github.io/docs/IntroBayes.html

http://nbviewer.jupyter.org/github/QuantEcon/QuantEcon.notebooks/blob/master/IntroToStan_basics_workflow.ipynb

  • HMC

http://arogozhnikov.github.io/2016/12/19/markov_chain_monte_carlo.html

  • General overview of frequentist vs. Bayesian, null hypothesis testing vs. parameter estimation

http://www.valuewalk.com/wp-content/uploads/2015/05/SSRN-id2606016.pdf

  • Review of NHST, GS and BF

Schoenbrodt et al. Sequential Hypothesis Testing With Bayes Factors: Efficiently Testing Mean Differences

  • Note on Slurm: running too many processes in parallel is problematic, because each process unpickles or compiles the Stan model file, which fills up /tmp and halts the whole program. The current solution is to point the environment variable TMPDIR at a local directory without a space limit and to use at most 5 CPUs at the same time.

  • Statistical advice for A/B Testing

http://sl8r000.github.io/ab_testing_statistics/

  • Most winning A/B test results are illusory

http://www.qubit.com/sites/default/files/pdf/mostwinningabtestresultsareillusory_0.pdf

  • Warning signs in experimental design and interpretation

http://norvig.com/experiment-design.html

  • How to do A/B testing with early stopping correctly

http://www.aarondefazio.com/tangentially/?p=83

  • A/B tests at Airbnb

https://medium.com/airbnb-engineering/experiments-at-airbnb-e2db3abf39e7

  • Dance of Bayes factors

http://daniellakens.blogspot.de/2016/07/dance-of-bayes-factors.html

Run simulation

For example:

python3.6 simulate.py -c 4 -f bayes_factor -k normal_shifted -m /Users/shuang/PersWorkSpace/ABTestEarlyStopping/model/normal_kpi.stan

or

python3.6 simulate.py -c 4 -f group_sequential -k normal_shifted
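When many such runs are launched in parallel, the /tmp problem described in the Slurm note under Good reads can be avoided from within Python. The sketch below sets TMPDIR to a local directory and caps the worker pool at five processes. It assumes a `multiprocessing` pool is used. The function names, the directory, and the placeholder task are hypothetical and not part of `simulate.py`.

```python
import os
import tempfile
from multiprocessing import Pool

MAX_WORKERS = 5  # cap on simultaneous processes, as in the Slurm note


def use_local_tmpdir(path):
    """Point TMPDIR at `path` and return the directory tempfile will now use."""
    os.makedirs(path, exist_ok=True)
    os.environ["TMPDIR"] = path
    # tempfile caches its directory; resetting it makes gettempdir() re-read TMPDIR.
    tempfile.tempdir = None
    return tempfile.gettempdir()


def run_one(seed):
    """Placeholder for a single simulation run that writes temporary files."""
    with tempfile.NamedTemporaryFile() as handle:
        return seed, os.path.dirname(handle.name)


def run_limited(task, items, tmp_root, max_workers=MAX_WORKERS):
    """Run `task` over `items` with a bounded pool and a local temp directory."""
    use_local_tmpdir(tmp_root)  # set before the pool starts so workers inherit it
    with Pool(processes=max_workers) as pool:
        return pool.map(task, items)
```

A call such as `run_limited(run_one, range(20), tmp_root)` should be made under an `if __name__ == "__main__":` guard, with `tmp_root` pointing to a directory on a file system without a space limit.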

Languages

Python 88.6%, Stan 6.9%, Shell 4.4%