Evaluating Early Stopping Algorithm for A/B Testing

This repository contains simulation scripts to evaluate different statistical approaches for optional stopping of A/B testing. Please run with python3.

Frequentist

Wald's SPRT test based on testing a single point hypothesis against another point hypothesis, possible workaround:

Gary Lordon. 2-sprt’s and the modified kiefer-weiss problem of minimizing an expected sample size. The Annals of Statistics, 4:281–291, 1976.

Based on the z-score and a spending function that distributes the false positive rate over time
- Pocock
- O'Brien-Fleming (more conservative)
- alpha spending function ftp://maia-2.biostat.wisc.edu/pub/chappell/641/papers/paper35.pdf

http://www.aarondefazio.com/tangentially/?p=83

Review: Sebille et al. Sequential methods and group sequential designs for comparative clinical trials. 2003

Bayesian

Open questions

While controlling the false positive rate, one should also check the power of alternative methods.
Compare with the outcome of the BayesFactor package.

Literature

blog post

Defazio HOW TO DO A/B TESTING WITH EARLY STOPPING CORRECTLY? 2016

SD density ratio

Wagenmakers et al. Bayesian hypothesis testing for psychologists: A tutorial on the Savage–Dickey method. Cog Psych 2010

How to choose the prior; JZS prior (Cauchy on effect size, Jeffreys on variance); Analytical solution to BF

Rouder et al. Bayesian t tests for accepting and rejecting the null hypothesis. Psych Bulletin & Review 2009

overview of NHST/Bayesian parameter estimation/Bayes factor

Kruschke Bayesian assessment of null values via parameter estimation and model comparison. Perspectives on Psych Sci 2011

Good reads

Make libraries in the virtualenv available in the jupyter notebook

http://help.pythonanywhere.com/pages/IPythonNotebookVirtualenvs

Bayesian A/B testing tutorial using Stan

https://rpubs.com/rasmusab/exercise_2_bayesian_ab_testing

http://m-clark.github.io/docs/IntroBayes.html

http://nbviewer.jupyter.org/github/QuantEcon/QuantEcon.notebooks/blob/master/IntroToStan_basics_workflow.ipynb

HMC

http://arogozhnikov.github.io/2016/12/19/markov_chain_monte_carlo.html

General overview of frequentist vs. Bayesian, null hypothesis testing vs. parameter estimation

http://www.valuewalk.com/wp-content/uploads/2015/05/SSRN-id2606016.pdf

Review of NHST, GS and BF

Schoenbrodt et al. Sequential Hypothesis Testing With Bayes Factors: Efficiently Testing Mean Differences

Running too many processes in parallel on slurm is problematic because unpickling/compiling of the Stan model file automatically fills up /tmp and thus halts the whole program. The current solution is to change the environment variable TMPDIR to some local directory which does not have a space limit, and limit the number of CPUs to 5 at the same time.
Statistical advice for A/B Testing

http://sl8r000.github.io/ab_testing_statistics/

Most winning A/B test results Are Illusory

http://www.qubit.com/sites/default/files/pdf/mostwinningabtestresultsareillusory_0.pdf

Warning signs in experimental design and interpretation

http://norvig.com/experiment-design.html

How to do A/B testing with early stopping correctly

http://www.aarondefazio.com/tangentially/?p=83

A/B tests at Airbnb

https://medium.com/airbnb-engineering/experiments-at-airbnb-e2db3abf39e7

Dance of Bayes factors

http://daniellakens.blogspot.de/2016/07/dance-of-bayes-factors.html

Run simulation

For example:

python3.6 simulate.py -c 4 -f bayes_factor -k normal_shifted -m /Users/shuang/PersWorkSpace/ABTestEarlyStopping/model/normal_kpi.stan

or python3.6 simulate.py -c 4 -f group_sequential -k normal_shifted

Name		Name	Last commit message	Last commit date
Latest commit History 54 Commits
analysis		analysis
data		data
docs		docs
evaluation		evaluation
legacy		legacy
model		model
output		output
.gitattributes		.gitattributes
.gitignore		.gitignore
README.md		README.md
job.sh		job.sh
requirements.txt		requirements.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Evaluating Early Stopping Algorithm for A/B Testing

Frequentist

Bayesian

Open questions

Literature

Good reads

Run simulation

About

Releases

Packages

Languages

shansfolder/AB-Test-Early-Stopping

Folders and files

Latest commit

History

Repository files navigation

Evaluating Early Stopping Algorithm for A/B Testing

Frequentist

Bayesian

Open questions

Literature

Good reads

Run simulation

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages