Skip to main content

Experimentation Platform

Accelerating software innovation through trustworthy experimentation

Home: ExP Platform
Talks
Journal Survey
Encyclopedia article
Objective Bayesian AB
Dilution
Rules of Thumb
Two Stage
Large Scale
Puzzling Outcomes
CUPED
Experiments at Microsoft
Practical Guide (short)
Tracking Users' Clicks an
Seven Pitfalls
Semmelweis Reflex
Bloodletting
Power Calculator
What is a HiPPO
Pitfalls of Long Term

Diluted Treatment Effect Estimation for Trigger Analysis in Online Controlled Experiments

 

by Alex Deng and Victor Hu

 

WSDM 2015, Feb 2015, Shanghai, China. Paper, Slides(PDF)

 

Abstract

Online controlled experiments, also called A/B testing, is playing a central role in many data-driven web-facing companies. It is well known and intuitively obvious to many practitioners that when testing a feature with low coverage, analyzing all data collected without zooming into the part that could be affected by the treatment often leads to under-powered hypothesis testing. A common practice is to use triggered analysis. To estimate the overall treatment effect, certain dilution formula is then applied to translate the estimated effect in triggered analysis back to the original all up population. In this paper, we discuss two different types of trigger analyses. We derive correct dilution formulas and show for a set of widely used metrics, namely ratio metrics, correctly deriving and applying those dilution formulas are not trivial. We observe many practitioners in this industry are often applying approximate formulas or even wrong formulas when doing effect dilution calculation. To deal with that, instead of estimating trigger treatment effect followed by effect translation using dilution formula, we aim at combining these two steps into one streamlined analysis, producing more accurate estimation of overall treatment effect together with even higher statistical power than a triggered analysis. The approach we propose in this paper is intuitive, easy to apply and general enough for all types of triggered analyses and all types of metrics.