Table of contents
- Cracking A/B Testing Problems in DS Interviews
- How to Estimate Sample Size in A/B Tests
- A Summary of Udacity A/B Testing Course
Cracking A/B Testing Problems in DS Interviews
- power = 1 - β, where β is the probability of a Type II error
- More samples are needed when the sample variance is larger.
- Fewer samples are needed when the difference between treatment and control (the effect size) is larger.
- The minimum detectable difference is typically decided by multiple stakeholders, since it reflects what change is practically meaningful to the business.
- Obtain the number of days to run the experiment by dividing the required sample size by the number of users entering each group per day.
- If the result is less than 14 days, we typically still run the experiment for at least 14 days to capture weekly patterns.
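The steps above can be sketched with the standard two-sample z-test formula. The baseline standard deviation (`sigma`), minimum detectable difference (`delta`), and the 5,000 users/day figure below are hypothetical placeholders, not values from these notes:

```python
from math import ceil
from statistics import NormalDist

def sample_size_per_group(sigma, delta, alpha=0.05, power=0.80):
    """Per-group sample size for a two-sample z-test:
    n = 2 * (z_{1-alpha/2} + z_{power})^2 * sigma^2 / delta^2."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    return ceil(2 * (z_alpha + z_power) ** 2 * sigma ** 2 / delta ** 2)

n = sample_size_per_group(sigma=1.0, delta=0.1)  # hypothetical inputs
# Divide by daily traffic per group, then enforce the 14-day minimum.
days = max(14, ceil(n / 5000))  # assume 5,000 users/day per group
```

With these placeholder inputs, about 1,570 users per group are needed, which 5,000 users/day covers in one day, so the 14-day floor dominates.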
Multiple Testing Problem
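When several variants or metrics are tested at once, the chance of at least one false positive grows with the number of tests. A common (conservative) remedy is the Bonferroni correction, sketched here with made-up p-values:

```python
def bonferroni_significant(p_values, alpha=0.05):
    """Flag each test as significant only if p < alpha / m,
    where m is the number of simultaneous tests."""
    m = len(p_values)
    return [p < alpha / m for p in p_values]

# Three simultaneous tests share the overall alpha = 0.05 budget,
# so each individual test is compared against 0.05 / 3.
flags = bonferroni_significant([0.01, 0.04, 0.001])
```

Here the 0.04 result, significant on its own at alpha = 0.05, no longer passes the corrected threshold.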
Novelty and Primacy Effects (Change Aversion)
- If a test is already running and we want to check for a novelty effect, we can compare the results of first-time users vs. existing users within the treatment group to estimate the impact of the novelty effect. The same approach works for the primacy effect.
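A minimal sketch of that comparison, assuming we have per-user treatment-group records with a hypothetical `is_new_user` flag and metric value (the numbers are made up):

```python
from statistics import mean

# Hypothetical treatment-group data: (is_new_user, metric_value)
records = [
    (True, 0.12), (True, 0.15), (True, 0.14),    # first-time users
    (False, 0.08), (False, 0.07), (False, 0.09),  # existing users
]

new_users = [m for is_new, m in records if is_new]
old_users = [m for is_new, m in records if not is_new]

# A positive gap suggests a novelty effect inflating the treatment metric.
novelty_gap = mean(new_users) - mean(old_users)
```

In practice you would also test whether this gap is statistically significant rather than reading the raw difference.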
Interference between variants
- Interference between control and treatment groups can also lead to unreliable results.
- Typically we split users into control and treatment groups by random selection; ideally each user is independent, so we expect no interference between the control and treatment groups.
Dealing with interference
- Surge pricing (dynamic pricing)
- Long-term effects: e.g. a referral program, since it can take some time for users to refer their friends.
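One common remedy for interference is to randomize at the cluster level (for example, by city) instead of the user level, so users who interact with each other share the same assignment. A sketch with hypothetical city names:

```python
import random

random.seed(7)  # deterministic assignment for illustration

# Hypothetical geo clusters; every user in a city gets the same
# assignment, which contains network interference within each cluster.
cities = ["NYC", "SF", "LA", "Chicago", "Austin", "Seattle"]
assignment = {city: random.choice(["control", "treatment"]) for city in cities}

def group_for(user_city):
    """Look up a user's group from their city's cluster assignment."""
    return assignment[user_city]
```

The trade-off is fewer effective randomization units, which raises the variance of the estimate and the required experiment size.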
How to Estimate Sample Size in A/B Tests
- Type I error is a false positive conclusion.
- Type II error is a false negative conclusion.
- Both errors lead to an incorrect conclusion about whether the treatment had an effect.
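The Type I error rate can be checked by simulation: in repeated A/A tests (both groups drawn from the same distribution), a 5% significance level should produce roughly 5% false positives. A sketch with a seeded generator:

```python
import random
from statistics import NormalDist, mean

random.seed(42)
Z_CRIT = NormalDist().inv_cdf(0.975)  # two-sided test at alpha = 0.05

def aa_test_rejects(n=200):
    """One A/A test: both groups come from the same N(0, 1) population,
    so any 'significant' result is a false positive (Type I error)."""
    a = [random.gauss(0, 1) for _ in range(n)]
    b = [random.gauss(0, 1) for _ in range(n)]
    z = (mean(a) - mean(b)) / (2 / n) ** 0.5  # known sigma = 1
    return abs(z) > Z_CRIT

# Fraction of 2000 A/A tests that falsely reject; expect about 0.05.
false_positive_rate = mean(aa_test_rejects() for _ in range(2000))
```

The same simulation, run with a real difference between the groups, would estimate power (1 minus the Type II error rate) instead.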
A Summary of Udacity A/B Testing Course
Can we test everything?
- change aversion, novelty effect
- (1) what is the base of your comparison?
- (2) how much time do users need to adapt to the new experience, so that you can measure the plateaued behavior and make a robust decision?
how to do an A/B test?
- Choose and characterize metrics to evaluate your experiments, i.e. what do you care about, how do you want to measure the effect
- Choose the significance level (alpha), statistical power (1 - beta), and the practical significance level: the minimum effect at which you would actually launch the change if the result is statistically significant
- Calculate required sample size
- Take sample for control/treatment groups and run the test
- Analyze the results and draw valid conclusions
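For the analysis step, a conversion-rate experiment is typically evaluated with a two-proportion z-test. A self-contained sketch (the conversion counts below are made up):

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_ztest(x1, n1, x2, n2):
    """Pooled two-proportion z-test.
    Returns (z, two-sided p-value) for H0: p1 == p2."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p2 - p1) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# Hypothetical results: 100/1000 control vs. 130/1000 treatment conversions.
z, p_value = two_proportion_ztest(100, 1000, 130, 1000)
```

A statistically significant result should still be checked against the practical significance level from Step 2 before deciding to launch.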
Step 1: Choose and characterize metrics for both sanity check and evaluation
- sensitivity and robustness
Step 2: Choose significance level, statistical power and practical significance level
- You may not want to launch a change even if the test is statistically significant because you need to consider the business impact of the change, whether it is worthwhile to launch considering the engineering cost, customer support or sales issue, and opportunity costs.
Step 3: Calculate required sample size
Step 4: Take sample for control/treatment groups and run the test