AB Testing Review

Table of contents

Cracking A/B Testing Problems in DS interview


  • power = 1 - P(Type II error)


  • More samples are needed when the sample variance is larger.
  • Fewer samples are needed when the difference between treatment and control is larger.
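The two bullets above can be made concrete with the standard sample-size formula for comparing two means, n = 2(z_{alpha/2} + z_{beta})^2 * sigma^2 / delta^2. This is a minimal sketch (the function name and default alpha/power are my own choices, not from the original notes):

```python
import math
from statistics import NormalDist

def sample_size_per_group(sigma, delta, alpha=0.05, power=0.8):
    """Approximate sample size per group for a two-sample test of means.

    sigma: assumed standard deviation of the metric (larger -> more samples)
    delta: minimum detectable difference between treatment and control
           (larger -> fewer samples)
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided test
    z_beta = NormalDist().inv_cdf(power)
    n = 2 * (z_alpha + z_beta) ** 2 * sigma ** 2 / delta ** 2
    return math.ceil(n)

# Larger variance -> more samples; larger detectable effect -> fewer samples
print(sample_size_per_group(sigma=1.0, delta=0.1))
print(sample_size_per_group(sigma=2.0, delta=0.1))
print(sample_size_per_group(sigma=1.0, delta=0.2))
```

Doubling sigma quadruples the required n, while doubling delta cuts it by a factor of four, matching the two bullets above.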


  • This value is decided by multiple stakeholders.


  • Obtain the number of days to run the experiment by dividing the required sample size by the number of users entering each group per day.
  • If this comes out to less than 14 days, we typically still run the experiment for 14 days to capture the weekly pattern.
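The duration rule above is simple enough to sketch directly (function name and the example traffic numbers are illustrative assumptions):

```python
import math

def experiment_days(required_per_group, daily_users_per_group, min_days=14):
    """Days needed to collect the required sample per group, floored at
    14 days so the experiment covers two full weekly cycles."""
    days = math.ceil(required_per_group / daily_users_per_group)
    return max(days, min_days)

print(experiment_days(1570, 500))    # only ~4 days of traffic needed, but run 14
print(experiment_days(100000, 500))  # 200 days: likely need more traffic per day
```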

Multiple Testing Problem
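The body of this section did not survive extraction, but the standard remedy when testing many variants or metrics at once is to tighten the per-test threshold. As a sketch, a Bonferroni correction (my choice of method; the notes may have discussed others such as Benjamini-Hochberg):

```python
def bonferroni(p_values, alpha=0.05):
    """Bonferroni correction: reject H0 only when p < alpha / m, which
    controls the family-wise error rate across the m simultaneous tests."""
    m = len(p_values)
    return [p < alpha / m for p in p_values]

# With 4 simultaneous tests the per-test threshold drops to 0.05 / 4 = 0.0125
print(bonferroni([0.01, 0.04, 0.03, 0.001]))  # [True, False, False, True]
```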


Novelty and Primacy Effects (Change Aversion)


  • If a test is already running and we want to check for a novelty effect, we can compare first-time users' results against existing users' results within the treatment group to get an actual estimate of the novelty effect's impact. The same approach works for the primacy effect.
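The comparison described above can be sketched as a simple two-sample z-test on the metric within the treatment group (the function name, simulated data, and the z-test choice are my own illustrative assumptions; it presumes reasonably large samples):

```python
import random
from statistics import NormalDist, mean, stdev

def novelty_check(new_user_metric, old_user_metric):
    """Within the treatment group, compare first-time users against
    existing users; a significant gap suggests a novelty (or primacy)
    effect.  Simple two-sample z-test using sample standard deviations."""
    m_new, m_old = mean(new_user_metric), mean(old_user_metric)
    se = (stdev(new_user_metric) ** 2 / len(new_user_metric)
          + stdev(old_user_metric) ** 2 / len(old_user_metric)) ** 0.5
    z = (m_new - m_old) / se
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return m_new - m_old, p

# Simulated treatment-group data: new users show an inflated metric
random.seed(0)
new_users = [random.gauss(0.55, 0.1) for _ in range(500)]
old_users = [random.gauss(0.50, 0.1) for _ in range(500)]
diff, p = novelty_check(new_users, old_users)
print(f"estimated novelty gap={diff:.3f}, p={p:.4f}")
```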

Interference between variants

  • Interference between control and treatment groups can also lead to unreliable results.


  • Typically we split the control and treatment groups by randomly selecting users. In the ideal scenario each user is independent, and we expect no interference between the control and treatment groups.
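A common way to implement this random user-level split is deterministic hashing, so each user always lands in the same group. A minimal sketch (the function name, experiment key, and 50/50 split are illustrative assumptions):

```python
import hashlib

def assign_group(user_id, experiment="exp_001"):
    """Deterministic 50/50 split: hash the user id together with the
    experiment name, so a user's assignment is stable across sessions
    and independent of other users and other experiments."""
    digest = hashlib.md5(f"{experiment}:{user_id}".encode()).hexdigest()
    return "treatment" if int(digest, 16) % 2 == 0 else "control"

groups = [assign_group(f"user_{i}") for i in range(10000)]
share = groups.count("treatment") / len(groups)
print(f"treatment share = {share:.3f}")  # close to 0.5
```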


Dealing with interference


  • Surge pricing: dynamic pricing.
  • Long time lag: a referral program, for example. It can take some time for a user to refer his or her friends.
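One standard way to deal with interference like the surge-pricing example is to randomize at the cluster level (e.g., whole cities or regions) instead of individual users, so interacting users share one arm. A sketch under that assumption (the function name and example data are mine):

```python
import hashlib

def assign_by_cluster(user_id, user_to_city):
    """Geo-based randomization: hash the user's city so every user in the
    same market lands in the same arm, keeping interactions (e.g., surge
    pricing in a two-sided marketplace) within a single group."""
    city = user_to_city[user_id]
    digest = hashlib.md5(city.encode()).hexdigest()
    return "treatment" if int(digest, 16) % 2 == 0 else "control"

cities = {"u1": "paris", "u2": "paris", "u3": "tokyo"}
# Users in the same city always share an arm
print(assign_by_cluster("u1", cities) == assign_by_cluster("u2", cities))  # True
```

The trade-off is fewer effective randomization units (cities, not users), which reduces statistical power.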


How to Estimate Sample Size in A/B Tests


  • Type I error is a false positive conclusion.
  • Type II error is a false negative conclusion.


  • These two situations yield the same result.


A Summary of Udacity A/B Testing Course


Can we test everything?

  • change aversion, novelty effect
  • (1) what is the base of your comparison?
  • (2) how much time do you need for your users to adapt to the new experience, so that you can measure the plateaued experience and make a robust decision?

How to do an A/B test?

  1. Choose and characterize metrics to evaluate your experiments, i.e. what do you care about, how do you want to measure the effect
  2. Choose the significance level (alpha), statistical power (1 - beta), and the practical significance level: the minimum effect at which you would actually want to launch the change if the test is statistically significant
  3. Calculate required sample size
  4. Take samples for the control/treatment groups and run the test
  5. Analyze the results and draw valid conclusions
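For a conversion-rate metric, step 5 is commonly carried out with a two-proportion z-test. A minimal sketch (the function name and example counts are illustrative assumptions, not from the original notes):

```python
from statistics import NormalDist

def two_proportion_ztest(conv_c, n_c, conv_t, n_t):
    """Two-proportion z-test on conversion counts.
    Returns the observed lift (treatment - control) and a two-sided p-value."""
    p_c, p_t = conv_c / n_c, conv_t / n_t
    p_pool = (conv_c + conv_t) / (n_c + n_t)          # pooled rate under H0
    se = (p_pool * (1 - p_pool) * (1 / n_c + 1 / n_t)) ** 0.5
    z = (p_t - p_c) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return p_t - p_c, p_value

# 5.0% control conversion vs. 5.8% treatment conversion
lift, p = two_proportion_ztest(conv_c=500, n_c=10000, conv_t=580, n_t=10000)
print(f"lift={lift:.4f}, p={p:.4f}")
```

Even when p is below alpha, the observed lift should still be compared against the practical significance level chosen in step 2 before launching.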

Step 1: Choose and characterize metrics for both sanity check and evaluation

  • sensitivity and robustness

Step 2: Choose significance level, statistical power and practical significance level

  • You may not want to launch a change even if the test is statistically significant: you also need to consider the business impact of the change, whether launching is worthwhile given the engineering cost, customer-support and sales issues, and the opportunity cost.

Step 3: Calculate required sample size

Step 4: Take samples for the control/treatment groups and run the test
