A/B test: testing significance with the bootstrap
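The bootstrap "simulates" a "population" by repeatedly resampling the observed sample with replacement. However, it needs a fairly large number of resamples (e.g. 10,000), which makes it somewhat laborious to carry out.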
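Moreover, the t-test only applies to the mean. If you want to estimate the population median or some other statistic, the central limit theorem cannot be used either. For these reasons, we turn to bootstrapping.

Example: the population has 100 people and we want its median. Suppose the sample has 5 values: 12 34 45 78 99, whose median is 45. Now resample with replacement:
12 → put 12 back → 99 → put 99 back → 45 → put 45 back → 12 → put 12 back → 34 → put 34 back
The first bootstrap sample is now complete: 12 99 45 12 34, whose median is 34.
Repeat this 10,000 times, and the medians form a distribution.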
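The resampling procedure in the median example can be sketched with NumPy. This is a minimal illustration, assuming NumPy is available; the function name `bootstrap_medians`, the seed, and drawing all 10,000 resamples at once with `Generator.choice` are choices made here for brevity, not part of the original notes.

```python
import numpy as np

def bootstrap_medians(sample, n_boot=10_000, seed=0):
    """Hypothetical helper: return n_boot medians, one per resample drawn with replacement."""
    rng = np.random.default_rng(seed)
    sample = np.asarray(sample)
    # Each row is one bootstrap sample, the same size as the original sample
    resamples = rng.choice(sample, size=(n_boot, len(sample)), replace=True)
    return np.median(resamples, axis=1)

sample = [12, 34, 45, 78, 99]
print(np.median(sample))                 # 45.0
print(np.median([12, 99, 45, 12, 34]))   # 34.0, the hand-drawn bootstrap sample above

medians = bootstrap_medians(sample)
print(medians.shape)                     # (10000,)
# With 5 values the median is always the middle element, so every
# bootstrap median is one of the original sample values
print(np.unique(medians, return_counts=True))
```

The array `medians` is the bootstrap distribution of the median; its spread shows how much the sample median would vary from sample to sample.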
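import pandas as pd

# df: one row per player, with columns 'version' ('gate_30' / 'gate_40'),
# 'retention_1' and 'retention_7' (loaded earlier)
# Creating a list with bootstrapped 1-day retention means for each AB-group
# (500 replicates here; more, e.g. 10,000, gives a smoother distribution)
boot_1d = []
for i in range(500):
    # Resample all players with replacement, then take the mean per version
    boot_mean = df.sample(frac=1, replace=True).groupby('version')['retention_1'].mean()
    boot_1d.append(boot_mean)
# Transforming the list to a DataFrame (one row per replicate, one column per group)
boot_1d = pd.DataFrame(boot_1d)
# A Kernel Density Estimate plot of the bootstrap distributions
boot_1d.plot.kde()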
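Because `df` is not defined in these notes, here is a self-contained sketch of the same grouped bootstrap on a made-up stand-in DataFrame. The data below are random 0/1 flags invented only so the code runs; they are not the real retention data. The helper name `bootstrap_group_means` and the per-replicate `random_state` (added for reproducibility) are assumptions, not from the original.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for df: random 0/1 retention flags, NOT real data
rng = np.random.default_rng(42)
n = 2_000
df = pd.DataFrame({
    "version": rng.choice(["gate_30", "gate_40"], size=n),
    "retention_1": (rng.random(n) < 0.45).astype(int),
})

def bootstrap_group_means(data, column, n_boot=500, seed=0):
    """Hypothetical helper: one row per bootstrap replicate, one column per AB-group."""
    boot = []
    for i in range(n_boot):
        # Resample all players with replacement, keeping the total sample size
        resample = data.sample(frac=1, replace=True, random_state=seed + i)
        boot.append(resample.groupby("version")[column].mean())
    return pd.DataFrame(boot)

boot_1d = bootstrap_group_means(df, "retention_1")
print(boot_1d.shape)   # (500, 2)
print(boot_1d.mean())  # average bootstrapped retention per group
```

`boot_1d.plot.kde()` can then be called on the result as in the notes; pandas' KDE plot needs SciPy installed.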
# Creating a list with bootstrapped 7-day retention means for each AB-group
boot_7d = []
for i in range(500):
    boot_mean = df.sample(frac=1, replace=True).groupby('version')['retention_7'].mean()
    boot_7d.append(boot_mean)
# Transforming the list to a DataFrame
boot_7d = pd.DataFrame(boot_7d)
# Adding a column with the % difference between the two AB-groups (relative to gate_40)
boot_7d['diff'] = (boot_7d['gate_30'] - boot_7d['gate_40']) / boot_7d['gate_40'] * 100
# Plotting the bootstrap % difference
ax = boot_7d['diff'].plot(figsize=(16, 8), kind='kde')
ax.set_xlabel("% difference in means")
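The `diff` column is a relative difference with `gate_40` as the baseline, so swapping the groups does not just flip the sign. A small sketch with invented rates (0.20 and 0.16 are illustrative only; the helper `pct_diff` is hypothetical):

```python
import pandas as pd

def pct_diff(gate_30, gate_40):
    """Hypothetical helper: % difference of gate_30 relative to gate_40, as in boot_7d['diff']."""
    return (gate_30 - gate_40) / gate_40 * 100

# Invented retention rates, NOT real results
print(round(pct_diff(0.20, 0.16), 2))  # 25.0
print(round(pct_diff(0.16, 0.20), 2))  # -20.0 (different baseline, different magnitude)

# Works element-wise on a bootstrap DataFrame too
boot = pd.DataFrame({"gate_30": [0.20, 0.19], "gate_40": [0.16, 0.19]})
boot["diff"] = pct_diff(boot["gate_30"], boot["gate_40"])
print(boot)
```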
# Calculating the probability that 7-day retention is greater when the gate is at level 30
prob = (boot_7d['diff'] > 0).sum() / len(boot_7d)
# Pretty printing the probability
print('{percent:.2%}'.format(percent=prob))
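The probability above is simply the share of bootstrap replicates in which `diff` is positive, which is the mean of a boolean Series. A minimal sketch on invented values (the eight numbers below are hypothetical, not results from the notebook):

```python
import pandas as pd

# Hypothetical bootstrap % differences, NOT real results
diff = pd.Series([1.2, -0.4, 3.1, 0.8, -1.5, 2.2, 0.3, 4.0])

prob = (diff > 0).mean()                 # same as (diff > 0).sum() / len(diff)
assert prob == (diff > 0).sum() / len(diff)
print("{percent:.2%}".format(percent=prob))  # 75.00%
```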
Finally, comparing the bootstrap differences in retention_7 shows that 7-day retention is higher for gate_30 than for gate_40.
To explain this phenomenon, consider hedonic adaptation.
In short, hedonic adaptation is the tendency for people to get less and less enjoyment out of a fun activity over time if that activity is undertaken continuously.