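I am working through a book called Bayesian Analysis in Python. The book focuses mainly on the PyMC3 package, but it is a little vague about the theory behind it, so I am quite confused.
Say I have data like this: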
data = np.array([51.06, 55.12, 53.73, 50.24, 52.05, 56.40, 48.45, 52.34, 55.65, 51.49, 51.86, 63.43, 53.00, 56.09, 51.93, 52.31, 52.33, 57.48, 57.44, 55.14, 53.93, 54.62, 56.09, 68.58, 51.36, 55.47, 50.73, 51.94, 54.95, 50.39, 52.91, 51.5, 52.68, 47.72, 49.73, 51.82, 54.99, 52.84, 53.19, 54.52, 51.46, 53.73, 51.61, 49.81, 52.42, 54.3, 53.84, 53.16])
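I am looking at a model like this:
Using Metropolis sampling, how can I fit a model that estimates mu and sigma?
Here is my guess at the pseudocode, based on what I have read: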
M, S = 50, 1
G = 1
# These are priors right?
mu = stats.norm(loc=M, scale=S)
sigma = stats.halfnorm(scale=G)
target = stats.norm
steps = 1000
mu_samples = [50]
sigma_samples = [1]
for i in range(steps):
    # proposed sample...
    mu_i, sigma_i = mu.rvs(), sigma.rvs()
    # Something happens here
    # How do I calculate the likelihood??
    "..."
    # some evaluation of a likelihood ratio??
    a = "some"/"ratio"
    acceptance_bar = np.random.random()
    if a > acceptance_bar:
        mu_samples.append(mu_i)
        sigma_samples.append(sigma_i)
What am I missing?
Solution:
I hope the following example helps. In this example I will assume that we know the value of sigma, so we only need a prior for mu.
import numpy as np
from scipy import stats

# data, M and S are the array and the prior parameters defined in the question
sigma = data.std()  # we are assuming we know sigma
steps = 1000
mu_old = data.mean()  # initial value, just a good guess
mu_samples = []
# we evaluate the prior for the initial point
prior_old = stats.norm(M, S).pdf(mu_old)
# we evaluate the likelihood for the initial point
likelihood_old = np.prod(stats.norm(mu_old, sigma).pdf(data))
# Bayes' theorem (omitting the denominator) for the initial point
post_old = prior_old * likelihood_old
for i in range(steps):
    # proposal distribution, propose a new value from the old one
    mu_new = stats.norm.rvs(mu_old, 0.1)
    # we evaluate the prior
    prior_new = stats.norm(M, S).pdf(mu_new)
    # we evaluate the likelihood
    likelihood_new = np.prod(stats.norm(mu_new, sigma).pdf(data))
    # Bayes' theorem (omitting the denominator)
    post_new = prior_new * likelihood_new
    # the ratio of posteriors (we do not need to know the normalizing constant)
    a = post_new / post_old
    # accept the proposal with probability min(1, a)
    if np.random.random() < a:
        mu_old = mu_new
        post_old = post_new
    # store the current value; on rejection the old value is saved again
    mu_samples.append(mu_old)
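Notes:
>Note that I have defined a proposal distribution, in this case a Gaussian centered at mu_old with a standard deviation of 0.1 (an arbitrary value). In practice, the efficiency of MH depends heavily on the proposal distribution, so PyMC3 (and other practical implementations of MH) use heuristics to tune it.
>For simplicity I used the pdf in this example, but in practice it is more convenient to work with the logpdf. This avoids underflow problems without changing the result.
>The likelihood is computed as a product: the pdf is evaluated at every data point and the values are multiplied together.
>The ratio you were missing is the ratio of posteriors.
>If you do not accept the newly proposed value, you save the old value (again).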
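To make the notes above concrete, here is a minimal sketch of the same sampler in log space, with the proposal width exposed so its effect on the acceptance rate can be inspected. The function name `metropolis_mu`, its parameters and the default values are assumptions made for illustration; they are not taken from the book or from PyMC3.

```python
import numpy as np
from scipy import stats


def metropolis_mu(data, sigma, M, S, steps=1000, proposal_sd=0.1, seed=0):
    """Random-walk Metropolis for mu with sigma treated as known (log space)."""
    rng = np.random.default_rng(seed)

    def log_post(mu):
        # log prior + log likelihood; the product of pdfs becomes a sum of logpdfs
        log_prior = stats.norm.logpdf(mu, M, S)
        log_likelihood = stats.norm.logpdf(data, mu, sigma).sum()
        return log_prior + log_likelihood

    mu_old = data.mean()  # initial value, same choice as in the answer
    lp_old = log_post(mu_old)
    samples = np.empty(steps)
    accepted = 0
    for i in range(steps):
        # symmetric Gaussian proposal centered at the current value
        mu_new = rng.normal(mu_old, proposal_sd)
        lp_new = log_post(mu_new)
        # u < post_new / post_old, written on the log scale
        if np.log(rng.random()) < lp_new - lp_old:
            mu_old, lp_old = mu_new, lp_new
            accepted += 1
        # on rejection the old value is stored again
        samples[i] = mu_old
    return samples, accepted / steps
```

With the `data` array from the question, a call such as `samples, rate = metropolis_mu(data, data.std(), M=50, S=1)` returns the chain and the fraction of accepted proposals. Re-running with a different `proposal_sd` is a simple way to see how strongly the proposal width influences the acceptance rate.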
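Remember to check this repository for errata and updated versions of the code. The main difference between the updated code and the code in the book is that the preferred way to run a model with PyMC3 is now to call pm.sample() and let PyMC3 choose the sampler and the initialization points for you.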
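The answer fixes sigma, while the question asked about estimating both mu and sigma. The following is a rough sketch of how the same Metropolis scheme might be extended to both parameters, using the priors from the question's pseudocode (Normal(M, S) on mu, half-Normal with scale G on sigma). The function name `metropolis_mu_sigma`, the proposal widths and the starting point are assumptions chosen for illustration, and this is hand-written Metropolis, not what pm.sample() does internally.

```python
import numpy as np
from scipy import stats


def metropolis_mu_sigma(data, M=50, S=1, G=1, steps=5000,
                        proposal_sd=(0.3, 0.3), seed=0):
    """Random-walk Metropolis over (mu, sigma), computed in log space."""
    rng = np.random.default_rng(seed)

    def log_post(mu, sigma):
        if sigma <= 0:
            # outside the support of the half-Normal prior: always rejected
            return -np.inf
        return (stats.norm.logpdf(mu, M, S)
                + stats.halfnorm.logpdf(sigma, scale=G)
                + stats.norm.logpdf(data, mu, sigma).sum())

    current = np.array([data.mean(), data.std()])  # starting point (arbitrary)
    lp_old = log_post(*current)
    samples = np.empty((steps, 2))
    for i in range(steps):
        # symmetric Gaussian proposal for both parameters at once
        proposal = rng.normal(current, proposal_sd)
        lp_new = log_post(*proposal)
        if np.log(rng.random()) < lp_new - lp_old:
            current, lp_old = proposal, lp_new
        # on rejection the current pair is stored again
        samples[i] = current
    return samples
```

Column 0 of the returned array holds the mu samples and column 1 the sigma samples. In practice one would usually discard an initial stretch of the chain before summarizing it.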