PRML - Equation 1.68

2023-09-26 23:08:28

Explanations from overseas readers

https://stats.stackexchange.com/questions/305078/how-to-compute-equation-1-68-of-bishops-book

I was treating the problem as having four random variables $x,t,D,w$ where $D=(X,T)$ then I only obtain this:

The book sneakily invoked the concept of "conditional independence".
In other words, the book quietly relies on conditional independence without spelling it out.

Suppose we have variables $A,$ $B,$ and $C,$ and that $A$ and $B$ are conditionally independent given $C.$ This means that $P(A \mid B, C) = P(A \mid C).$ That is, if $C$ is observed, then $A$ is independent of $B.$ However, that independence is conditional, so it's still true that $P(A \mid B) \ne P(A)$ in general.

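In short: if $A$ and $B$ are conditionally independent given $C$, then $P(A \mid B, C) = P(A \mid C)$, so once $C$ is observed, $B$ carries no further information about $A$ ($\color{red}{\text{this appears to be a concept from probabilistic graphical models}}$). The independence is only conditional, though: when $C$ is not observed, $P(A \mid B) \ne P(A)$ still holds in general.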

In this case, $t$ is conditionally independent of $D$ given $w.$ The reason for this is that $t$ solely depends on $w$ and $x,$ but if you don't know $w$ then $D$ gives you a hint to the value of $w.$ However, if you do know $w$ then $D$ is no longer useful for determining the value of $t.$ This explains why $D$ was omitted from $P(t \mid x, w, D)$ but not from $P(t \mid x, D).$

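In short: given $w$, $t$ is independent of $D$, because $t$ depends only on $w$ and $x$. When $w$ is unknown, $D$ provides evidence about $w$; once $w$ is known, $D$ is no longer needed to infer $t$. That is why $D$ can be dropped from $P(t \mid x, w, D)$, leaving $P(t \mid x, w)$, but has to stay in $P(t \mid x, D)$.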

Similarly, $w$ is entirely independent of $x$ so $P(w \mid x, D) = P(w \mid D).$
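Putting the two conditional independence statements above together gives equation 1.68 in the $D=(X,T)$ notation of this answer. The following is a short sketch of the chain of steps, using only the sum rule, the product rule, and those two statements:

$$\begin{aligned}
P(t \mid x, D) &= \int P(t, w \mid x, D)\,dw \\
&= \int P(t \mid x, w, D)\,P(w \mid x, D)\,dw \\
&= \int P(t \mid x, w)\,P(w \mid D)\,dw
\end{aligned}$$

The first line marginalizes over $w$, the second applies the product rule, and the third drops $D$ from the first factor and $x$ from the second.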

Excerpts from a Chinese author's notes

https://nbviewer.org/github/hschen0712/machine-learning-notes/blob/master/PRML/Chap1-Introduction/1.2-probability-theory.ipynb

Bayesian curve fitting

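MLE and MAP, introduced earlier, are both point estimates; this section introduces a more fully Bayesian approach. Recall the goal of curve fitting: for a given new input $\hat{x}$ we want to predict the corresponding output $\hat{t}$ (written simply as $x$ and $t$ below). Here we assume the parameters $\alpha$ and $\beta$ are known, so they can be dropped from the posterior over $\mathbf{w}$, which we write as $p(\mathbf{w}|\mathbf{x},\mathbf{t})$. Integrating the right-hand side of the following equation over $\mathbf{w}$ gives the posterior predictive distribution of $t$:
$$ p(t|x,\mathbf{x},\mathbf{t})=\int p(t|x,\mathbf{w})p(\mathbf{w}|\mathbf{x},\mathbf{t})d\mathbf{w}$$
This formula was the first hurdle I hit while reading the book, and it seems many other readers have been stuck on it for a long time too. Here is how I understand it.
First interpretation: in the Bayesian setting the data are known and only the parameter $\mathbf{w}$ is uncertain, so $x,\mathbf{x},\mathbf{t}$ in the formula are all fixed. For intuition we can drop everything that is known, and the formula becomes
$$p(t)=\int p(t|\mathbf{w})p(\mathbf{w}) d\mathbf{w}=\int p(t,\mathbf{w})d\mathbf{w}$$
which is easy to read: it simply marginalizes over $\mathbf{w}$ (using the product rule and the sum rule of probability, with the sum becoming an integral in the continuous case).
Second interpretation: probabilistic graphical models, which requires D-separation (a graphical method for deciding whether variables are conditionally independent). Below is the simplest example of D-separation; for more of the theory, see Chapter 8 of PRML.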

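We want to determine the relationship between $a$ and $b$ in the figure above, and there are two cases to consider.
First, applying the chain rule to the graph, we write the joint probability of the model:
$$p(a,b,c)=p(c)p(a|c)p(b|c)$$
1) If the random variable $c$ has been observed, then $a$ and $b$ are conditionally independent, i.e. $p(a,b|c)=p(a|c)p(b|c)$.
The proof is as follows:
$$p(a,b|c)=\frac{p(a,b,c)}{p(c)}=\frac{p(c)p(a|c)p(b|c)}{p(c)}=p(a|c)p(b|c)$$
In the same way we can show that $p(b|a, c)=p(b|c)$:
$$p(b|a, c)=\frac{p(a,b,c)}{p(a, c)}=\frac{p(c)p(a|c)p(b|c)}{p(c)p(a|c)}=p(b|c)$$
2) If the random variable $c$ has not been observed, summing $p(a,b,c)$ over $c$ gives the joint probability of $a$ and $b$:
$$p(a,b)=\sum_{c}p(c)p(a|c)p(b|c)$$
In general $p(a,b)$ is not equal to $p(a)p(b)$, so $a$ and $b$ are not independent.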
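The two cases can be checked numerically on a small discrete model. The sketch below assumes three binary variables and made-up probability tables (`p_c`, `p_a_given_c`, `p_b_given_c` are illustrative values, not taken from PRML); it builds the joint $p(a,b,c)=p(c)p(a|c)p(b|c)$ and tests both the conditional and the marginal factorization.

```python
import numpy as np

# Hypothetical tables for binary a, b, c (illustrative numbers only).
p_c = np.array([0.3, 0.7])                 # p(c)
p_a_given_c = np.array([[0.9, 0.1],        # rows index c, columns index a
                        [0.2, 0.8]])
p_b_given_c = np.array([[0.6, 0.4],        # rows index c, columns index b
                        [0.1, 0.9]])

# joint[a, b, c] = p(c) p(a|c) p(b|c)
joint = np.einsum("c,ca,cb->abc", p_c, p_a_given_c, p_b_given_c)

# Case 1: c observed. Compare p(a,b|c) with p(a|c) p(b|c).
p_ab_given_c = joint / joint.sum(axis=(0, 1), keepdims=True)
product_given_c = np.einsum("ca,cb->abc", p_a_given_c, p_b_given_c)
cond_indep = np.allclose(p_ab_given_c, product_given_c)

# Case 2: c not observed. Compare p(a,b) with p(a) p(b).
p_ab = joint.sum(axis=2)
p_a = p_ab.sum(axis=1)
p_b = p_ab.sum(axis=0)
marg_indep = np.allclose(p_ab, np.outer(p_a, p_b))

print("independent given c:", cond_indep)
print("independent without c:", marg_indep)
```

With these tables the first check succeeds and the second fails, which matches the two cases: observing $c$ blocks the path between $a$ and $b$, while leaving it unobserved does not.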
Next we turn to the probabilistic graphical model of the regression model:

We can now prove that the original formula holds:
$$\begin{aligned}p(t|x,\mathbf{x},\mathbf{t})&=\frac{p(t,x,\mathbf{x},\mathbf{t})}{p(x,\mathbf{x},\mathbf{t})}\\&=\int \frac{p(t,x,\mathbf{x},\mathbf{t}, \mathbf{w})}{p(x,\mathbf{x},\mathbf{t})}d\mathbf{w}\\&=\int \frac{p(t,x,\mathbf{x},\mathbf{t}, \mathbf{w})}{p(x,\mathbf{x},\mathbf{t}, \mathbf{w})}\frac{p(x,\mathbf{x},\mathbf{t}, \mathbf{w})}{p(x,\mathbf{x},\mathbf{t})}d\mathbf{w}\\&=\int p(t|x,\mathbf{x},\mathbf{t}, \mathbf{w})p(\mathbf{w}|x,\mathbf{x},\mathbf{t})d\mathbf{w}\end{aligned}$$
By D-separation, once $\mathbf{w}$ is observed the path from $\mathbf{x}$ to $t$ (drawn as $\hat{t}$ in the figure) is blocked, so $t$ is conditionally independent of $\mathbf{x}$ and $\mathbf{t}$ given $\mathbf{w}$, and
$$p(t|x,\mathbf{x},\mathbf{t}, \mathbf{w})=p(t|x,\mathbf{w})$$
Next consider $p(\mathbf{w}|x,\mathbf{x},\mathbf{t})$. Since $t$ has not been observed, the only path between $\mathbf{w}$ and $x$ passes through the unobserved head-to-head node $t$ and is blocked, so by D-separation $\mathbf{w}$ and $x$ are independent. On the other hand, $\mathbf{t}$ has been observed, so $\mathbf{w}$ and $\mathbf{x}$ are not conditionally independent and $\mathbf{x}$ must stay in the conditioning set. Therefore
$$p(\mathbf{w}|x,\mathbf{x},\mathbf{t})=p(\mathbf{w}|\mathbf{x},\mathbf{t})$$
Putting this together, we obtain
$$p(t|x,\mathbf{x},\mathbf{t})=\int p(t|x,\mathbf{w})p(\mathbf{w}|\mathbf{x},\mathbf{t})d\mathbf{w}$$
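To make the integral concrete, here is a minimal sketch that evaluates it two ways for polynomial curve fitting with a Gaussian prior and Gaussian noise: by Monte Carlo (draw $\mathbf{w}$ from $p(\mathbf{w}|\mathbf{x},\mathbf{t})$, then draw $t$ from $p(t|x,\mathbf{w})$) and by the Gaussian closed form that PRML gives in equations 1.69 to 1.72. The data set, the polynomial degree, the values of `alpha` and `beta`, and all function names are assumptions chosen for illustration.

```python
import numpy as np

def poly_features(x, degree):
    """Polynomial basis phi(x) = (1, x, ..., x^degree), one row per input."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    return np.vander(x, degree + 1, increasing=True)

def posterior(x_train, t_train, alpha, beta, degree):
    """Gaussian posterior p(w | x, t) = N(w | m_N, S)."""
    Phi = poly_features(x_train, degree)
    S_inv = alpha * np.eye(degree + 1) + beta * Phi.T @ Phi
    S = np.linalg.inv(S_inv)
    S = 0.5 * (S + S.T)              # remove tiny numerical asymmetry
    m_N = beta * S @ Phi.T @ t_train
    return m_N, S

def predictive_closed_form(x_new, m_N, S, beta, degree):
    """Mean and variance of p(t | x, x_train, t_train) in closed form."""
    phi = poly_features(x_new, degree)[0]
    return phi @ m_N, 1.0 / beta + phi @ S @ phi

def predictive_monte_carlo(x_new, m_N, S, beta, degree, n_samples, rng):
    """Approximate the integral over w by sampling w, then t given w."""
    phi = poly_features(x_new, degree)[0]
    w = rng.multivariate_normal(m_N, S, size=n_samples)   # w ~ p(w | x, t)
    noise = rng.normal(0.0, np.sqrt(1.0 / beta), size=n_samples)
    t = w @ phi + noise                                    # t ~ p(t | x, w)
    return t.mean(), t.var()

# Illustrative setup (assumed values, synthetic data).
rng = np.random.default_rng(0)
alpha, beta, degree = 5e-3, 11.1, 3
x_train = np.linspace(0.0, 1.0, 10)
t_train = np.sin(2 * np.pi * x_train) + rng.normal(0.0, 0.3, size=10)

m_N, S = posterior(x_train, t_train, alpha, beta, degree)
mean_cf, var_cf = predictive_closed_form(0.5, m_N, S, beta, degree)
mean_mc, var_mc = predictive_monte_carlo(0.5, m_N, S, beta, degree, 200_000, rng)

print("closed form:", mean_cf, var_cf)
print("Monte Carlo:", mean_mc, var_mc)
```

The sampler never looks at the training data once $\mathbf{w}$ has been drawn, which is exactly the conditional independence $p(t|x,\mathbf{x},\mathbf{t},\mathbf{w})=p(t|x,\mathbf{w})$ used in the proof; the two estimates should agree up to sampling error.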
