Statistical Inference (8): Model Selection

1. Bayesian Approach

  • Consider a nested sequence of model classes
    $$\mathcal{P}_{1} \subset \mathcal{P}_{2} \subset \mathcal{P}_{3} \subset \cdots$$

  • ML decision rule:
    $$\hat{m}=\arg \max _{m}\left\{\max _{p \in \mathcal{P}_{m}} p(\boldsymbol{y})\right\}=\arg \max _{m}\left\{\max _{a} p_{\mathbf{y} \mid \mathbf{x}, \mathbf{H}}\left(\boldsymbol{y} \mid a, H_{m}\right)\right\}$$
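Because the model classes are nested, the maximized likelihood can only grow with $m$, so the pure ML rule always selects the richest class. A minimal numerical sketch (the polynomial data, degree-$m$ model classes, and noise level are hypothetical choices, not from the original notes):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: a degree-2 polynomial plus Gaussian noise of known
# variance sigma^2; model class P_m = polynomials of degree m.
N, sigma = 30, 0.5
x = np.linspace(-1.0, 1.0, N)
y = 1.0 - 2.0 * x + 3.0 * x ** 2 + sigma * rng.standard_normal(N)

def max_log_likelihood(m):
    """max_{a in P_m} ln p(y | a, H_m): least squares is the ML fit
    under Gaussian noise with known variance."""
    coeffs = np.polyfit(x, y, m)
    rss = np.sum((np.polyval(coeffs, x) - y) ** 2)
    return -0.5 * N * np.log(2 * np.pi * sigma ** 2) - rss / (2 * sigma ** 2)

lls = [max_log_likelihood(m) for m in range(6)]
# Nested classes: the maximized likelihood never decreases with m,
# so the ML rule alone overfits and cannot select a model order.
assert all(lls[i] <= lls[i + 1] + 1e-6 for i in range(len(lls) - 1))
```

This overfitting is what motivates integrating out the parameters (the Bayesian evidence of Section 3) rather than maximizing over them.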

2. Laplace’s Method

  • Consider a continuous distribution
    $$p_{\mathbf{x}}(x)=\frac{p_{0}(x)}{Z_{p}}$$
    where $p_{0}$ is an unnormalized density and $Z_{p}=\int p_{0}(x)\, \mathrm{d} x$ is its normalizer.

  • Approximate the likelihood with a second-order Taylor series around its mode $\hat{x}$:
    $$\ln p_{0}(x) \approx \ln p_{0}(\hat{x})+\left.(x-\hat{x}) \frac{\mathrm{d}}{\mathrm{d} x} \ln p_{0}(x)\right|_{x=\hat{x}}+\left.\frac{1}{2}(x-\hat{x})^{2} \frac{\mathrm{d}^{2}}{\mathrm{d} x^{2}} \ln p_{0}(x)\right|_{x=\hat{x}}$$
    Since the first derivative vanishes at the mode,
    $$p_{0}(x) \approx p_{0}(\hat{x}) \exp \left[-\frac{1}{2} J(\hat{x})(x-\hat{x})^{2}\right], \qquad J(\hat{x})=-\left.\frac{\mathrm{d}^{2}}{\mathrm{d} x^{2}} \ln p_{0}(x)\right|_{x=\hat{x}}$$
    Integrating this Gaussian gives $Z_{p} \approx p_{0}(\hat{x}) \sqrt{2 \pi J^{-1}(\hat{x})}$.
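A quick numerical check of the Gaussian-integral approximation, using the hypothetical unnormalized density $p_0(x)=x^{a}e^{-x}$ (chosen because its exact normalizer is $\Gamma(a+1)$, so the Laplace estimate can be compared directly):

```python
import math
import numpy as np

# Unnormalized density p0(x) = x^a * exp(-x); exact Z_p = Gamma(a + 1).
a = 10.0

def ln_p0(x):
    return a * np.log(x) - x

x_hat = a                  # mode: d/dx ln p0(x) = a/x - 1 = 0 at x = a
J = a / x_hat ** 2         # J(x_hat) = -d^2/dx^2 ln p0(x) at the mode

# Laplace approximation: Z_p ≈ p0(x_hat) * sqrt(2*pi / J)
Z_laplace = math.exp(ln_p0(x_hat)) * math.sqrt(2 * math.pi / J)

# Exact normalizer and a brute-force Riemann-sum check of the integral
Z_exact = math.gamma(a + 1)            # Gamma(11) = 10! = 3628800
xs = np.linspace(1e-6, 60.0, 200_000)
Z_numeric = np.exp(ln_p0(xs)).sum() * (xs[1] - xs[0])

assert abs(Z_numeric / Z_exact - 1) < 1e-3   # quadrature is accurate
assert abs(Z_laplace / Z_exact - 1) < 0.02   # Laplace within ~1% here
```

For this density the approximation reproduces Stirling's formula for $\Gamma(a+1)$, which is why the error shrinks as $a$ grows.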

3. Bayesian Information Criterion

  • MAP decision rule:
    $$\hat{m}=\arg \max _{m} p_{\mathbf{y} \mid \mathbf{H}}\left(\boldsymbol{y} \mid H_{m}\right)$$
    where
    $$p_{\mathbf{y} \mid \mathbf{H}}\left(\boldsymbol{y} \mid H_{m}\right)=\int p_{\mathbf{y} \mid \mathbf{x}, \mathbf{H}}\left(\boldsymbol{y} \mid x, H_{m}\right) p_{\mathbf{x} \mid \mathbf{H}}\left(x \mid H_{m}\right) \mathrm{d} x$$

    Define the unnormalized posterior
    $$q_{0}(x)=p_{\mathbf{y} \mid \mathbf{x}, \mathbf{H}}\left(\boldsymbol{y} \mid x, H_{m}\right) p_{\mathbf{x} \mid \mathbf{H}}\left(x \mid H_{m}\right) \propto p_{\mathbf{x} \mid \mathbf{y}, \mathbf{H}}\left(x \mid \boldsymbol{y}, H_{m}\right)$$
    Applying Laplace's method to $q_{0}$ then gives
    $$p_{\mathbf{y} \mid \mathbf{H}}(\boldsymbol{y} \mid H)=\int q_{0}(x)\, \mathrm{d} x \approx p_{\mathbf{y} \mid \mathbf{x}, \mathbf{H}}(\boldsymbol{y} \mid \hat{x}, H)\, p_{\mathbf{x} \mid \mathbf{H}}(\hat{x} \mid H) \sqrt{2 \pi J_{\mathbf{y}}^{-1}(\hat{x})}$$
    where the last factor is the Occam's razor factor.
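As a sanity check on this approximate evidence, here is a sketch for a conjugate Gaussian model (the parameters `N`, `s2`, `t2` and the data are hypothetical). Because the posterior is exactly Gaussian, the Laplace-approximated evidence should match a brute-force integral of $q_0$ essentially to numerical precision; for large $N$, taking logs of the approximation is what yields the familiar BIC penalty of $-\frac{1}{2}\ln N$ per parameter.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical conjugate model: y_i ~ N(x, s2) i.i.d. given x,
# with prior x ~ N(0, t2). Here Laplace's method is exact.
N, s2, t2 = 50, 1.0, 4.0
y = rng.normal(1.5, np.sqrt(s2), size=N)
Sy, Syy = y.sum(), (y ** 2).sum()

def ln_q0(x):
    """ln q0(x) = ln p(y|x,H) + ln p(x|H), vectorized in x."""
    rss = Syy - 2.0 * x * Sy + N * x ** 2
    return (-0.5 * rss / s2 - 0.5 * N * np.log(2 * np.pi * s2)
            - 0.5 * x ** 2 / t2 - 0.5 * np.log(2 * np.pi * t2))

# MAP estimate and observed information J_y(x_hat) = -d^2/dx^2 ln q0(x)
J = N / s2 + 1.0 / t2
x_hat = (Sy / s2) / J

# Laplace approximation of the evidence p(y|H) = ∫ q0(x) dx
ln_Z_laplace = ln_q0(x_hat) + 0.5 * np.log(2 * np.pi / J)

# Brute-force Riemann-sum check of the same integral
xs = np.linspace(x_hat - 8.0, x_hat + 8.0, 400_000)
ln_Z_numeric = np.log(np.exp(ln_q0(xs)).sum() * (xs[1] - xs[0]))

assert abs(ln_Z_laplace - ln_Z_numeric) < 1e-6
```

The factor $\sqrt{2\pi J_{\mathbf{y}}^{-1}(\hat{x})}$ shrinks as the data sharpen the posterior ($J \propto N$ here), which is exactly the Occam penalty against models with more adjustable parameters.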