Statistical Inference (1): Hypothesis Testing

1. Binary Bayesian hypothesis testing

1.0 Problem Setting

  • Hypothesis
    • Hypothesis space $\mathcal{H}=\{H_0, H_1\}$
    • Bayesian approach: model the valid hypothesis as a random variable $\mathsf{H}$
    • Prior $P_0 = p_\mathsf{H}(H_0)$, $P_1=p_\mathsf{H}(H_1)=1-P_0$
  • Observation
    • Observation space $\mathcal{Y}$
    • Observation model $p_\mathsf{y|H}(\cdot|H_0)$, $p_\mathsf{y|H}(\cdot|H_1)$
  • Decision rule $f:\mathcal{Y}\to\mathcal{H}$
  • Cost function $C: \mathcal{H}\times\mathcal{H} \to \mathbb{R}$
    • Let $C_{ij}=C(H_j,H_i)$ be the cost of deciding $H_i$ when the correct hypothesis is $H_j$
    • $C$ is valid if $C_{jj}<C_{ij}$ for all $i \ne j$
  • Optimum decision rule $\hat{H}(\cdot) = \arg\min\limits_{f(\cdot)}\mathbb{E}[C(\mathsf{H},f(\mathsf{y}))]$

1.1 Binary Bayesian hypothesis testing

Theorem: The optimal Bayes’ decision takes the form
$$L(\mathsf{y}) \triangleq \frac{p_\mathsf{y|H}(\mathsf{y}|H_1)}{p_\mathsf{y|H}(\mathsf{y}|H_0)} \underset{H_0}{\overset{H_1}{\gtreqless}} \frac{P_0}{P_1} \frac{C_{10}-C_{00}}{C_{01}-C_{11}} \triangleq \eta$$
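Proof: The Bayes risk of a decision rule $f$ is
$$\begin{aligned} \varphi(f) &= \mathbb{E}[C(\mathsf{H},f(\mathsf{y}))] \\ &= \int_{\mathcal{Y}} p_\mathsf{y}(y^*)\, \mathbb{E}[C(\mathsf{H},f(y^*)) \mid \mathsf{y}=y^*]\, dy^* \end{aligned}$$
Since $p_\mathsf{y}(y^*) \ge 0$, it suffices to minimize the conditional expectation $\mathbb{E}=\mathbb{E}[C(\mathsf{H},f(y^*)) \mid \mathsf{y}=y^*]$ for each given $y^*$: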

  • if $f(y^*)=H_0$, $\mathbb{E}=C_{00}\,p_{\mathsf{H|y}}(H_0|y^*)+C_{01}\,p_{\mathsf{H|y}}(H_1|y^*)$
  • if $f(y^*)=H_1$, $\mathbb{E}=C_{10}\,p_{\mathsf{H|y}}(H_0|y^*)+C_{11}\,p_{\mathsf{H|y}}(H_1|y^*)$

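So choosing the hypothesis with the smaller conditional cost gives
$$\frac{p_\mathsf{H|y}(H_1|y^*)}{p_\mathsf{H|y}(H_0|y^*)} \underset{H_0}{\overset{H_1}{\gtreqless}} \frac{C_{10}-C_{00}}{C_{01}-C_{11}}$$
Substituting $p_\mathsf{H|y}(H_i|y^*) \propto P_i\, p_\mathsf{y|H}(y^*|H_i)$ (Bayes' rule) and moving $P_1/P_0$ to the right-hand side yields the threshold $\eta$ of the theorem.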
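Remark: the Bayes test is deterministic, so for any fixed $y$ the probability that $f(y)=H_1$ is either 0 or 1. When taking the expectation of the cost, $\mathsf{H}$ is therefore treated as a random variable while $f(y)$ is treated as a fixed value, and the two possible values of $f(y)$ are handled case by case.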
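As a minimal sketch of the theorem, the following Python snippet evaluates the threshold $\eta$ and applies the test to a scalar Gaussian example. The observation model ($H_0$: $y \sim N(0,1)$, $H_1$: $y \sim N(1,1)$), the priors, the costs, and the helper names `bayes_threshold`, `likelihood_ratio` and `bayes_decide` are assumptions made for illustration only.

```python
from statistics import NormalDist


def bayes_threshold(p0, c00, c01, c10, c11):
    """eta = (P0 / P1) * (C10 - C00) / (C01 - C11).

    c_ij is the cost of deciding H_i when H_j is the true hypothesis.
    """
    p1 = 1.0 - p0
    return (p0 / p1) * (c10 - c00) / (c01 - c11)


def likelihood_ratio(y, h0, h1):
    """L(y) = p(y | H1) / p(y | H0)."""
    return h1.pdf(y) / h0.pdf(y)


def bayes_decide(y, h0, h1, eta):
    """Return 1 (decide H1) if L(y) >= eta, else 0 (decide H0)."""
    return 1 if likelihood_ratio(y, h0, h1) >= eta else 0


# Assumed example model: unit-variance Gaussians with means 0 and 1.
h0 = NormalDist(0.0, 1.0)
h1 = NormalDist(1.0, 1.0)

# Equal priors and 0-1 costs give eta = 1.
eta = bayes_threshold(p0=0.5, c00=0.0, c01=1.0, c10=1.0, c11=0.0)
decisions = [bayes_decide(y, h0, h1, eta) for y in (-1.0, 0.4, 0.6, 2.0)]

# A prior that favours H0 raises the threshold.
eta_skewed = bayes_threshold(p0=0.8, c00=0.0, c01=1.0, c10=1.0, c11=0.0)
```

In this assumed model $L(y)=e^{y-1/2}$, so with $\eta=1$ the test reduces to comparing $y$ with 0.5, the point where the two densities cross.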

Special cases

  • Maximum a posteriori (MAP)
    • $C_{00}=C_{11}=0$, $C_{01}=C_{10}=1$
    • $\hat{H}(y)=\arg\max\limits_{H\in\{H_0,H_1\}} p_\mathsf{H|y}(H|y)$
  • Maximum likelihood (ML)
    • $C_{00}=C_{11}=0$, $C_{01}=C_{10}=1$, $P_0=P_1=0.5$
    • $\hat{H}(y)=\arg\max\limits_{H\in\{H_0,H_1\}} p_\mathsf{y|H}(y|H)$
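The two special cases can be compared on a small example. The sketch below assumes a Gaussian model ($H_0$: $y \sim N(0,1)$, $H_1$: $y \sim N(1,1)$) and uses the illustrative helper names `map_decide` and `ml_decide`; neither the model nor the names come from the notes.

```python
from statistics import NormalDist


def map_decide(y, h0, h1, p0):
    """MAP rule: maximise prior * likelihood, which is proportional to the posterior."""
    scores = [p0 * h0.pdf(y), (1.0 - p0) * h1.pdf(y)]
    return scores.index(max(scores))  # ties resolve to H0 in this sketch


def ml_decide(y, h0, h1):
    """ML rule: maximise the likelihood alone (MAP with equal priors)."""
    scores = [h0.pdf(y), h1.pdf(y)]
    return scores.index(max(scores))


# Assumed example model.
h0 = NormalDist(0.0, 1.0)
h1 = NormalDist(1.0, 1.0)

y_obs = 1.0
ml_choice = ml_decide(y_obs, h0, h1)            # likelihood favours H1
map_choice = map_decide(y_obs, h0, h1, p0=0.8)  # a strong prior on H0 wins
```

At $y=1$ the likelihood favours $H_1$, but with $P_0=0.8$ the posterior still favours $H_0$, so the two rules disagree; with $P_0=0.5$ they coincide.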

1.2 Likelihood Ratio Test

Generally, the likelihood ratio test (LRT) takes the form
$$L(\mathsf{y}) \triangleq \frac{p_\mathsf{y|H}(\mathsf{y}|H_1)}{p_\mathsf{y|H}(\mathsf{y}|H_0)} \underset{H_0}{\overset{H_1}{\gtreqless}} \eta$$

  • Bayesian formulation gives a method of calculating $\eta$
  • $L(y)$ is a sufficient statistic for the decision problem
  • Any invertible function of $L(y)$ is also a sufficient statistic

See also: sufficient statistics

1.3 ROC

  • Detection probability $P_D = P(\hat{H}=H_1 \mid \mathsf{H}=H_1)$
  • False-alarm probability $P_F = P(\hat{H}=H_1 \mid \mathsf{H}=H_0)$

Property (important!)

  • The ROC curve of the LRT is monotonically non-decreasing
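To make the ROC concrete, here is a small sketch that traces $(P_F, P_D)$ for an assumed Gaussian shift model ($H_0$: $y \sim N(0,1)$, $H_1$: $y \sim N(\mu,1)$ with $\mu>0$). In that model the LRT is equivalent to comparing $y$ with a threshold $\tau$, so the curve can be swept over $\tau$. The function name `roc_point` and the threshold grid are illustrative assumptions.

```python
from statistics import NormalDist


def roc_point(tau, mu=1.0):
    """Operating point (P_F, P_D) of the test 'decide H1 if y >= tau'.

    Assumes H0: y ~ N(0, 1) and H1: y ~ N(mu, 1), for which the LRT
    is equivalent to a threshold on y when mu > 0.
    """
    std = NormalDist()
    p_f = 1.0 - std.cdf(tau)        # P(y >= tau | H0)
    p_d = 1.0 - std.cdf(tau - mu)   # P(y >= tau | H1)
    return p_f, p_d


# Sweep the threshold from 3.0 down to -3.0 so that P_F increases.
taus = [3.0 - 0.5 * k for k in range(13)]
roc = [roc_point(t) for t in taus]
```

Lowering $\tau$ enlarges the region where $H_1$ is decided, so both $P_F$ and $P_D$ grow together, which is the monotonicity stated above.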


2. Non-Bayesian hypo test

  • The non-Bayesian approach requires neither prior probabilities nor a cost function

Neyman-Pearson criterion

$$\max_{\hat{H}(\cdot)}P_D \quad \text{s.t. } P_F\le \alpha$$

Theorem (Neyman-Pearson Lemma): The optimal solution under the NP criterion is given by an LRT, where $\eta$ is determined by
$$P_F=P(L(\mathsf{y})\ge\eta \mid \mathsf{H}=H_0) = \alpha$$
Proof: [figure: proof of the Neyman-Pearson lemma]

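Intuition: for a fixed $P_F$, the LRT achieves the largest $P_D$. The region in which the LRT decides $H_1$ collects the observations with the largest values of $\frac{p(y|H_1)}{p(y|H_0)}$, so every unit of false-alarm probability buys as much detection probability as possible, which maximizes $P_D$ for the given $P_F$.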

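Remark: the LRT is the optimal solution under the NP criterion because

  • for a fixed $P_F$, the LRT has the largest $P_D$
  • as $\eta$ varies over LRTs, a larger $P_F$ comes with a larger $P_D$, i.e. the ROC curve is monotonically non-decreasing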
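A short sketch of how the threshold can be obtained from the false-alarm constraint, again under an assumed Gaussian shift model ($H_0$: $y \sim N(0,1)$, $H_1$: $y \sim N(\mu,1)$, $\mu>0$). In that model $L(y)=\exp(\mu y-\mu^2/2)$ is increasing in $y$, so $L(y)\ge\eta$ is equivalent to $y\ge\tau$. The names `np_test` and `two_sided_pd` are hypothetical; the two-sided test is included only as a non-LRT competitor with the same $P_F$.

```python
from math import exp
from statistics import NormalDist


def np_test(alpha, mu=1.0):
    """Neyman-Pearson LRT for H0: N(0,1) vs H1: N(mu,1) with P_F = alpha.

    Returns (tau, eta, p_d): the threshold on y, the equivalent
    threshold on L(y), and the resulting detection probability.
    """
    std = NormalDist()
    tau = std.inv_cdf(1.0 - alpha)        # P(y >= tau | H0) = alpha
    eta = exp(mu * tau - 0.5 * mu * mu)   # eta = L(tau)
    p_d = 1.0 - std.cdf(tau - mu)         # P(y >= tau | H1)
    return tau, eta, p_d


def two_sided_pd(alpha, mu=1.0):
    """P_D of the non-LRT test 'decide H1 if |y| >= c' with the same P_F."""
    std = NormalDist()
    c = std.inv_cdf(1.0 - alpha / 2.0)    # P(|y| >= c | H0) = alpha
    return (1.0 - std.cdf(c - mu)) + std.cdf(-c - mu)


tau, eta, pd_lrt = np_test(alpha=0.05)
pd_other = two_sided_pd(alpha=0.05)
```

For $\alpha=0.05$ and $\mu=1$ this gives $\tau\approx1.645$ and $P_D\approx0.26$ for the LRT, while the two-sided test with the same $P_F$ only reaches about 0.17, in line with the lemma.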

3. Randomized test

3.1 Decision rule

  • Two deterministic decision rules $\hat{H}'(\cdot)$, $\hat{H}''(\cdot)$

  • Randomized decision rule $\hat{H}(\cdot)$ by time-sharing
    $$\hat{H}(\cdot)=\begin{cases}\hat{H}'(\cdot), & \text{with probability } p \\ \hat{H}''(\cdot), & \text{with probability } 1-p\end{cases}$$

    • Detection prob $P_D=pP_D'+(1-p)P_D''$
    • False-alarm prob $P_F=pP_F'+(1-p)P_F''$
  • A randomized decision rule is fully described by $p_{\mathsf{\hat{H}|y}}(H_m|y)$ for $m=0,1$

3.2 Proposition

  1. Bayesian case: a randomized test cannot achieve a lower Bayes’ risk than the optimum LRT

    Proof: The risk for each $\mathbf{y}$ is linear in $p_{\mathsf{\hat{H}|y}}(H_0|\mathbf{y})$, so the minimum is achieved at 0 or 1, which degenerates to a deterministic decision
    $$\begin{aligned} \varphi(\mathbf{y}) &= \mathbb{E}[C(\mathsf{H},\hat{\mathsf{H}}) \mid \mathbf{y}] \\ &= \sum_{m=0}^{1} p_{\mathsf{\hat{H}|y}}(H_m|\mathbf{y}) \left[ C_{m0}\, p_{\mathsf{H|y}}(H_0|\mathbf{y}) + C_{m1}\, p_{\mathsf{H|y}}(H_1|\mathbf{y}) \right] \end{aligned}$$

  2. Neyman-Pearson case:

    1. continuous-valued: For a given $P_F$ constraint, a randomized test cannot achieve a larger $P_D$ than the optimum LRT
    2. discrete-valued: For a given $P_F$ constraint, a randomized test can achieve a larger $P_D$ than the optimum LRT. Furthermore, the optimum randomized test corresponds to simple time-sharing between the two neighbouring LRTs
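The discrete-valued case can be illustrated with a toy example. The sketch below assumes a three-symbol observation alphabet with made-up likelihoods, strictly positive $p(y|H_0)$, and distinct likelihood ratios; the helper names `lrt_points` and `time_share` are hypothetical.

```python
def lrt_points(p_y_h0, p_y_h1):
    """Operating points (P_F, P_D) of the deterministic LRTs on a finite alphabet.

    Symbols are added to the 'decide H1' region in order of decreasing
    likelihood ratio. Assumes p_y_h0[i] > 0 and distinct ratios.
    """
    order = sorted(range(len(p_y_h0)),
                   key=lambda i: p_y_h1[i] / p_y_h0[i], reverse=True)
    points = [(0.0, 0.0)]
    p_f = p_d = 0.0
    for i in order:
        p_f += p_y_h0[i]
        p_d += p_y_h1[i]
        points.append((p_f, p_d))
    return points


def time_share(points, alpha):
    """Mix the two neighbouring LRT points so that P_F equals alpha.

    Returns (p, p_d): the probability of using the lower point and
    the detection probability of the randomized test.
    """
    for (f_lo, d_lo), (f_hi, d_hi) in zip(points, points[1:]):
        if f_lo <= alpha <= f_hi:
            p = (f_hi - alpha) / (f_hi - f_lo)
            return p, p * d_lo + (1.0 - p) * d_hi
    raise ValueError("alpha must lie within the range of achievable P_F")


# Assumed toy model with y in {0, 1, 2}.
p_y_h0 = [0.6, 0.3, 0.1]
p_y_h1 = [0.1, 0.3, 0.6]
alpha = 0.2

points = lrt_points(p_y_h0, p_y_h1)
pd_det = max(d for f, d in points if f <= alpha)  # best deterministic LRT
share, pd_rand = time_share(points, alpha)
```

In this toy model the best deterministic LRT with $P_F\le0.2$ has $(P_F,P_D)=(0.1,0.6)$, whereas time-sharing between the points $(0.1,0.6)$ and $(0.4,0.9)$ with $p=2/3$ meets $P_F=0.2$ exactly and reaches $P_D=0.7$.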

3.3 Efficient frontier

Boundary of the region of achievable $(P_D,P_F)$ operating points

  • continuous-valued: ROC of the LRT
  • discrete-valued: the LRT points and the straight line segments connecting them

Facts

  • $P_D \ge P_F$
  • The efficient frontier is a concave function
  • $\frac{dP_D}{dP_F}=\eta$
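The slope property can be checked with a short derivation. This is a sketch under the assumption that $L=L(\mathsf{y})$ is continuous-valued with conditional densities $p_{L|\mathsf{H}}(l|H_0)$ and $p_{L|\mathsf{H}}(l|H_1)$; these density symbols are introduced here for illustration only.

```latex
% Assumption: L = L(y) has conditional densities under H_0 and H_1.
\begin{aligned}
P_D(\eta) &= \int_{\eta}^{\infty} p_{L|\mathsf{H}}(l|H_1)\,dl, \qquad
P_F(\eta) = \int_{\eta}^{\infty} p_{L|\mathsf{H}}(l|H_0)\,dl, \\
p_{L|\mathsf{H}}(l|H_1) &= l\,p_{L|\mathsf{H}}(l|H_0)
  \quad\text{(the likelihood ratio of } L \text{ is } L \text{ itself)}, \\
\frac{dP_D}{dP_F} &= \frac{dP_D/d\eta}{dP_F/d\eta}
  = \frac{-\eta\,p_{L|\mathsf{H}}(\eta|H_0)}{-p_{L|\mathsf{H}}(\eta|H_0)} = \eta .
\end{aligned}
```

Moving to the right along the ROC means lowering $\eta$, so the slope is non-increasing in $P_F$, which is consistent with the concavity of the frontier.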


4. Minmax hypo testing

Prior: unknown; cost function: known

4.1 Decision rule

  • minmax approach
    $$\hat H(\cdot)=\arg\min_{f(\cdot)}\max_{p\in[0,1]} \varphi(f,p)$$

  • optimal decision rule
    $$\hat H(\cdot)=\hat{H}_{p_*}(\cdot), \qquad p_* = \arg\max_{p\in[0,1]} \varphi(\hat H_p, p)$$

    To prove that this decision rule is optimal, first introduce the mismatched Bayes decision, i.e. the Bayes test designed for an assumed prior $q=P(\mathsf{H}=H_1)$ that may differ from the true prior $p$
    $$\hat{H}_q(y)=\begin{cases}H_1, & L(y) \ge \frac{1-q}{q}\frac{C_{10}-C_{00}}{C_{01}-C_{11}} \\ H_0, & \text{otherwise}\end{cases}$$
    Its risk is given below, from which $\varphi(\hat H_q,p)$ is seen to be linear in the probability $p$
    $$\varphi(\hat H_q,p)=(1-p)[C_{00}(1-P_F(q))+C_{10}P_F(q)] + p[C_{01}(1-P_D(q))+C_{11}P_D(q)]$$
    Lemma: Max-min inequality
    $$\max_x\min_y g(x,y) \le \min_y\max_x g(x,y)$$
    Theorem:
    $$\min_{f(\cdot)}\max_{p\in[0,1]}\varphi(f,p)=\max_{p\in[0,1]}\min_{f(\cdot)}\varphi(f,p)$$
    Proof of Lemma: Let $h(x)=\min_y g(x,y)$
    $$\begin{aligned} h(x) &\leq g(x, y), \quad \forall x\, \forall y \\ \Longrightarrow \max_{x} h(x) & \leq \max_{x} g(x, y), \quad \forall y \\ \Longrightarrow \max_{x} h(x) & \leq \min_{y} \max_{x} g(x, y) \end{aligned}$$
    Proof of Thm: For any $p_1,p_2 \in [0,1]$ we have
    $$\varphi(\hat H_{p_1},p_1)=\min_f \varphi(f,p_1) \le \max_p \min_f \varphi(f,p) \le \min_f \max_p \varphi(f, p) \le \max_p \varphi(\hat H_{p_2}, p)$$
    Since this holds for arbitrary $p_1,p_2$, we may take $p_1=p_2=p_*=\arg\max_p \varphi(\hat H_p, p)$

    To prove the theorem it then suffices to show that $\varphi(\hat H_{p_*},p_*)=\max_p \varphi(\hat H_{p_*}, p)$, because this forces every inequality in the chain to hold with equality

    As noted earlier, $\varphi(\hat H_q,p)$ is linear in $p$, so to prove this equality:

    • $p_* \in (0,1)$: it suffices that $\left.\frac{\partial \varphi(\hat{H}_{p_*}, p)}{\partial p}\right|_{\text{for any } p}=0$, in which case the equality holds trivially
    • $p_* = 1$: it suffices that $\left.\frac{\partial \varphi(\hat{H}_{p_*}, p)}{\partial p}\right|_{\text{for any } p} > 0$, so that the maximum over $p$ is attained at $p=1$; the case $p_*=0$ is analogous

    By the lemma below, the optimal decision is the Bayes decision with $p_*=\arg\max_p \varphi(\hat H_p, p)$, where (for $p_* \in (0,1)$) $p_*$ satisfies
    $$\begin{aligned} 0 &=\frac{\partial \varphi(\hat{H}_{p_*}, p)}{\partial p} \\ &=(C_{01}-C_{00})-(C_{01}-C_{11}) P_D(p_*)-(C_{10}-C_{00}) P_F(p_*) \end{aligned}$$
    Lemma:
    $$\left.\frac{\mathrm{d} \varphi(\hat{H}_{p}, p)}{\mathrm{d} p}\right|_{p=q}=\left.\frac{\partial \varphi(\hat{H}_{q}, p)}{\partial p}\right|_{p=q}=\left.\frac{\partial \varphi(\hat{H}_{q}, p)}{\partial p}\right|_{\text{for any } p}$$
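As a rough numerical illustration of the condition on $p_*$, the sketch below solves it by bisection for an assumed Gaussian shift model ($H_0$: $y \sim N(0,1)$, $H_1$: $y \sim N(1,1)$) with assumed costs $C_{00}=C_{11}=0$, $C_{01}=2$, $C_{10}=1$. The model, the costs and the function names `operating_point`, `risk`, `slope` and `minimax_prior` are all illustrative choices, not part of the notes.

```python
from math import log
from statistics import NormalDist

STD = NormalDist()
MU = 1.0                                 # assumed: H0 ~ N(0,1), H1 ~ N(MU,1)
C00, C01, C10, C11 = 0.0, 2.0, 1.0, 0.0  # assumed costs, C_ij = decide H_i, truth H_j


def operating_point(q):
    """(P_F, P_D) of the Bayes test designed for the prior q = P(H = H1)."""
    eta = (1.0 - q) / q * (C10 - C00) / (C01 - C11)
    tau = log(eta) / MU + MU / 2.0       # L(y) >= eta  <=>  y >= tau
    return 1.0 - STD.cdf(tau), 1.0 - STD.cdf(tau - MU)


def risk(q, p):
    """phi(H_q, p): risk of the test designed for q when the true prior is p."""
    p_f, p_d = operating_point(q)
    return ((1.0 - p) * (C00 * (1.0 - p_f) + C10 * p_f)
            + p * (C01 * (1.0 - p_d) + C11 * p_d))


def slope(q):
    """Partial derivative of phi(H_q, p) with respect to p (constant in p)."""
    p_f, p_d = operating_point(q)
    return (C01 - C00) - (C01 - C11) * p_d - (C10 - C00) * p_f


def minimax_prior(lo=1e-9, hi=1.0 - 1e-9, iters=100):
    """Bisection on slope(q) = 0; slope is positive near 0 and negative near 1."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if slope(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


p_star = minimax_prior()
worst_case = risk(p_star, p_star)
```

With these assumed numbers $p_*$ comes out at roughly 0.4. At that point the risk of $\hat H_{p_*}$ no longer depends on the true prior, so `risk(p_star, 0.0)` and `risk(p_star, 1.0)` agree up to numerical precision.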
