1. Binary Bayesian hypothesis testing
1.0 Problem Setting
- Hypothesis
    - Hypothesis space $\mathcal{H} = \{H_0, H_1\}$
    - Bayesian approach: model the valid hypothesis as an RV $H$
    - Priors $P_0 = p_H(H_0)$, $P_1 = p_H(H_1) = 1 - P_0$
- Observation
    - Observation space $\mathcal{Y}$
    - Observation model $p_{y|H}(\cdot|H_0)$, $p_{y|H}(\cdot|H_1)$
- Decision rule $f: \mathcal{Y} \to \mathcal{H}$
- Cost function $C: \mathcal{H} \times \mathcal{H} \to \mathbb{R}$
    - Let $C_{ij} = C(H_j, H_i)$, the cost of deciding $H_i$ when the correct hypothesis is $H_j$
    - $C$ is valid if $C_{jj} < C_{ij}$ for $i \ne j$
- Optimum decision rule $\hat{H}(\cdot) = \arg\min_{f(\cdot)} \mathbb{E}[C(H, f(y))]$
1.1 Binary Bayesian hypothesis testing
Theorem: The optimal Bayes’ decision takes the form
$$L(y) \triangleq \frac{p_{y|H}(y|H_1)}{p_{y|H}(y|H_0)} \underset{H_0}{\overset{H_1}{\gtrless}} \frac{P_0}{P_1} \cdot \frac{C_{10} - C_{00}}{C_{01} - C_{11}} \triangleq \eta$$
Proof:
$$\varphi(f) = \mathbb{E}[C(H, f(y))] = \int_{\mathcal{Y}} \mathbb{E}[C(H, f(y)) \mid y = y^*]\, p_y(y^*)\, dy^*$$
Since $p_y(y^*) \ge 0$, it suffices to minimize the conditional expected cost $\mathbb{E}[C(H, f(y^*)) \mid y = y^*]$ separately for each $y^*$. Given $y^*$:
- if $f(y^*) = H_0$, $\mathbb{E} = C_{00}\, p_{H|y}(H_0|y^*) + C_{01}\, p_{H|y}(H_1|y^*)$
- if $f(y^*) = H_1$, $\mathbb{E} = C_{10}\, p_{H|y}(H_0|y^*) + C_{11}\, p_{H|y}(H_1|y^*)$

So
$$\frac{p_{H|y}(H_1|y^*)}{p_{H|y}(H_0|y^*)} \underset{H_0}{\overset{H_1}{\gtrless}} \frac{C_{10} - C_{00}}{C_{01} - C_{11}}$$
Applying Bayes' rule $p_{H|y}(H_m|y^*) \propto P_m\, p_{y|H}(y^*|H_m)$ turns this into the LRT of the theorem.
Remark: note that the Bayes test is deterministic, so for any given $y$ the probability that $f(y) = H_1$ is either 0 or 1. Hence, when taking the expectation of the cost, treat $H$ as a random variable and $f(y)$ as a deterministic value, and compare the two cases above.
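As a concrete illustration, here is a minimal numerical sketch of the Bayes LRT, assuming a hypothetical Gaussian shift model ($y \sim \mathcal{N}(0,1)$ under $H_0$, $y \sim \mathcal{N}(\mu,1)$ under $H_1$); the model, priors, and costs below are illustrative assumptions, not values from these notes:

```python
import numpy as np
from scipy.stats import norm

# Hypothetical model (assumption): y ~ N(0,1) under H0, y ~ N(mu,1) under H1
mu = 1.0
P0, P1 = 0.7, 0.3                 # assumed priors
C = np.array([[0.0, 4.0],         # C[i, j] = cost of deciding H_i when H_j is true
              [1.0, 0.0]])        # so C01 = 4 (miss), C10 = 1 (false alarm)

# Bayes threshold: eta = (P0/P1) * (C10 - C00) / (C01 - C11)
eta = (P0 / P1) * (C[1, 0] - C[0, 0]) / (C[0, 1] - C[1, 1])

def likelihood_ratio(y):
    """L(y) = p(y|H1) / p(y|H0); here equal to exp(mu*y - mu^2/2)."""
    return norm.pdf(y, loc=mu) / norm.pdf(y, loc=0.0)

def bayes_decision(y):
    """Decide H1 iff L(y) >= eta."""
    return 1 if likelihood_ratio(y) >= eta else 0

print(f"eta = {eta:.3f}")                          # ~0.583
print(bayes_decision(0.3), bayes_decision(-1.0))   # 1, 0
```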
Special cases
- Maximum a posteriori (MAP)
    - $C_{00} = C_{11} = 0$, $C_{01} = C_{10} = 1$
    - $\hat{H}(y) = \arg\max_{H \in \{H_0, H_1\}} p_{H|y}(H|y)$
- Maximum likelihood (ML)
    - $C_{00} = C_{11} = 0$, $C_{01} = C_{10} = 1$, $P_0 = P_1 = 0.5$
    - $\hat{H}(y) = \arg\max_{H \in \{H_0, H_1\}} p_{y|H}(y|H)$ (both reductions are checked in the sketch below)
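In threshold form, 0-1 costs reduce the Bayes threshold to $\eta = P_0/P_1$ (MAP), and equal priors further give $\eta = 1$ (ML). A minimal check, reusing the hypothetical Gaussian model and priors assumed in the sketch above:

```python
import numpy as np
from scipy.stats import norm

mu, P0, P1 = 1.0, 0.7, 0.3    # hypothetical model and priors from the sketch above

def L(y):
    """Likelihood ratio p(y|H1)/p(y|H0) for the Gaussian shift model."""
    return norm.pdf(y, loc=mu) / norm.pdf(y, loc=0.0)

y = 0.9
# MAP: 0-1 costs give eta = P0/P1, equivalent to picking the larger posterior
assert (L(y) >= P0 / P1) == (P1 * norm.pdf(y, loc=mu) >= P0 * norm.pdf(y, loc=0.0))
# ML: additionally P0 = P1, so eta = 1, i.e. pick the larger likelihood
assert (L(y) >= 1.0) == (norm.pdf(y, loc=mu) >= norm.pdf(y, loc=0.0))
```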
1.2 Likelihood Ratio Test
Generally, the LRT takes the form
$$L(y) \triangleq \frac{p_{y|H}(y|H_1)}{p_{y|H}(y|H_0)} \underset{H_0}{\overset{H_1}{\gtrless}} \eta$$
- The Bayesian formulation gives a method of calculating $\eta$
- $L(y)$ is a sufficient statistic for the decision problem
- Any invertible function of $L(y)$ is also a sufficient statistic (see the check below)

See also: sufficient statistics
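For instance, in the hypothetical Gaussian shift model used above, $\log L(y) = \mu y - \mu^2/2$ is strictly increasing in $y$ (for $\mu > 0$), so thresholding $y$ itself is an equivalent test; a minimal check:

```python
import numpy as np

mu = 1.0
eta = 0.583                   # ~ the Bayes threshold from the earlier sketch

def lrt(y):
    """Decide via the likelihood ratio: L(y) = exp(mu*y - mu^2/2) >= eta."""
    return np.exp(mu * y - mu**2 / 2) >= eta

def threshold_test(y):
    """Equivalent test on y itself: y >= gamma, gamma = ln(eta)/mu + mu/2."""
    return y >= np.log(eta) / mu + mu / 2

ys = np.linspace(-3.0, 3.0, 601)
assert np.array_equal(lrt(ys), threshold_test(ys))  # identical decisions
```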
1.3 ROC
- Detection probability $P_D = P(\hat{H} = H_1 \mid H = H_1)$
- False-alarm probability $P_F = P(\hat{H} = H_1 \mid H = H_0)$

Properties (important!)
- The ROC curve of the LRT is monotonically non-decreasing (see the sketch below)
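A minimal ROC sketch under the same hypothetical Gaussian shift model: sweeping the threshold $\gamma$ on $y$ traces $(P_F, P_D)$ in closed form through the Gaussian tail function, and the resulting curve is monotonically non-decreasing and above the diagonal:

```python
import numpy as np
from scipy.stats import norm

mu = 1.0                              # hypothetical Gaussian shift model from above
gammas = np.linspace(-4.0, 5.0, 200)  # sweep the decision threshold on y

PF = norm.sf(gammas)                  # P(y >= gamma | H0): the Q-function
PD = norm.sf(gammas - mu)             # P(y >= gamma | H1)

order = np.argsort(PF)                # sort operating points by increasing PF
assert np.all(np.diff(PD[order]) >= 0)  # ROC is monotonically non-decreasing
assert np.all(PD >= PF)               # LRT ROC lies above the chance line PD = PF
```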
2. Non-Bayesian hypothesis testing
- Non-Bayesian: requires neither prior probabilities nor a cost function

Neyman-Pearson criterion
$$\max_{\hat{H}(\cdot)} P_D \quad \text{s.t.} \quad P_F \le \alpha$$
Theorem (Neyman-Pearson lemma): the optimal solution under the NP criterion is given by the LRT, where $\eta$ is obtained from
$$P_F = P(L(y) \ge \eta \mid H = H_0) = \alpha$$
(a numerical sketch follows the remark below).
Proof (intuition): at the same $P_F$, the LRT achieves the largest $P_D$. Intuitively, in the region where the LRT decides $H_1$, the ratio $p(y|H_1)/p(y|H_0)$ is as large as possible, so for a fixed $P_F$ the $P_D$ is maximized.
Remark: the NP-optimal solution is an LRT because
- at the same $P_F$, the LRT achieves the largest $P_D$
- as $\eta$ varies in the LRT, a larger $P_F$ yields a larger $P_D$, i.e., the ROC curve is monotonically non-decreasing
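A minimal sketch of choosing the NP threshold in the hypothetical Gaussian shift model: solve $P_F = \alpha$ for the threshold on $y$, then read off the corresponding $\eta$ and the achieved $P_D$:

```python
import numpy as np
from scipy.stats import norm

mu, alpha = 1.0, 0.05                 # hypothetical model and false-alarm budget

gamma = norm.isf(alpha)               # threshold on y: P(y >= gamma | H0) = alpha
eta = np.exp(mu * gamma - mu**2 / 2)  # corresponding LRT threshold eta = L(gamma)
PD = norm.sf(gamma - mu)              # detection probability achieved at PF = alpha

print(f"gamma={gamma:.3f}, eta={eta:.3f}, PD={PD:.3f}")
```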
3. Randomized test
3.1 Decision rule
- Two deterministic decision rules $\hat{H}'(\cdot)$, $\hat{H}''(\cdot)$
- Randomized decision rule $\hat{H}(\cdot)$ by time-sharing (see the sketch after this list):
$$\hat{H}(\cdot) = \begin{cases} \hat{H}'(\cdot) & \text{with probability } p \\ \hat{H}''(\cdot) & \text{with probability } 1-p \end{cases}$$
    - Detection probability $P_D = p P_D' + (1-p) P_D''$
    - False-alarm probability $P_F = p P_F' + (1-p) P_F''$
- A randomized decision rule is fully described by $p_{\hat{H}|y}(H_m|y)$ for $m = 0, 1$
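A sketch of time-sharing at the level of operating points: every convex combination of two achievable $(P_F, P_D)$ pairs is achievable (the two operating points below are hypothetical):

```python
import numpy as np

# Hypothetical operating points of two deterministic rules
PF1, PD1 = 0.10, 0.70
PF2, PD2 = 0.20, 0.90

for p in np.linspace(0.0, 1.0, 5):
    PF = p * PF1 + (1 - p) * PF2  # use rule 1 with probability p, rule 2 otherwise
    PD = p * PD1 + (1 - p) * PD2
    print(f"p={p:.2f}: PF={PF:.3f}, PD={PD:.3f}")
```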
3.2 Proposition
- Bayesian case: a randomized rule cannot achieve a lower Bayes’ risk than the optimum LRT.
  Proof: the risk for each $y$ is affine in $p_{\hat{H}|y}(H_0|y)$, so the minimum is achieved at 0 or 1, which degenerates to a deterministic decision:
  $$\varphi(\hat{H}) = \int_{\mathcal{Y}} p_y(y) \sum_{m=0}^{1} p_{\hat{H}|y}(H_m|y) \left[ C_{m0}\, p_{H|y}(H_0|y) + C_{m1}\, p_{H|y}(H_1|y) \right] dy$$
- Neyman-Pearson case:
    - continuous-valued $y$: for a given $P_F$ constraint, a randomized test cannot achieve a larger $P_D$ than the optimum LRT
    - discrete-valued $y$: for a given $P_F$ constraint, a randomized test can achieve a larger $P_D$ than the optimum LRT. Furthermore, the optimum randomized test corresponds to simple time-sharing between the two nearest LRT operating points (see the sketch below)
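A sketch of the discrete-valued case with a hypothetical three-point observation model: deterministic LRTs attain only a few $(P_F, P_D)$ points, and when the $P_F$ budget falls strictly between two of them, time-sharing between those two points strictly improves $P_D$:

```python
import numpy as np

# Hypothetical pmfs on Y = {0, 1, 2}
p0 = np.array([0.8, 0.1, 0.1])   # p(y|H0)
p1 = np.array([0.1, 0.2, 0.7])   # p(y|H1)
L = p1 / p0                      # likelihood ratios: [0.125, 2.0, 7.0]

# Deterministic LRT operating points: decide H1 on {y : L(y) >= eta}
points = [(p0[L >= eta].sum(), p1[L >= eta].sum()) for eta in sorted(L, reverse=True)]
print(points)                    # approx. [(0.1, 0.7), (0.2, 0.9), (1.0, 1.0)]

alpha = 0.15                     # PF budget strictly between two LRT points
# Best deterministic LRT with PF <= alpha: (PF, PD) = (0.1, 0.7).
# Time-sharing between (0.1, 0.7) and (0.2, 0.9) uses the whole budget:
t = (alpha - 0.1) / (0.2 - 0.1)  # weight on the second operating point
PF = (1 - t) * 0.1 + t * 0.2     # = 0.15
PD = (1 - t) * 0.7 + t * 0.9     # = 0.80 > 0.70
print(PF, PD)
```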
3.3 Efficient frontier
Boundary of the region of achievable $(P_F, P_D)$ operating points
- continuous-valued $y$: the ROC of the LRT
- discrete-valued $y$: the LRT points and the straight line segments connecting them

Facts
- $P_D \ge P_F$
- the efficient frontier is a concave function
- $\frac{dP_D}{dP_F} = \eta$ (verified numerically below)
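A quick numerical check of the slope fact in the hypothetical Gaussian shift model: the finite-difference slope of the ROC at threshold $\gamma$ matches $L(\gamma)$, i.e., the $\eta$ that puts the operating point at $\gamma$:

```python
import numpy as np
from scipy.stats import norm

mu, gamma = 1.0, 0.8          # hypothetical model and operating point
d = 1e-6                      # finite-difference step in the threshold

dPD = norm.sf(gamma + d - mu) - norm.sf(gamma - d - mu)
dPF = norm.sf(gamma + d) - norm.sf(gamma - d)
slope = dPD / dPF             # numerical dPD/dPF along the ROC

eta = np.exp(mu * gamma - mu**2 / 2)  # L(gamma): the eta operating at gamma
assert abs(slope - eta) < 1e-4
```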
4. Minimax hypothesis testing
Prior: unknown; cost function: known.
4.1 Decision rule
- minimax approach
$$\hat{H}(\cdot) = \arg\min_{f(\cdot)} \max_{p \in [0,1]} \varphi(f, p)$$
- optimal decision rule
$$\hat{H}(\cdot) = \hat{H}_{p^*}(\cdot), \qquad p^* = \arg\max_{p \in [0,1]} \varphi(\hat{H}_p, p)$$

To prove this optimal decision rule, first introduce the mismatched Bayes decision designed for an assumed prior $q$ (with $p$ playing the role of the true, unknown prior $P_1$):
$$\hat{H}_q(y) = \begin{cases} H_1, & L(y) \ge \frac{1-q}{q} \cdot \frac{C_{10}-C_{00}}{C_{01}-C_{11}} \\ H_0, & \text{otherwise} \end{cases}$$
Its cost is given below, which shows that $\varphi(\hat{H}_q, p)$ is linear in the prior $p$ (see the sketch below):
$$\varphi(\hat{H}_q, p) = (1-p)\left[ C_{00}(1 - P_F(q)) + C_{10} P_F(q) \right] + p\left[ C_{01}(1 - P_D(q)) + C_{11} P_D(q) \right]$$
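A sketch of this linearity in the hypothetical Gaussian model: fix the assumed prior $q$, evaluate $\varphi(\hat{H}_q, p)$ on a grid of $p$, and check that the second differences vanish:

```python
import numpy as np
from scipy.stats import norm

mu = 1.0                                  # hypothetical Gaussian shift model
C00, C01, C10, C11 = 0.0, 1.0, 1.0, 0.0   # probability-of-error costs (assumed)

def operating_point(q):
    """(PF, PD) of the Bayes rule designed for an assumed prior q on H1."""
    eta = (1 - q) / q * (C10 - C00) / (C01 - C11)
    gamma = np.log(eta) / mu + mu / 2     # equivalent threshold on y
    return norm.sf(gamma), norm.sf(gamma - mu)

def risk(q, p):
    """phi(H_q, p): risk of the q-designed rule when the true prior of H1 is p."""
    PF, PD = operating_point(q)
    return (1 - p) * (C00 * (1 - PF) + C10 * PF) + p * (C01 * (1 - PD) + C11 * PD)

ps = np.linspace(0.01, 0.99, 99)
vals = risk(0.3, ps)
assert np.allclose(np.diff(vals, 2), 0.0)  # second differences vanish: linear in p
```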
Lemma (max-min inequality):
$$\max_x \min_y g(x,y) \le \min_y \max_x g(x,y)$$
Theorem:
$$\min_{f(\cdot)} \max_{p \in [0,1]} \varphi(f, p) = \max_{p \in [0,1]} \min_{f(\cdot)} \varphi(f, p)$$
Proof of Lemma: let $h(x) = \min_y g(x,y)$. Then
$$h(x) \le g(x,y)\ \forall x, y \implies \max_x h(x) \le \max_x g(x,y)\ \forall y \implies \max_x h(x) \le \min_y \max_x g(x,y)$$
which is exactly $\max_x \min_y g(x,y) \le \min_y \max_x g(x,y)$.
Proof of Theorem: first take any $p_1, p_2 \in [0,1]$, which gives
$$\varphi(\hat{H}_{p_1}, p_1) = \min_f \varphi(f, p_1) \le \max_p \min_f \varphi(f, p) \le \min_f \max_p \varphi(f, p) \le \max_p \varphi(\hat{H}_{p_2}, p)$$
Since this chain holds for arbitrary $p_1, p_2$, we may take $p_1 = p_2 = p^* = \arg\max_p \varphi(\hat{H}_p, p)$. To prove the theorem it then suffices to show $\varphi(\hat{H}_{p^*}, p^*) = \max_p \varphi(\hat{H}_{p^*}, p)$, since then the two ends of the chain coincide and every inequality becomes an equality.
Since $\varphi(\hat{H}_q, p)$ is linear in $p$, to establish this equality:
- if $p^* \in (0,1)$, we need only $\left.\frac{\partial \varphi(\hat{H}_{p^*}, p)}{\partial p}\right|_{\text{any } p} = 0$, and the equality follows immediately;
- if $p^* = 1$, we need only $\left.\frac{\partial \varphi(\hat{H}_{p^*}, p)}{\partial p}\right|_{\text{any } p} > 0$, so the maximum is attained at $p = 1$; the case $p^* = 0$ is analogous.

By the lemma below, the optimal decision is the Bayes decision at $p^* = \arg\max_p \varphi(\hat{H}_p, p)$, where $p^*$ satisfies
$$0 = \frac{\partial \varphi(\hat{H}_{p^*}, p)}{\partial p} = (C_{01} - C_{00}) - (C_{01} - C_{11}) P_D(p^*) - (C_{10} - C_{00}) P_F(p^*)$$
Lemma:
$$\left.\frac{d \varphi(\hat{H}_p, p)}{dp}\right|_{p=q} = \left.\frac{\partial \varphi(\hat{H}_q, p)}{\partial p}\right|_{p=q} = \left.\frac{\partial \varphi(\hat{H}_q, p)}{\partial p}\right|_{\text{for any } p}$$
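A sketch of solving the stationarity (equalizer) condition in the hypothetical Gaussian model with probability-of-error costs ($C_{00} = C_{11} = 0$, $C_{01} = C_{10} = 1$): the condition reduces to $1 - P_D(p^*) - P_F(p^*) = 0$, i.e., equal miss and false-alarm probabilities, which a root finder locates (by symmetry the answer here is $p^* = 1/2$):

```python
import numpy as np
from scipy.stats import norm
from scipy.optimize import brentq

mu = 1.0                                  # hypothetical Gaussian shift model
C00, C01, C10, C11 = 0.0, 1.0, 1.0, 0.0   # probability-of-error costs (assumed)

def PF_PD(q):
    """(PF, PD) of the Bayes rule designed for an assumed prior q on H1."""
    gamma = np.log((1 - q) / q * (C10 - C00) / (C01 - C11)) / mu + mu / 2
    return norm.sf(gamma), norm.sf(gamma - mu)

def stationarity(q):
    """(C01 - C00) - (C01 - C11)*PD(q) - (C10 - C00)*PF(q); zero at p*."""
    PF, PD = PF_PD(q)
    return (C01 - C00) - (C01 - C11) * PD - (C10 - C00) * PF

p_star = brentq(stationarity, 1e-6, 1 - 1e-6)
print(p_star)                             # 0.5, by symmetry of this model
```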