Conclusion
Suppose $x_1, \cdots, x_n$ are independent and identically distributed samples from $f_{\theta}(x)$, and $\hat{\theta}_{MLE}$ is the maximum likelihood estimate of the parameter $\theta$. Then
$$\hat{\theta}_{MLE}\dot{\sim}N\left(\theta, \frac{1}{nI(\theta)}\right)\tag{1}$$
where $I(\theta)$ is the Fisher information and $\dot{\sim}$ means "is approximately distributed as" for large $n$.
Proof
First consider the single-sample case, where one sample $x$ is drawn from $f_{\theta}(x)$. Its log-likelihood function is
$$l_x(\theta)=\log (f_{\theta}(x))\tag{2}$$
Differentiating with respect to $\theta$ (a dot denotes a derivative with respect to $\theta$) gives
$$\dot{l}_x(\theta)=\frac{\partial }{\partial \theta}\log(f_{\theta}(x))=\frac{\dot{f}_\theta(x)}{f_\theta(x)}\tag{3}$$
$\dot{l}_x(\theta)$ is called the score function. Its expectation is
$$E(\dot{l}_x(\theta))=\int_{\chi}\frac{\dot{f}_\theta(x)}{f_\theta(x)}f_\theta(x)dx=\int_{\chi}\dot{f}_\theta(x)dx=\int_{\chi}\frac{\partial }{\partial \theta}f_\theta(x)dx=\frac{\partial }{\partial \theta}\int_{\chi}f_\theta(x)dx=\frac{\partial }{\partial \theta}1=0\tag{4}$$
Here $\chi$ is the sample space, and the derivative is taken with respect to $\theta$ (not $x$). The step that moves $\frac{\partial}{\partial\theta}$ outside the integral assumes that differentiation and integration can be interchanged.
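As a concrete illustration of $(4)$, consider a Bernoulli model. This model is an assumption chosen for the example and is not part of the original derivation; because $x$ is discrete, the integral over $\chi$ becomes a sum over $x\in\{0,1\}$.

$$
\begin{aligned}
f_\theta(x) &= \theta^{x}(1-\theta)^{1-x}, \qquad x\in\{0,1\},\ 0<\theta<1 \\
l_x(\theta) &= x\log\theta+(1-x)\log(1-\theta) \\
\dot{l}_x(\theta) &= \frac{x}{\theta}-\frac{1-x}{1-\theta}=\frac{x-\theta}{\theta(1-\theta)} \\
E(\dot{l}_x(\theta)) &= \frac{E(x)-\theta}{\theta(1-\theta)}=\frac{\theta-\theta}{\theta(1-\theta)}=0
\end{aligned}
$$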
$I(\theta)$ is the Fisher information, defined as the variance of the score function $\dot{l}_x(\theta)$:
$$I(\theta)=E\left\{\left(\dot{l}_x(\theta)-E(\dot{l}_x(\theta))\right)^2\right\}\tag{5}$$
Since $E(\dot{l}_x(\theta))=0$, it follows that
$$I(\theta)=E\left\{\dot{l}_x(\theta)^2\right\}=E\left\{\left(\frac{\dot{f}_{\theta}(x)}{f_{\theta}(x)}\right)^2\right\}\tag{6}$$
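To see $(6)$ at work, here is a short worked example under an assumed Bernoulli model, $f_\theta(x)=\theta^{x}(1-\theta)^{1-x}$ with $x\in\{0,1\}$. The model is chosen only for illustration. Its score is $\dot{l}_x(\theta)=\frac{x-\theta}{\theta(1-\theta)}$, and $E\{(x-\theta)^2\}$ is the Bernoulli variance $\theta(1-\theta)$, so:

$$
\begin{aligned}
I(\theta) &= E\left\{\dot{l}_x(\theta)^2\right\}
           = \frac{E\{(x-\theta)^2\}}{\theta^2(1-\theta)^2} \\
          &= \frac{\theta(1-\theta)}{\theta^2(1-\theta)^2}
           = \frac{1}{\theta(1-\theta)}
\end{aligned}
$$

The information is smallest at $\theta=1/2$, where a single observation is least informative about $\theta$, and grows without bound as $\theta$ approaches $0$ or $1$.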
We can therefore summarize the mean and variance of $\dot{l}_x(\theta)$ as
$$\dot{l}_x(\theta)\sim(0, I(\theta))\tag{7}$$
where $\sim(a, b)$ denotes a distribution with mean $a$ and variance $b$, without specifying its shape.
Next consider the second derivative of the log-likelihood, $\ddot{l}_x(\theta)$ (that is, the derivative of the score function), obtained by differentiating both sides of $(3)$ with respect to $\theta$:
$$\ddot{l}_x(\theta)=\frac{\partial}{\partial \theta}\left(\frac{\dot{f}_\theta(x)}{f_\theta(x)}\right)=\frac{\ddot{f}_{\theta}(x)}{f_{\theta}(x)}-\left(\frac{\dot{f}_{\theta}(x)}{f_{\theta}(x)}\right)^2\tag{8}$$
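The expectation of the second derivative $\ddot{l}_x(\theta)$ is therefore
$$E\{\ddot{l}_x(\theta)\}=0-E\left\{\left(\frac{\dot{f}_{\theta}(x)}{f_{\theta}(x)}\right)^2\right\}=-I(\theta)\tag{9}$$
The first term vanishes by the same argument as in $(4)$: $E\left\{\frac{\ddot{f}_\theta(x)}{f_\theta(x)}\right\}=\int_{\chi}\ddot{f}_\theta(x)dx=\frac{\partial^2}{\partial\theta^2}\int_{\chi}f_\theta(x)dx=0$.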
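The identities $E(\dot{l}_x(\theta))=0$ and $E\{\dot{l}_x(\theta)^2\}=-E\{\ddot{l}_x(\theta)\}=I(\theta)$ can be checked exactly for a simple discrete model. The sketch below assumes a Bernoulli model, which is not part of the original derivation, and the helper names `score`, `second_derivative`, and `expectation` are made up for this illustration. Because $x$ takes only the values 0 and 1, the expectations are finite sums and no simulation is needed.

```python
import math


def score(x, theta):
    """First derivative of the Bernoulli log-likelihood with respect to theta."""
    return x / theta - (1 - x) / (1 - theta)


def second_derivative(x, theta):
    """Second derivative of the Bernoulli log-likelihood with respect to theta."""
    return -x / theta**2 - (1 - x) / (1 - theta) ** 2


def expectation(g, theta):
    """Exact expectation of g(x, theta) for x ~ Bernoulli(theta)."""
    return (1 - theta) * g(0, theta) + theta * g(1, theta)


theta = 0.3  # illustrative parameter value (assumption)

mean_score = expectation(score, theta)
info_from_score = expectation(lambda x, t: score(x, t) ** 2, theta)
info_from_curvature = -expectation(second_derivative, theta)
closed_form = 1 / (theta * (1 - theta))

print("E[score]          =", mean_score)
print("E[score^2]        =", info_from_score)
print("-E[second deriv.] =", info_from_curvature)
print("1/(theta(1-theta)) =", closed_form)
print("identities agree  =", math.isclose(info_from_score, info_from_curvature))
```

Up to floating-point rounding, the mean score is zero and the two expressions for the Fisher information agree with the closed form $1/(\theta(1-\theta))$.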
Similarly, we can write
$$-\ddot{l}_x(\theta)\sim(I(\theta), J(\theta))\tag{10}$$
where $J(\theta)$ is the variance of $\ddot{l}_x(\theta)$, which we do not need to evaluate here.
Now consider the case of $n$ samples: $x_1, \cdots, x_n$ are independent and identically distributed samples from $f_{\theta}(x)$, so the joint density is $f_{\theta}(X)=\prod\limits_{i=1}^nf_{\theta}(x_i)$. Taking the logarithm turns the product into a sum, so the total score function is
$$\dot{l}_X({\theta})=\sum\limits_{i=1}^n\dot{l}_{x_i}(\theta)\tag{11}$$
By $(7)$, each $\dot{l}_{x_i}(\theta)\sim(0, I(\theta))$. Because the samples are independent, the means and the variances both add, so
$$\dot{l}_X({\theta})\sim(0, nI(\theta))\tag{12}$$
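The claim in $(12)$ can be checked by simulation. The following is a minimal sketch under an assumed Bernoulli model (not part of the original proof), for which $I(\theta)=1/(\theta(1-\theta))$. The function names, the seed, and the values of `theta`, `n`, and `replications` are all illustrative choices.

```python
import random
import statistics


def total_score(sample, theta):
    """Sum of the per-observation Bernoulli scores, as in equation (11)."""
    return sum(x / theta - (1 - x) / (1 - theta) for x in sample)


def simulate_total_scores(theta, n, replications, seed=0):
    """Draw `replications` samples of size n and return their total scores."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    scores = []
    for _ in range(replications):
        sample = [1 if rng.random() < theta else 0 for _ in range(n)]
        scores.append(total_score(sample, theta))
    return scores


theta, n = 0.3, 50
scores = simulate_total_scores(theta, n, replications=20000)

empirical_mean = statistics.fmean(scores)
empirical_var = statistics.pvariance(scores)
theoretical_var = n / (theta * (1 - theta))  # n * I(theta)

print("empirical mean of total score:", empirical_mean)
print("empirical variance:           ", empirical_var)
print("n * I(theta):                 ", theoretical_var)
```

The empirical mean should be close to zero and the empirical variance close to $nI(\theta)$, up to Monte Carlo error.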
Similarly,
$$-\ddot{l}_X({\theta})=\sum\limits_{i=1}^n(-\ddot{l}_{x_i}(\theta))\tag{13}$$
Likewise, by $(10)$, each $-\ddot{l}_{x_i}(\theta)\sim(I(\theta), J(\theta))$, and therefore
$$-\ddot{l}_X({\theta})\sim(nI(\theta), nJ(\theta))\tag{14}$$
By definition, the maximum likelihood estimate $\hat{\theta}_{MLE}$ of the parameter $\theta$ based on the samples $x_1, \cdots, x_n$ satisfies the maximization condition $\dot{l}_X{(\hat{\theta})}=0$. A first-order Taylor expansion of $\dot{l}_X(\hat{\theta})$ around the true value $\theta$ gives
$$0=\dot{l}_X{(\hat{\theta})}\approx\dot{l}_X{(\theta)}+\ddot{l}_X{(\theta)}(\hat{\theta}-\theta)\tag{15}$$
Rearranging gives
$$\hat{\theta}\approx\theta-\frac{\dot{l}_X(\theta)}{\ddot{l}_X(\theta)}=\theta+\frac{\frac{\dot{l}_X(\theta)}{n}}{-\frac{\ddot{l}_X(\theta)}{n}}\tag{16}$$
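A worked instance may make $(16)$ more tangible. Assume, purely for illustration, a Bernoulli model with $f_\theta(x)=\theta^{x}(1-\theta)^{1-x}$, for which $I(\theta)=\frac{1}{\theta(1-\theta)}$ and the maximum likelihood estimate is the sample mean $\bar{x}$. Replacing the denominator $-\ddot{l}_X(\theta)/n$ by its expectation $I(\theta)$ gives:

$$
\begin{aligned}
\frac{\dot{l}_X(\theta)}{n} &= \frac{1}{n}\sum_{i=1}^n\frac{x_i-\theta}{\theta(1-\theta)}
                             = \frac{\bar{x}-\theta}{\theta(1-\theta)} \\
\theta+\frac{\dot{l}_X(\theta)/n}{I(\theta)} &= \theta+\frac{\bar{x}-\theta}{\theta(1-\theta)}\cdot\theta(1-\theta)
                             = \bar{x}
\end{aligned}
$$

In this particular model the approximation recovers the maximum likelihood estimate $\bar{x}$ exactly. In general it holds only approximately for large $n$.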
Equation $(12)$ and the central limit theorem imply that the averaged score is approximately normal:
$$\frac{\dot{l}_X(\theta)}{n}\dot{\sim}N\left(0, \frac{I(\theta)}{n}\right)\tag{17}$$
Equation $(14)$ and the law of large numbers imply that
$$-\frac{\ddot{l}_X(\theta)}{n}\ \text{converges to the constant}\ I(\theta)\tag{18}$$
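The convergence in $(18)$ can be observed numerically. The sketch below again assumes a Bernoulli model (an illustrative choice, not part of the original argument), where $-\ddot{l}_x(\theta)=x/\theta^2+(1-x)/(1-\theta)^2$ and $I(\theta)=1/(\theta(1-\theta))$. The function name `average_curvature` and the sample sizes are hypothetical.

```python
import random


def average_curvature(theta, n, seed=0):
    """Return -l''_X(theta) / n for one simulated Bernoulli sample of size n."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    total = 0.0
    for _ in range(n):
        x = 1 if rng.random() < theta else 0
        # Negative second derivative of the log-likelihood for one observation.
        total += x / theta**2 + (1 - x) / (1 - theta) ** 2
    return total / n


theta = 0.3
fisher_information = 1 / (theta * (1 - theta))

for n in (100, 10_000, 200_000):
    print(n, average_curvature(theta, n), fisher_information)
```

As $n$ grows, the printed average should settle near $I(\theta)$, with fluctuations shrinking at the usual $1/\sqrt{n}$ rate.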
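Combining $(16)$, $(17)$, and $(18)$: the numerator in $(16)$ is approximately normal with mean $0$ and variance $I(\theta)/n$, while the denominator is approximately the constant $I(\theta)$. Dividing by $I(\theta)$ scales the variance by $1/I(\theta)^2$, which gives
$$\hat{\theta}\dot{\sim}N\left(\theta, \frac{1}{nI(\theta)}\right)\tag{19}$$
This is exactly $(1)$, which completes the proof.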
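The result can also be checked empirically. The following is a minimal simulation sketch under an assumed Bernoulli model, where the maximum likelihood estimate is the sample mean and $\frac{1}{nI(\theta)}=\frac{\theta(1-\theta)}{n}$. The model, the function names, the seed, and the parameter values are all illustrative assumptions rather than part of the proof.

```python
import random
import statistics


def bernoulli_mle(sample):
    """MLE of theta for Bernoulli data: the sample mean."""
    return sum(sample) / len(sample)


def simulate_mles(theta, n, replications, seed=0):
    """Return the MLE from each of `replications` simulated samples of size n."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    mles = []
    for _ in range(replications):
        sample = [1 if rng.random() < theta else 0 for _ in range(n)]
        mles.append(bernoulli_mle(sample))
    return mles


theta, n = 0.3, 200
mles = simulate_mles(theta, n, replications=5000)

asymptotic_var = theta * (1 - theta) / n  # 1 / (n * I(theta))
asymptotic_sd = asymptotic_var**0.5
empirical_var = statistics.pvariance(mles)

# Fraction of estimates within 1.96 asymptotic standard deviations of theta;
# for a normal distribution this would be about 0.95.
coverage = sum(abs(m - theta) <= 1.96 * asymptotic_sd for m in mles) / len(mles)

print("empirical variance of MLE:", empirical_var)
print("1 / (n I(theta)):         ", asymptotic_var)
print("fraction within 1.96 sd:  ", coverage)
```

If the normal approximation is reasonable at this sample size, the empirical variance should be close to $\frac{1}{nI(\theta)}$ and the coverage fraction close to 0.95. Small deviations are expected because the estimator is discrete and the simulation is finite.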