Q1: Let $X_1,\dots,X_n$ be an iid sample from a Poisson distribution with parameter $\lambda>0$. Find an approximate $100(1-\alpha)\%$ confidence interval for $\lambda$.
Solution: The Poisson distribution has both mean and variance equal to $\lambda$. Let $X_1,X_2,\cdots,X_n$ be the sample. Since the sample size $n$ is large, the central limit theorem gives that

$$\frac{\displaystyle\sum_{i=1}^nX_i-n\lambda}{\sqrt{n\lambda}}=\frac{n\bar{X}-n\lambda}{\sqrt{n\lambda}}$$

approximately follows the $N(0,1)$ distribution, so

$$P\Big\{-z_{1-\alpha/2}<\frac{n\bar{X}-n\lambda}{\sqrt{n\lambda}}<z_{1-\alpha/2}\Big\}\approx1-\alpha.$$

The inequality

$$-z_{1-\alpha/2}<\frac{n\bar{X}-n\lambda}{\sqrt{n\lambda}}<z_{1-\alpha/2}$$

is equivalent to $(n\bar{X}-n\lambda)^2<z_{1-\alpha/2}^2\,n\lambda$, that is,

$$n\lambda^2-(2n\bar{X}+z_{1-\alpha/2}^2)\lambda+n\bar{X}^2<0.$$

Let

$$p_1=\frac{1}{2a}\Big(-b-\sqrt{b^2-4ac}\Big),\qquad p_2=\frac{1}{2a}\Big(-b+\sqrt{b^2-4ac}\Big),$$

where $a=n$, $b=-(2n\bar{X}+z_{1-\alpha/2}^2)$, $c=n\bar{X}^2$.
Putting this together, an approximate confidence interval for $\lambda$ with confidence level $1-\alpha$ is

$$(p_1,p_2).$$
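As a quick numerical sketch of the interval above (not part of the original solution), the function below solves the same quadratic for given $\bar X$, $n$ and $\alpha$. The function name `poisson_score_ci` and the example values $n=50$, $\bar X=3.2$ are hypothetical; $z_{1-\alpha/2}$ is taken from Python's `statistics.NormalDist`.

```python
from math import sqrt
from statistics import NormalDist


def poisson_score_ci(xbar: float, n: int, alpha: float = 0.05) -> tuple[float, float]:
    """Approximate 100(1-alpha)% CI for a Poisson mean lambda (hypothetical helper).

    Solves n*lam^2 - (2*n*xbar + z^2)*lam + n*xbar^2 < 0 for lam,
    where z = z_{1-alpha/2} is the standard normal quantile.
    """
    z = NormalDist().inv_cdf(1 - alpha / 2)
    a = n
    b = -(2 * n * xbar + z**2)
    c = n * xbar**2
    disc = b**2 - 4 * a * c  # equals 4*n*xbar*z^2 + z^4, never negative
    p1 = (-b - sqrt(disc)) / (2 * a)
    p2 = (-b + sqrt(disc)) / (2 * a)
    return p1, p2


# Hypothetical example: sample of size 50 with mean 3.2
lo, hi = poisson_score_ci(3.2, 50, 0.05)
print(lo, hi)
```

At each endpoint the standardized statistic $(n\bar X-n\lambda)/\sqrt{n\lambda}$ equals exactly $\pm z_{1-\alpha/2}$, which is a convenient sanity check on the algebra.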
Q2: Suppose that an event $A$ was observed 36 times out of 120 independent experiments. Use the CLT to find an approximate $95\%$ confidence interval for $P(A)$.
Solution: Each experiment is a Bernoulli trial (so the number of occurrences of $A$ is binomial), with probability mass function

$$f(x;p)=p^x(1-p)^{1-x},\quad x=0,1,$$

where $p=P(A)$. Its mean and variance are

$$\mu=p,\quad \sigma^2=p(1-p).$$

Let $X_1,X_2,\cdots,X_n$ be the sample. By the central limit theorem,

$$\frac{\displaystyle\sum_{i=1}^nX_i-np}{\sqrt{np(1-p)}}=\frac{n\bar{X}-np}{\sqrt{np(1-p)}}$$

approximately follows the $N(0,1)$ distribution, so

$$P\Big\{-z_{1-\alpha/2}<\frac{n\bar{X}-np}{\sqrt{np(1-p)}}<z_{1-\alpha/2}\Big\}\approx1-\alpha.$$

The inequality

$$-z_{1-\alpha/2}<\frac{n\bar{X}-np}{\sqrt{np(1-p)}}<z_{1-\alpha/2}$$

is equivalent to $(n\bar{X}-np)^2<z_{1-\alpha/2}^2\,np(1-p)$, that is,

$$(n+z_{1-\alpha/2}^2)p^2-(2n\bar{X}+z_{1-\alpha/2}^2)p+n\bar{X}^2<0.$$

Let

$$p_1=\frac{1}{2(n+z_{1-\alpha/2}^2)}\Big(2n\bar{X}+z_{1-\alpha/2}^2-\sqrt{4n\bar{X}z_{1-\alpha/2}^2(1-\bar{X})+z_{1-\alpha/2}^4}\Big)=\frac{1}{2a}\Big(-b-\sqrt{b^2-4ac}\Big),$$

$$p_2=\frac{1}{2(n+z_{1-\alpha/2}^2)}\Big(2n\bar{X}+z_{1-\alpha/2}^2+\sqrt{4n\bar{X}z_{1-\alpha/2}^2(1-\bar{X})+z_{1-\alpha/2}^4}\Big)=\frac{1}{2a}\Big(-b+\sqrt{b^2-4ac}\Big),$$

where $a=n+z_{1-\alpha/2}^2$, $b=-(2n\bar{X}+z_{1-\alpha/2}^2)$, $c=n\bar{X}^2$.
From the problem, $n=120$, $\alpha=0.05$ and $\bar{X}=36/120=0.3$, so $z_{1-\alpha/2}=1.96$. This gives $a=123.84$, $b=-75.84$, $c=10.8$, and hence

$$p_1=\frac{1}{2a}\Big(-b-\sqrt{b^2-4ac}\Big)=0.225,\qquad p_2=\frac{1}{2a}\Big(-b+\sqrt{b^2-4ac}\Big)=0.387.$$

Therefore an approximate confidence interval for $P(A)$ with confidence level $0.95$ is

$$(0.225,0.387).$$
Alternatively, dividing the numerator and denominator of $p_1$ by $2n$, the solution of the inequality can also be written as

$$P_L=\frac{1}{1+z_{1-\alpha/2}^2/n}\Big(\bar{X}+\frac{z_{1-\alpha/2}^2}{2n}-\sqrt{\frac{\bar{X}(1-\bar{X})z_{1-\alpha/2}^2}{n}+\frac{z_{1-\alpha/2}^4}{4n^2}}\Big),$$

with the upper limit $P_U$ given by the same expression with $+$ in front of the square root.
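A minimal sketch, assuming the data of Q2 (36 occurrences in $n=120$ trials), that evaluates both the quadratic-formula roots $(p_1,p_2)$ and the normalized form $(P_L,P_U)$ and checks that they agree. The helper name `proportion_score_ci` is illustrative. It uses the exact quantile $z_{0.975}\approx1.95996$ instead of the rounded $1.96$, which does not change the three-decimal result.

```python
from math import sqrt
from statistics import NormalDist


def proportion_score_ci(successes: int, n: int, alpha: float = 0.05):
    """Return the CI for p computed two equivalent ways (hypothetical helper)."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    xbar = successes / n

    # Quadratic form: (n + z^2) p^2 - (2 n xbar + z^2) p + n xbar^2 < 0
    a = n + z**2
    b = -(2 * n * xbar + z**2)
    c = n * xbar**2
    root = sqrt(b**2 - 4 * a * c)
    quad = ((-b - root) / (2 * a), (-b + root) / (2 * a))

    # Normalized form (P_L, P_U): divide numerator and denominator by 2n
    center = xbar + z**2 / (2 * n)
    half = sqrt(xbar * (1 - xbar) * z**2 / n + z**4 / (4 * n**2))
    scale = 1 / (1 + z**2 / n)
    normalized = (scale * (center - half), scale * (center + half))
    return quad, normalized


quad, normalized = proportion_score_ci(36, 120)
print([round(v, 3) for v in quad])  # [0.225, 0.387]
```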
Q3: Let $X_1,\dots,X_n$ be an iid sample from a distribution with CDF $F(x)$.
(a) Show that the empirical CDF $\hat F_n(x)$ is an unbiased estimator of $F(x)$ for any fixed $x\in\mathbb{R}$.
(b) Find the variance of $\hat F_n(x)$.
(c) Now suppose that $F(x)=1-\exp(-\lambda x)$ for $x>0$ and $0$ otherwise. Determine whether the variance of $\hat F_n(x)$ attains the Cramér-Rao lower bound for estimating $F(x)$ at a fixed $x>0$. (In fact, in this case there exists a better unbiased estimator of $F(x)$ than the empirical CDF.)
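For part (c), a numerical sketch comparing $\operatorname{Var}(\hat F_n(x))$ with the Cramér-Rao bound. It assumes two standard facts: $n\hat F_n(x)\sim\mathrm{Binomial}(n,F(x))$, so $\operatorname{Var}(\hat F_n(x))=F(x)(1-F(x))/n$; and the Fisher information of $n$ iid $\mathrm{Exp}(\lambda)$ observations is $n/\lambda^2$, so the bound for $g(\lambda)=1-e^{-\lambda x}$ is $g'(\lambda)^2\lambda^2/n=(\lambda x)^2e^{-2\lambda x}/n$. The ratio of the two simplifies to $(e^{t}-1)/t^2$ with $t=\lambda x$. The function names are hypothetical.

```python
import math


def var_ecdf(lam: float, x: float, n: int) -> float:
    """Var of the empirical CDF at x: F(x)(1 - F(x)) / n, with F(x) = 1 - exp(-lam*x)."""
    F = 1 - math.exp(-lam * x)
    return F * (1 - F) / n


def crlb_exp_cdf(lam: float, x: float, n: int) -> float:
    """Cramer-Rao bound for unbiased estimators of g(lam) = 1 - exp(-lam*x).

    g'(lam) = x*exp(-lam*x); Fisher information of n Exp(lam) draws is n / lam^2.
    """
    g_prime = x * math.exp(-lam * x)
    return g_prime**2 * lam**2 / n


# Compare the two for several values of t = lam * x (here lam = 1, n = 10)
for t in (0.1, 0.5, 1.0, 2.0, 5.0):
    ratio = var_ecdf(1.0, t, 10) / crlb_exp_cdf(1.0, t, 10)
    closed_form = (math.exp(t) - 1) / t**2
    print(f"t={t}: Var/CRLB={ratio:.4f}, (e^t-1)/t^2={closed_form:.4f}")
```

Since $e^t-1-t^2$ equals $0$ at $t=0$ and has derivative $e^t-2t>0$, the ratio exceeds $1$ for every $t>0$, which is consistent with the remark that a better unbiased estimator exists here.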
Q4: True or false, and state why:
- The significance level of a statistical test is equal to the probability that the null hypothesis is true.
- If the significance level of a test is decreased, the power of the test would be expected to increase.
- The probability that the null hypothesis is falsely rejected is equal to the power of the test.
- A type I error occurs when the test statistic falls in the rejection region of the test.
Q5: A coin is thrown independently 10 times to test the hypothesis that the probability of heads is $1/2$ versus the alternative that the probability is not $1/2$. The test rejects if either 0 or 10 heads are observed.
- What is the significance level of the test?
- If in fact the probability of heads is $0.1$, what is the power of the test?
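A short sketch for Q5 that computes the probability of landing in the rejection region $\{0,10\}$ heads from the binomial pmf. Evaluating it at $p=1/2$ gives the significance level, and at $p=0.1$ it gives the power. The helper names are illustrative.

```python
from math import comb

N_TOSSES = 10
REJECT = (0, 10)  # rejection region: 0 or 10 heads


def binom_pmf(k: int, n: int, p: float) -> float:
    """P(exactly k heads in n independent tosses with head probability p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def prob_reject(p: float) -> float:
    """Probability that the number of heads falls in the rejection region."""
    return sum(binom_pmf(k, N_TOSSES, p) for k in REJECT)


alpha = prob_reject(0.5)  # significance level: 2 / 2**10
power = prob_reject(0.1)  # power at p = 0.1: 0.9**10 + 0.1**10
print(f"{alpha:.5f} {power:.4f}")  # prints: 0.00195 0.3487
```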
Q6: Suppose that $X_1,X_2,X_3$ is a sample from a Bernoulli $B(1,p)$ population. For testing the hypothesis $H_0:p=1/2$ vs. $H_1:p=3/4$, we use the rejection region

$$W=\{(x_1,x_2,x_3):x_1+x_2+x_3\ge 2\}.$$
- What are the probabilities of the two types of errors for $W$?
- What is the power of the test?
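For Q6, a sketch that enumerates all $2^3$ outcomes with exact fractions: the type I error is $P_{1/2}\big((X_1,X_2,X_3)\in W\big)$, the type II error is $1-P_{3/4}\big((X_1,X_2,X_3)\in W\big)$, and the power is $P_{3/4}\big((X_1,X_2,X_3)\in W\big)$. The helper `prob_in_W` is hypothetical.

```python
from fractions import Fraction
from itertools import product


def prob_in_W(p: Fraction) -> Fraction:
    """P((X1, X2, X3) in W) when the X_i are independent Bernoulli(p)."""
    total = Fraction(0)
    for xs in product((0, 1), repeat=3):
        if sum(xs) >= 2:  # W = {x1 + x2 + x3 >= 2}
            prob = Fraction(1)
            for x in xs:
                prob *= p if x == 1 else 1 - p
            total += prob
    return total


type_I = prob_in_W(Fraction(1, 2))       # reject H0 although p = 1/2
type_II = 1 - prob_in_W(Fraction(3, 4))  # accept H0 although p = 3/4
power = prob_in_W(Fraction(3, 4))        # reject H0 when p = 3/4
print(type_I, type_II, power)  # 1/2 5/32 27/32
```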