漫谈:Chebyshev多项式,Krylov子空间,Chebyshev迭代,共轭梯度方法
Chebyshev迭代和共轭梯度方法的收敛速度(后者或称误差分析)都与Chebyshev多项式有着紧密联系,因此做一些整理,以期把其中的逻辑理清,推导理顺。只覆盖关键要点,不求面面俱到。
Chebyshev 多项式
标准Chebyshev多项式
与权函数
ρ
(
y
)
=
1
1
−
y
2
\rho(y)=\frac{1}{\sqrt{1-y^2}}
ρ(y)=1−y2
1对应的正交多项式为Chebyshev多项式。标准的Chebyshev多项式分两段定义:
T
n
(
y
)
=
{
c
o
s
(
n
arccos
(
y
)
)
,
y
∈
[
−
1
,
1
]
1
2
[
(
y
+
y
2
−
1
)
n
+
(
y
−
y
2
−
1
)
n
]
,
y
∈
R
∖
[
−
1
,
1
]
T_n(y)=\left\{ \begin{aligned} &\mathrm{cos}(n\space\arccos(y)) &\space, y\in [-1,1]\\ &\frac{1}{2}\left[\left(y+\sqrt{y^2-1}\right)^n+\left(y-\sqrt{y^2-1}\right)^n\right] &\space, y \in \mathbf{R} \setminus [-1,1] \end{aligned} \right.
Tn(y)=⎩⎨⎧cos(n arccos(y))21[(y+y2−1
)n+(y−y2−1
)n] ,y∈[−1,1] ,y∈R∖[−1,1]
概述一些基本性质。 n n n次的Chebyshev多项式在开区间 ( − 1 , 1 ) (-1,1) (−1,1)内有 n n n个零点,在 [ − 1 , 1 ] [-1,1] [−1,1]内有 n + 1 n+1 n+1个符号交错的极值点( − 1 -1 −1和 + 1 +1 +1交替)。Chebyshev多项式不是奇函数就是偶函数,且随 n n n交替变化。
对于任意位置区间
[
a
,
b
]
[a,b]
[a,b]的Chebyshev多项式,都可以通过平移和缩放来化为标准区间
[
−
1
,
1
]
[-1,1]
[−1,1]上的标准Chebyshev多项式。对于
x
∈
[
a
,
b
]
x\in[a,b]
x∈[a,b],做变换
y
=
x
−
1
2
(
b
+
a
)
1
2
(
b
−
a
)
y=\frac{x-\frac{1}{2}(b+a)}{\frac{1}{2}(b-a)}
y=21(b−a)x−21(b+a),即有
y
∈
[
−
1
,
1
]
y\in[-1,1]
y∈[−1,1]:
T
n
~
(
x
)
=
T
n
(
y
=
x
−
1
2
(
b
+
a
)
1
2
(
b
−
a
)
)
\widetilde{T_n}(x)=T_n\left(y=\frac{x-\frac{1}{2}(b+a)}{\frac{1}{2}(b-a)}\right)
Tn
(x)=Tn(y=21(b−a)x−21(b+a))
相应地,权函数也做变换:
ρ
~
(
x
)
=
ρ
(
y
=
x
−
1
2
(
b
+
a
)
1
2
(
b
−
a
)
)
=
1
1
−
(
x
−
1
2
(
b
+
a
)
1
2
(
b
−
a
)
)
2
\widetilde{\rho}(x)=\rho\left(y=\frac{x-\frac{1}{2}(b+a)}{\frac{1}{2}(b-a)}\right)= \frac{1}{\sqrt{1-\left( \frac{x-\frac{1}{2}(b+a)}{\frac{1}{2}(b-a)} \right)^2}}
ρ
(x)=ρ(y=21(b−a)x−21(b+a))=1−(21(b−a)x−21(b+a))2
1
三项递推关系
记
θ
=
a
r
c
c
o
s
y
\theta=\mathrm{arccos}y
θ=arccosy,利用和差化积公式有
cos
(
(
n
+
1
)
θ
)
+
cos
(
(
n
−
1
)
θ
)
=
2
cos
(
n
θ
)
cos
(
θ
)
\cos((n+1)\theta)+\cos((n-1)\theta)=2\cos(n\theta)\cos(\theta)
cos((n+1)θ)+cos((n−1)θ)=2cos(nθ)cos(θ)
故有
T
n
+
1
(
y
)
=
2
y
T
n
(
y
)
−
T
n
−
1
(
y
)
T_{n+1}(y)=2yT_n(y)-T_{n-1}(y)
Tn+1(y)=2yTn(y)−Tn−1(y)
逼近性质
对之前提到的定义在
x
∈
[
a
,
b
]
x\in[a,b]
x∈[a,b]上的Chebyshev多项式,按照区间外一点
c
∉
[
a
,
b
]
c\notin[a,b]
c∈/[a,b]来做归一化:
T
n
^
(
x
)
=
T
n
~
(
x
)
T
n
~
(
c
)
\widehat{T_n}(x)=\frac{\widetilde{T_n}(x)}{\widetilde{T_n}(c)}
Tn
(x)=Tn
(c)Tn
(x)
显然有
T
n
^
(
x
)
∈
Φ
n
,
c
=
{
f
∈
P
∣
f
(
c
)
=
1
}
\widehat{T_n}(x)\in\Phi_{n,c}=\{f\in\mathbf{P}\space|\space f(c)=1 \}
Tn
(x)∈Φn,c={f∈P ∣ f(c)=1}。后者表示所有在
c
c
c这一点值为1的多项式的集合。Chebyshev多项式具有如下的逼近性质。
∣
∣
T
n
^
(
x
)
∣
∣
C
[
a
,
b
]
=
inf
f
∈
Φ
n
,
c
∣
∣
f
∣
∣
C
[
a
,
b
]
||\widehat{T_n}(x)||_{C[a,b]}={\underset {f\in\Phi_{n,c}}{\operatorname {inf} }} ||f||_{C[a,b]}
∣∣Tn
(x)∣∣C[a,b]=f∈Φn,cinf∣∣f∣∣C[a,b]
即所有在
c
c
c这一点值为1的多项式里,Chebyshev多项式是取到最小范数的那一个。由有限维空间内范数的等价性,从无穷范数入手证明。
首先说明 T n ^ ( x ) \widehat{T_n}(x) Tn (x)是well-defined的。因为 T n ( y ) T_n(y) Tn(y)的零点都在开区间 ( − 1 , 1 ) (-1,1) (−1,1)内,因此平移和缩放后 T n ~ ( x ) \widetilde{T_n}(x) Tn (x)的零点都在 [ a , b ] [a,b] [a,b]内, T n ~ ( c ) ≠ 0 \widetilde{T_n}(c)\neq0 Tn (c)=0。
反证法。假设 Y n ∈ Φ n , c Y_n\in\Phi_{n,c} Yn∈Φn,c,且满足 ∣ ∣ Y n ∣ ∣ C [ a , b ] < ∣ ∣ T n ^ ∣ ∣ C [ a , b ] ||Y_n||_{C[a,b]}<||\widehat{T_n}||_{C[a,b]} ∣∣Yn∣∣C[a,b]<∣∣Tn ∣∣C[a,b]。则对于 z n ( x ) = T n ^ ( x ) − Y n ( x ) z_n(x)=\widehat{T_n}(x)-Y_n(x) zn(x)=Tn (x)−Yn(x),它有一个零点是 c c c。
对于标准Chebyshev多项式
T
n
y
T_n{y}
Tny,在
[
−
1
,
1
]
[-1,1]
[−1,1]上的(交错)极值点为
cos
(
k
π
n
)
,
k
=
0
,
1
,
.
.
.
,
n
\cos(\frac{k\pi}{n}), k=0,1,...,n
cos(nkπ),k=0,1,...,n。所以
T
n
~
(
x
)
\widetilde{T_n}(x)
Tn
(x)在
[
a
,
b
]
[a,b]
[a,b]上的(交错)极值点为
x
k
=
b
+
a
2
+
b
−
a
2
cos
(
k
π
n
)
,
k
=
0
,
1
,
.
.
.
,
n
x_k=\frac{b+a}{2}+\frac{b-a}{2}\cos(\frac{k\pi}{n}), k=0,1,...,n
xk=2b+a+2b−acos(nkπ),k=0,1,...,n。因为
∣
∣
Y
n
(
x
)
∣
∣
C
[
a
,
b
]
<
∣
∣
T
n
^
(
x
)
∣
∣
C
[
a
,
b
]
||Y_n(x)||_{C[a,b]}<||\widehat{T_n}(x)||_{C[a,b]}
∣∣Yn(x)∣∣C[a,b]<∣∣Tn
(x)∣∣C[a,b],那么自然有
∣
Y
n
(
x
k
)
∣
≤
∣
∣
Y
n
(
x
)
∣
∣
C
[
a
,
b
]
<
∣
∣
T
n
^
(
x
)
∣
∣
C
[
a
,
b
]
=
∣
T
n
^
(
x
k
)
∣
,
k
=
0
,
1
,
.
.
.
,
n
|Y_n(x_k)|\leq||Y_n(x)||_{C[a,b]}<||\widehat{T_n}(x)||_{C[a,b]}=|\widehat{T_n}(x_k)|\space, k=0,1,...,n
∣Yn(xk)∣≤∣∣Yn(x)∣∣C[a,b]<∣∣Tn
(x)∣∣C[a,b]=∣Tn
(xk)∣ ,k=0,1,...,n
所以
T
n
^
(
x
k
)
\widehat{T_n}(x_k)
Tn
(xk)与
T
n
^
(
x
k
)
−
Y
n
(
x
k
)
\widehat{T_n}(x_k)-Y_n(x_k)
Tn
(xk)−Yn(xk)同号!而
{
x
k
}
\{x_k\}
{xk}是
T
n
^
(
x
)
\widehat{T_n}(x)
Tn
(x)的一组符号交错点,那么也是
T
n
^
(
x
)
−
Y
n
(
x
)
\widehat{T_n}(x)-Y_n(x)
Tn
(x)−Yn(x)的一组符号交错点。所以在开区间
(
x
k
,
x
k
+
1
)
(x_k,x_{k+1})
(xk,xk+1)中必至少有
z
n
(
x
)
z_n(x)
zn(x)的一个零点,则
(
a
,
b
)
(a,b)
(a,b)内至少有
z
n
(
x
)
z_n(x)
zn(x)的
n
n
n个不同零点。
加上前述的 c ∉ [ a , b ] c\notin[a,b] c∈/[a,b]也是 z n ( x ) z_n(x) zn(x)的一个零点,故它至少有 n + 1 n+1 n+1个零点,这与它是 n n n次多项式矛盾。证毕。
再来估计
∣
∣
T
n
^
(
x
)
∣
∣
C
[
a
,
b
]
=
inf
f
∈
Φ
n
,
c
∣
∣
f
∣
∣
C
[
a
,
b
]
||\widehat{T_n}(x)||_{C[a,b]}={\underset {f\in\Phi_{n,c}}{\operatorname {inf} }} ||f||_{C[a,b]}
∣∣Tn
(x)∣∣C[a,b]=f∈Φn,cinf∣∣f∣∣C[a,b]的大小。
∣
∣
T
n
^
(
x
)
∣
∣
C
[
a
,
b
]
=
∣
∣
T
n
~
(
x
)
T
n
~
(
c
)
∣
∣
C
[
a
,
b
]
=
∣
∣
T
n
~
(
x
)
∣
∣
C
[
a
,
b
]
∣
T
n
~
(
c
)
∣
=
1
∣
T
n
~
(
c
)
∣
||\widehat{T_n}(x)||_{C[a,b]}=\left|\left|\frac{\widetilde{T_n}(x)}{\widetilde{T_n}(c)}\right|\right|_{C[a,b]}= \frac{||\widetilde{T_n}(x)||_{C[a,b]}}{|\widetilde{T_n}(c)|}= \frac{1}{|\widetilde{T_n}(c)|}
∣∣Tn
(x)∣∣C[a,b]=∣∣∣∣∣∣∣∣∣∣Tn
(c)Tn
(x)∣∣∣∣∣∣∣∣∣∣C[a,b]=∣Tn
(c)∣∣∣Tn
(x)∣∣C[a,b]=∣Tn
(c)∣1
而
T
n
~
(
c
)
=
T
n
(
c
−
b
+
a
2
b
−
a
2
)
=
T
n
(
−
(
b
−
c
)
+
(
a
−
c
)
2
(
b
−
c
)
−
(
a
−
c
)
2
)
=
(
−
1
)
n
T
n
(
(
b
−
c
)
+
(
a
−
c
)
(
b
−
c
)
−
(
a
−
c
)
)
\widetilde{T_n}(c)= T_n\left(\frac{c-\frac{b+a}{2}}{\frac{b-a}{2}} \right)= T_n\left(\frac{-\frac{(b-c)+(a-c)}{2}}{\frac{(b-c)-(a-c)}{2}} \right)= (-1)^nT_n\left( \frac{(b-c)+(a-c)}{(b-c)-(a-c)} \right)
Tn
(c)=Tn(2b−ac−2b+a)=Tn(2(b−c)−(a−c)−2(b−c)+(a−c))=(−1)nTn((b−c)−(a−c)(b−c)+(a−c))
所以
∣
T
n
~
(
c
)
∣
=
T
n
(
∣
(
b
−
c
)
+
(
a
−
c
)
(
b
−
c
)
−
(
a
−
c
)
∣
)
=
T
n
(
λ
+
1
λ
−
1
)
|\widetilde{T_n}(c)|= T_n\left(\left| \frac{(b-c)+(a-c)}{(b-c)-(a-c)} \right|\right)= T_n\left(\frac{\lambda+1}{\lambda-1} \right)
∣Tn
(c)∣=Tn(∣∣∣∣(b−c)−(a−c)(b−c)+(a−c)∣∣∣∣)=Tn(λ−1λ+1)
其中
λ
=
max
(
∣
a
−
c
∣
∣
b
−
c
∣
,
∣
b
−
c
∣
∣
a
−
c
∣
)
>
1
\lambda=\max(\frac{|a-c|}{|b-c|},\frac{|b-c|}{|a-c|})>1
λ=max(∣b−c∣∣a−c∣,∣a−c∣∣b−c∣)>1。所以根据标准Chebyshev多项式在
[
−
1
,
1
]
[-1,1]
[−1,1]区间外的表达式,有
T
n
(
λ
+
1
λ
−
1
)
≥
1
2
[
λ
+
1
λ
−
1
+
(
λ
+
1
λ
−
1
)
2
−
1
]
n
=
1
2
(
λ
+
1
λ
−
1
)
n
T_n\left(\frac{\lambda+1}{\lambda-1} \right)\geq \frac{1}{2}\left[ \frac{\lambda+1}{\lambda-1}+\sqrt{ \left(\frac{\lambda+1}{\lambda-1}\right)^2-1 } \right]^n= \frac{1}{2}\left(\frac{\sqrt{\lambda}+1}{\sqrt{\lambda}-1}\right)^n
Tn(λ−1λ+1)≥21⎣⎡λ−1λ+1+(λ−1λ+1)2−1
⎦⎤n=21(λ
−1λ
+1)n
所以最后得到的估计是
∣
∣
T
n
^
(
x
)
∣
∣
C
[
a
,
b
]
=
1
∣
T
n
~
(
c
)
∣
=
1
T
n
(
λ
+
1
λ
−
1
)
≤
2
(
λ
−
1
λ
+
1
)
n
||\widehat{T_n}(x)||_{C[a,b]}= \frac{1}{|\widetilde{T_n}(c)|}= \frac{1}{T_n\left(\frac{\lambda+1}{\lambda-1} \right)}\leq 2\left(\frac{\sqrt{\lambda}-1}{\sqrt{\lambda}+1}\right)^n
∣∣Tn
(x)∣∣C[a,b]=∣Tn
(c)∣1=Tn(λ−1λ+1)1≤2(λ
+1λ
−1)n
并且这个估计是较紧凑的。将会在Chebyshev迭代和共轭梯度的收敛速度分析中用到这个结论。
Chebyshev迭代
Chebyshev迭代是线性非定常迭代,其迭代使用的矩阵/系数在每个迭代步都会发生变化。而任意的单步线性定常迭代都可以表述为最简单的Richardson迭代+预处理的组合。因此从Richardson迭代出发,引入Krylov子空间的概念,然后加以适当改造得到Chebyshev迭代。
Richardson迭代和Krylov子空间
Richardson迭代是最简单的迭代格式,它在每一步都做一个修正量,此修正量就简单地取为当前迭代结果的残差:
x
(
i
+
1
)
=
x
(
i
)
+
r
(
i
)
=
x
(
i
)
+
b
−
A
x
(
i
)
=
(
I
−
A
)
x
(
i
)
+
b
\bm{x}^{(i+1)}=\bm{x}^{(i)}+\bm{r}^{(i)}=\bm{x}^{(i)}+\bm{b}-\bm{A}\bm{x}^{(i)}=(\bm{I}-\bm{A})\bm{x}^{(i)}+\bm{b}
x(i+1)=x(i)+r(i)=x(i)+b−Ax(i)=(I−A)x(i)+b
r
(
i
+
1
)
=
b
−
A
x
(
i
+
1
)
=
b
−
A
x
(
i
)
−
A
r
(
i
)
=
(
I
−
A
)
r
(
i
)
=
.
.
.
=
(
I
−
A
)
i
+
1
r
(
0
)
\bm{r}^{(i+1)}=\bm{b}-\bm{A}\bm{x}^{(i+1)}=\bm{b}-\bm{A}\bm{x}^{(i)}-\bm{A}\bm{r}^{(i)} =(\bm{I}-\bm{A})\bm{r}^{(i)}=...=(\bm{I}-\bm{A})^{i+1}\bm{r}^{(0)}
r(i+1)=b−Ax(i+1)=b−Ax(i)−Ar(i)=(I−A)r(i)=...=(I−A)i+1r(0)
由残差与误差的关系有
e
(
i
)
=
A
−
1
r
(
i
)
\bm{e}^{(i)}=\bm{A}^{-1}\bm{r}^{(i)}
e(i)=A−1r(i),可知相应有
e
(
i
+
1
)
=
(
I
−
A
)
e
(
i
)
\bm{e}^{(i+1)}=(\bm{I}-\bm{A})\bm{e}^{(i)}
e(i+1)=(I−A)e(i)。于是在第
m
m
m步得到的迭代结果为
x
(
m
)
=
x
(
0
)
+
∑
i
=
0
m
−
1
r
(
i
)
=
x
(
0
)
+
∑
i
=
0
m
−
1
(
I
−
A
)
i
r
(
0
)
∈
x
(
0
)
+
K
m
(
r
(
0
)
)
\bm{x}^{(m)}=\bm{x}^{(0)}+\sum_{i=0}^{m-1}\bm{r}^{(i)}=\bm{x}^{(0)}+\sum_{i=0}^{m-1}(\bm{I}-\bm{A})^{i}\bm{r}^{(0)}\in\bm{x}^{(0)}+\bm{K_m}(\bm{r}^{(0)})
x(m)=x(0)+i=0∑m−1r(i)=x(0)+i=0∑m−1(I−A)ir(0)∈x(0)+Km(r(0))
其中
K
m
(
r
(
0
)
)
\bm{K_m}(\bm{r}^{(0)})
Km(r(0))是由
r
(
0
)
\bm{r}^{(0)}
r(0)生成的Krylov子空间,其中的元素是用
A
\bm{A}
A反复作用于
r
(
0
)
\bm{r}^{(0)}
r(0)得到的
r
(
0
)
,
A
r
(
0
)
,
A
2
r
(
0
)
,
.
.
.
\bm{r}^{(0)},\bm{Ar}^{(0)},\bm{A^2r}^{(0)},...
r(0),Ar(0),A2r(0),...等的线性组合。
任何的单步定常迭代格式,比如给定
A
=
M
−
N
\bm{A}=\bm{M}-\bm{N}
A=M−N,其中
M
\bm{M}
M为预处理矩阵,则迭代可以写成
x
(
i
+
1
)
=
M
−
1
N
x
(
i
)
+
M
−
1
b
=
M
−
1
(
M
−
A
)
x
(
i
)
+
M
−
1
b
\bm{x}^{(i+1)}=\bm{M}^{-1}\bm{N}\bm{x}^{(i)}+\bm{M}^{-1}\bm{b}=\bm{M}^{-1}(\bm{M}-\bm{A})\bm{x}^{(i)}+\bm{M}^{-1}\bm{b}
x(i+1)=M−1Nx(i)+M−1b=M−1(M−A)x(i)+M−1b
而对
A
x
=
b
\bm{Ax}=\bm{b}
Ax=b做预处理后得到的
M
−
1
A
x
=
M
−
1
b
\bm{M^{-1}Ax}=\bm{M^{-1}b}
M−1Ax=M−1b,再进行Richardson迭代,可以得到
x
(
i
+
1
)
=
x
(
i
)
+
r
(
i
)
=
x
(
i
)
+
M
−
1
b
−
M
−
1
A
x
(
i
)
\bm{x}^{(i+1)}=\bm{x}^{(i)}+\bm{r}^{(i)}=\bm{x}^{(i)}+\bm{M^{-1}b}-\bm{M^{-1}}\bm{A}\bm{x}^{(i)}
x(i+1)=x(i)+r(i)=x(i)+M−1b−M−1Ax(i)
可见是等价的,所以任何的单步定常迭代形式都可以写成预处理后的Richardson迭代(即当前迭代值+修正量)的形式。
单步迭代格式
原始的Richardson迭代是最简单地取了
M
=
I
\bm{M}=\bm{I}
M=I(最好计算,但最不好近似)来做预处理。现在采用非定常,即考虑每步迭代不那么naive,而是选用一个与当前步
i
i
i有关地矩阵
M
i
\bm{M_i}
Mi做预处理,假设
M
i
=
τ
i
M
\bm{M_i}=\tau_i\bm{M}
Mi=τiM由同一个矩阵
M
\bm{M}
M生成。同样类似Richardson迭代的好计算的想法,取
M
=
I
\bm{M}=\bm{I}
M=I,故有
M
i
=
τ
i
I
\bm{M_i}=\tau_i\bm{I}
Mi=τiI。所以残量的递推关系成为
r
(
i
+
1
)
=
(
I
−
τ
i
A
)
r
(
i
)
=
(
I
−
τ
i
A
)
(
I
−
τ
i
−
1
A
)
.
.
.
(
I
−
τ
0
A
)
r
(
0
)
\bm{r}^{(i+1)}=(\bm{I}-\tau_i\bm{A})\bm{r}^{(i)}=(\bm{I}-\tau_i\bm{A})(\bm{I}-\tau_{i-1}\bm{A})...(\bm{I}-\tau_0\bm{A})\bm{r}^{(0)}
r(i+1)=(I−τiA)r(i)=(I−τiA)(I−τi−1A)...(I−τ0A)r(0)
对于误差
e
(
i
)
\bm{e}^{(i)}
e(i)也同样有此关系,所以有
∣
∣
e
(
i
)
∣
∣
∣
∣
e
(
0
)
∣
∣
≤
∣
∣
(
I
−
τ
i
−
1
A
)
.
.
.
(
I
−
τ
0
A
)
∣
∣
\frac{||\bm{e}^{(i)}||}{||\bm{e}^{(0)}||}\leq|| (\bm{I}-\tau_{i-1}\bm{A})...(\bm{I}-\tau_0\bm{A}) ||
∣∣e(0)∣∣∣∣e(i)∣∣≤∣∣(I−τi−1A)...(I−τ0A)∣∣
我们希望构造出来的结果(即
τ
0
,
τ
1
,
.
.
.
\tau_0,\tau_1,...
τ0,τ1,...等的取值)能使残量(误差)收缩得最小,即求解如下的优化问题:
inf
τ
0
,
.
.
.
,
τ
i
−
1
∣
∣
(
I
−
τ
i
−
1
A
)
.
.
.
(
I
−
τ
0
A
)
∣
∣
{\underset {\tau_0,...,\tau_{i-1}} {\operatorname {inf} }} || (\bm{I}-\tau_{i-1}\bm{A})...(\bm{I}-\tau_0\bm{A}) ||
τ0,...,τi−1inf∣∣(I−τi−1A)...(I−τ0A)∣∣
假设
A
\bm{A}
A是对称阵(这些强加的适用条件后面汇总来看),那么存在正交阵
P
\bm{P}
P(如果
A
\bm{A}
A是Hermite矩阵,则
P
\bm{P}
P为酉矩阵)使得
A
\bm{A}
A能相似对角化,且取2-范数时有
∣
∣
P
∣
∣
2
=
1
||\bm{P}||_2=1
∣∣P∣∣2=1。
A
=
P
d
i
a
g
(
λ
1
,
λ
2
,
.
.
.
,
λ
n
)
P
T
=
P
Λ
P
T
\bm{A}=\bm{P} \mathrm{diag}(\lambda_1,\lambda_2,...,\lambda_n)\bm{P}^T=\bm{P}\bm{\Lambda} \bm{P}^T
A=Pdiag(λ1,λ2,...,λn)PT=PΛPT
那么有
∣
∣
(
I
−
τ
i
−
1
A
)
.
.
.
(
I
−
τ
0
A
)
∣
∣
2
=
∣
∣
P
(
I
−
τ
i
−
1
Λ
)
.
.
.
(
I
−
τ
0
Λ
)
P
T
∣
∣
2
=
∣
∣
(
I
−
τ
i
−
1
Λ
)
.
.
.
(
I
−
τ
0
Λ
)
∣
∣
2
=
max
λ
∈
σ
(
A
)
∣
(
1
−
τ
i
−
1
λ
)
.
.
.
(
1
−
τ
0
λ
)
∣
\begin{aligned} || (\bm{I}-\tau_{i-1}\bm{A})...(\bm{I}-\tau_0\bm{A}) ||_2=& ||\bm{P}(\bm{I}-\tau_{i-1}\bm{\Lambda})...(\bm{I}-\tau_0\bm{\Lambda})\bm{P}^T||_2\\ =&||(\bm{I}-\tau_{i-1}\bm{\Lambda})...(\bm{I}-\tau_0\bm{\Lambda})||_2\\ =&{\underset {\lambda\in\sigma(\bm{A})} {\operatorname {max} }} | (1-\tau_{i-1}\lambda)...(1-\tau_0\lambda)| \\ \end{aligned}
∣∣(I−τi−1A)...(I−τ0A)∣∣2===∣∣P(I−τi−1Λ)...(I−τ0Λ)PT∣∣2∣∣(I−τi−1Λ)...(I−τ0Λ)∣∣2λ∈σ(A)max∣(1−τi−1λ)...(1−τ0λ)∣
而
A
\bm{A}
A的谱是可以估计的,假设已算出
σ
(
A
)
∈
[
λ
‾
,
λ
ˉ
]
\sigma(\bm{A})\in[\underline{\lambda},\bar{\lambda}]
σ(A)∈[λ,λˉ]。那么上式可以写成:
max
λ
∈
σ
(
A
)
∣
(
1
−
τ
i
−
1
λ
)
.
.
.
(
1
−
τ
0
λ
)
∣
≤
∣
∣
(
1
−
τ
i
−
1
λ
)
.
.
.
(
1
−
τ
0
λ
)
∣
∣
∞
,
[
λ
‾
,
λ
ˉ
]
=
∣
∣
f
(
λ
)
∣
∣
∞
,
[
λ
‾
,
λ
ˉ
]
\begin{aligned} {\underset {\lambda\in\sigma(\bm{A})} {\operatorname {max} }} | (1-\tau_{i-1}\lambda)...(1-\tau_0\lambda)|&\leq& ||(1-\tau_{i-1}\lambda)...(1-\tau_0\lambda)||_{\infty,[\underline{\lambda},\bar{\lambda}]}\\ &=&||f(\lambda)||_{\infty,[\underline{\lambda},\bar{\lambda}]}\\ \end{aligned}
λ∈σ(A)max∣(1−τi−1λ)...(1−τ0λ)∣≤=∣∣(1−τi−1λ)...(1−τ0λ)∣∣∞,[λ,λˉ]∣∣f(λ)∣∣∞,[λ,λˉ]
左边的原式是一个绝对值,现在右边看成是一个关于
λ
\lambda
λ的多项式函数
f
(
λ
)
=
(
1
−
τ
i
−
1
λ
)
.
.
.
(
1
−
τ
0
λ
)
f(\lambda)=(1-\tau_{i-1}\lambda)...(1-\tau_0\lambda)
f(λ)=(1−τi−1λ)...(1−τ0λ)的无穷范数。该多项式函数有
i
i
i个实单根:
1
τ
0
,
.
.
.
,
1
τ
i
−
1
\frac{1}{\tau_0},...,\frac{1}{\tau_{i-1}}
τ01,...,τi−11。
而由之前的结论,定义在 [ λ ‾ , λ ˉ ] [\underline{\lambda},\bar{\lambda}] [λ,λˉ]上的Chebyshev多项式 T i ^ ( λ ) \widehat{T_i}(\lambda) Ti (λ)在开区间 ( λ ‾ , λ ˉ ) (\underline{\lambda},\bar{\lambda}) (λ,λˉ)内有 i i i个实单根,且 T i ^ ( λ ) = inf P i ∈ P i , P i ( 0 ) = 1 ∣ ∣ P i ∣ ∣ ∞ , [ λ ‾ , λ ˉ ] \widehat{T_i}(\lambda)={\underset {P_i\in\mathbf{P_i},P_i(0)=1}{\operatorname {inf} }} ||P_i||_{\infty,[\underline{\lambda},\bar{\lambda}]} Ti (λ)=Pi∈Pi,Pi(0)=1inf∣∣Pi∣∣∞,[λ,λˉ].
注意当
A
\bm{A}
A为对称正定阵时,
0
∉
[
λ
‾
,
λ
ˉ
]
0\notin[\underline{\lambda},\bar{\lambda}]
0∈/[λ,λˉ],所以
inf
P
i
∈
P
i
,
P
i
(
0
)
=
1
∣
∣
P
i
∣
∣
∞
,
[
λ
‾
,
λ
ˉ
]
{\underset {P_i\in\mathbf{P_i},P_i(0)=1}{\operatorname {inf} }} ||P_i||_{\infty,[\underline{\lambda},\bar{\lambda}]}
Pi∈Pi,Pi(0)=1inf∣∣Pi∣∣∞,[λ,λˉ]的下界恰好能被原问题的
f
(
λ
)
f(\lambda)
f(λ)取到。此时
P
i
∗
(
λ
)
=
T
i
^
(
λ
)
=
T
i
~
(
λ
)
T
i
~
(
0
)
=
T
i
(
λ
−
1
2
(
λ
ˉ
+
λ
‾
)
1
2
(
λ
ˉ
−
λ
‾
)
)
T
i
(
0
−
1
2
(
λ
ˉ
+
λ
‾
)
1
2
(
λ
ˉ
−
λ
‾
)
)
=
(
1
−
τ
i
−
1
(
i
)
λ
)
.
.
.
(
1
−
τ
0
(
i
)
λ
)
P_i^*(\lambda)=\widehat{T_i}(\lambda)=\frac{\widetilde{T_i}(\lambda)}{\widetilde{T_i}(0)} =\frac{T_i\left( \frac{\lambda-\frac{1}{2}(\bar{\lambda}+\underline{\lambda})} {\frac{1}{2} (\bar{\lambda}-\underline{\lambda})} \right)}{T_i\left( \frac{0-\frac{1}{2}(\bar{\lambda}+\underline{\lambda})} {\frac{1}{2} (\bar{\lambda}-\underline{\lambda})} \right)}= (1-\tau_{i-1}^{(i)}\lambda)...(1-\tau_{0}^{(i)}\lambda)
Pi∗(λ)=Ti
(λ)=Ti
(0)Ti
(λ)=Ti(21(λˉ−λ)0−21(λˉ+λ))Ti(21(λˉ−λ)λ−21(λˉ+λ))=(1−τi−1(i)λ)...(1−τ0(i)λ)
其中
τ
j
(
i
)
\tau_j^{(i)}
τj(i)的上标表示与迭代次数
i
i
i有关。所以原问题的
i
i
i个实单根就是定义在
[
λ
‾
,
λ
ˉ
]
[\underline{\lambda},\bar{\lambda}]
[λ,λˉ]上的Chebyshev多项式的根。单步的迭代格式写成
x
(
1
)
=
x
(
0
)
+
τ
0
(
i
)
r
(
0
)
.
.
.
x
(
j
)
=
x
(
j
−
1
)
+
τ
j
−
1
(
i
)
r
(
j
−
1
)
.
.
.
x
(
i
)
=
x
(
i
−
1
)
+
τ
i
−
1
(
i
)
r
(
i
−
1
)
\begin{aligned} \bm{x}^{(1)}&=&\bm{x}^{(0)}+\tau_0^{(i)}\bm{r}^{(0)}\\ &...&\\ \bm{x}^{(j)}&=&\bm{x}^{(j-1)}+\tau_{j-1}^{(i)}\bm{r}^{(j-1)}\\ &...&\\ \bm{x}^{(i)}&=&\bm{x}^{(i-1)}+\tau_{i-1}^{(i)}\bm{r}^{(i-1)}\\ \end{aligned}
x(1)x(j)x(i)=...=...=x(0)+τ0(i)r(0)x(j−1)+τj−1(i)r(j−1)x(i−1)+τi−1(i)r(i−1)
收敛速度
Chebyshev迭代的收敛速度优势,应当通过进行一次 i i i步的Chebyshev迭代,和进行 i i i次的单步定常迭代来比较。
由Chebyshev的逼近性质,可以得到Chebyshev的迭代收敛速度为
∣
∣
e
(
i
)
∣
∣
∣
∣
e
(
0
)
∣
∣
≤
∣
∣
(
1
−
τ
i
−
1
λ
)
.
.
.
(
1
−
τ
0
λ
)
∣
∣
∞
,
[
λ
‾
,
λ
ˉ
]
=
1
∣
T
i
~
(
0
)
∣
=
1
∣
T
i
(
0
−
λ
ˉ
−
λ
‾
2
λ
ˉ
−
λ
‾
2
)
∣
=
1
∣
T
i
(
λ
ˉ
λ
‾
+
1
λ
ˉ
λ
‾
−
1
)
∣
=
1
∣
T
i
(
α
+
1
α
−
1
)
∣
≤
≈
2
(
α
−
1
α
+
1
)
i
\begin{aligned} \frac{||\bm{e}^{(i)}||}{||\bm{e}^{(0)||}}&\leq&||(1-\tau_{i-1}\lambda)...(1-\tau_0\lambda)||_{\infty,[\underline{\lambda},\bar{\lambda}]}\\ &=&\frac{1}{|\widetilde{T_i}(0)|}=\frac{1}{\left|T_i\left(\frac{0-\frac{\bar{\lambda}-\underline{\lambda}}{2}}{\frac{\bar{\lambda}-\underline{\lambda}}{2}}\right)\right|}\\ &=&\frac{1}{\left| T_i\left( \frac{\frac{\bar{\lambda}}{\underline{\lambda}}+1}{\frac{\bar{\lambda}}{\underline{\lambda}}-1} \right) \right|}=\frac{1}{\left| T_i\left( \frac{\alpha+1}{\alpha-1} \right) \right|}\\ &&\leq\approx2\left( \frac{\sqrt{\alpha}-1}{\sqrt{\alpha}+1} \right)^i \end{aligned}
∣∣e(0)∣∣∣∣e(i)∣∣≤==∣∣(1−τi−1λ)...(1−τ0λ)∣∣∞,[λ,λˉ]∣Ti
(0)∣1=∣∣∣∣Ti(2λˉ−λ0−2λˉ−λ)∣∣∣∣1∣∣∣∣Ti(λλˉ−1λλˉ+1)∣∣∣∣1=∣∣Ti(α−1α+1)∣∣1≤≈2(α
+1α
−1)i
其中
α
=
λ
ˉ
λ
‾
>
1
\alpha=\frac{\bar{\lambda}}{\underline{\lambda}}>1
α=λλˉ>1。
而如果做
i
i
i次单步的定常迭代,每次单步的收敛速度为
P
1
∗
(
λ
)
=
1
∣
T
1
(
α
+
1
α
−
1
)
∣
=
1
∣
α
+
1
α
−
1
∣
=
α
−
1
α
+
1
P_1^*(\lambda)=\frac{1}{\left| T_1\left( \frac{\alpha+1}{\alpha-1} \right) \right|} = \frac{1}{\left| \frac{\alpha+1}{\alpha-1} \right|} = \frac{\alpha-1}{\alpha+1}
P1∗(λ)=∣∣T1(α−1α+1)∣∣1=∣∣α−1α+1∣∣1=α+1α−1
连续做
i
i
i次以后的收敛速度为
∣
∣
e
(
i
)
∣
∣
∣
∣
e
(
0
)
∣
∣
≤
(
α
−
1
α
+
1
)
i
\frac{||\bm{e}^{(i)}||}{||\bm{e}^{(0)||}}\leq\left( \frac{\alpha-1}{\alpha+1}\right)^i
∣∣e(0)∣∣∣∣e(i)∣∣≤(α+1α−1)i。这显然要比Chebyshev迭代的收敛速度慢。
两步迭代格式
单步的Chebyshev迭代格式的缺点是 τ j ( i ) \tau_j^{(i)} τj(i)依赖于迭代次数 i i i的取值。自然地想,能不能不依赖于后者的确定?这就可以用上Chebyshev多项式的三项递推关系来解决!
对标准的Chebyshev多项式,
T
n
+
1
(
y
)
=
2
y
T
n
(
y
)
−
T
n
−
1
(
y
)
,
.
.
.
,
T
1
(
y
)
=
y
,
T
0
(
y
)
=
1
T_{n+1}(y)=2yT_n(y)-T_{n-1}(y),...,T_1(y)=y,T_0(y)=1
Tn+1(y)=2yTn(y)−Tn−1(y),...,T1(y)=y,T0(y)=1。但需要的是定义在
[
λ
‾
,
λ
ˉ
]
[\underline{\lambda}, \bar{\lambda}]
[λ,λˉ]上的Chebyshev多项式,做变换:
T
i
~
(
x
)
=
T
i
(
y
=
x
−
1
2
(
λ
ˉ
+
λ
‾
)
1
2
(
λ
ˉ
−
λ
‾
)
)
=
T
i
(
y
=
a
x
−
y
0
)
\widetilde{T_i}(x)=T_i\left(y=\frac{x-\frac{1}{2}(\bar{\lambda}+\underline{\lambda})}{\frac{1}{2}(\bar{\lambda}-\underline{\lambda})}\right)=T_i(y=ax-y_0)
Ti
(x)=Ti(y=21(λˉ−λ)x−21(λˉ+λ))=Ti(y=ax−y0)
其中记
y
0
=
λ
ˉ
+
λ
‾
λ
ˉ
−
λ
‾
,
a
=
2
λ
ˉ
−
λ
‾
y_0=\frac{\bar{\lambda}+\underline{\lambda}}{\bar{\lambda}-\underline{\lambda}}, a=\frac{2}{\bar{\lambda}-\underline{\lambda}}
y0=λˉ−λλˉ+λ,a=λˉ−λ2。按照
x
=
0
x=0
x=0这一点的
T
i
~
\widetilde{T_i}
Ti
值做归一化:
T
i
^
(
x
)
=
T
i
~
(
x
)
T
i
~
(
0
)
=
T
i
(
a
x
−
y
0
)
T
i
(
−
y
0
)
=
T
i
(
y
0
−
a
x
)
T
i
(
y
0
)
\widehat{T_i}(x)=\frac{\widetilde{T_i}(x)}{\widetilde{T_i}(0)}=\frac{T_i(ax-y_0)}{T_i(-y_0)}=\frac{T_i(y_0-ax)}{T_i(y_0)}
Ti
(x)=Ti
(0)Ti
(x)=Ti(−y0)Ti(ax−y0)=Ti(y0)Ti(y0−ax),因此三项递推关系为
T
0
^
(
x
)
=
T
0
(
y
0
−
a
x
)
T
0
(
y
0
)
=
1
T
1
^
(
x
)
=
T
1
(
y
0
−
a
x
)
T
1
(
y
0
)
=
y
0
−
a
x
y
0
=
T
0
^
(
x
)
−
a
y
0
x
T
0
^
(
x
)
.
.
.
.
.
.
T
i
+
1
^
(
x
)
=
T
i
+
1
(
y
0
−
a
x
)
T
i
+
1
(
y
0
)
=
2
(
y
0
−
a
x
)
T
i
(
y
0
−
a
x
)
−
T
i
−
1
(
y
0
−
a
x
)
T
i
+
1
(
y
0
)
=
2
(
y
0
−
a
x
)
T
i
(
y
0
)
T
i
+
1
(
y
0
)
T
i
(
y
0
−
a
x
)
T
i
(
y
0
)
−
T
i
−
1
(
y
0
)
T
i
+
1
(
y
0
)
T
i
−
1
(
y
0
−
a
x
)
T
i
−
1
(
y
0
)
=
2
(
y
0
−
a
x
)
T
i
(
y
0
)
T
i
+
1
(
y
0
)
T
i
^
(
x
)
−
T
i
−
1
(
y
0
)
T
i
+
1
(
y
0
)
T
i
−
1
^
(
x
)
=
2
y
0
T
i
(
y
0
)
T
i
+
1
(
y
0
)
T
i
^
(
x
)
−
T
i
−
1
(
y
0
)
T
i
+
1
(
y
0
)
T
i
−
1
^
(
x
)
−
2
a
T
i
(
y
0
)
T
i
+
1
(
y
0
)
x
T
i
^
(
x
)
=
(
1
+
T
i
−
1
(
y
0
)
T
i
+
1
(
y
0
)
)
T
i
^
(
x
)
−
T
i
−
1
(
y
0
)
T
i
+
1
(
y
0
)
T
i
−
1
^
(
x
)
−
2
a
T
i
(
y
0
)
T
i
+
1
(
y
0
)
x
T
i
^
(
x
)
=
α
i
T
i
^
(
x
)
+
(
1
−
α
i
)
T
i
−
1
^
(
x
)
−
β
i
x
T
i
^
(
x
)
\widehat{T_0}(x)=\frac{T_0(y_0-ax)}{T_0(y_0)}=1\\ \widehat{T_1}(x)=\frac{T_1(y_0-ax)}{T_1(y_0)}=\frac{y_0-ax}{y_0}=\widehat{T_0}(x)-\frac{a}{y_0}x\widehat{T_0}(x)\\ ... ...\\ \begin{aligned} \widehat{T_{i+1}}(x)&=\frac{T_{i+1}(y_0-ax)}{T_{i+1}(y_0)}=\frac{2(y_0-ax)T_i(y_0-ax)-T_{i-1}(y_0-ax)}{T_{i+1}(y_0)}\\ &=\frac{2(y_0-ax)T_i(y_0)}{T_{i+1}(y_0)} \frac{T_i(y_0-ax)}{T_i(y_0)}-\frac{T_{i-1}(y_0)}{T_{i+1}(y_0)} \frac{T_{i-1}(y_0-ax)}{T_{i-1}(y_0)}\\ &=\frac{2(y_0-ax)T_i(y_0)}{T_{i+1}(y_0)} \widehat{T_i}(x) -\frac{T_{i-1}(y_0)}{T_{i+1}(y_0)} \widehat{T_{i-1}}(x)\\ &=\frac{2y_0T_i(y_0)}{T_{i+1}(y_0)} \widehat{T_i}(x) -\frac{T_{i-1}(y_0)}{T_{i+1}(y_0)} \widehat{T_{i-1}}(x)-2a\frac{T_i(y_0)}{T_{i+1}(y_0)}x\widehat{T_i}(x)\\ &=\left( 1+\frac{T_{i-1}(y_0)}{T_{i+1}(y_0)} \right) \widehat{T_i}(x) - \frac{T_{i-1}(y_0)}{T_{i+1}(y_0)}\widehat{T_{i-1}}(x) -2a\frac{T_i(y_0)}{T_{i+1}(y_0)}x\widehat{T_i}(x)\\ &=\alpha_i\widehat{T_i}(x) + (1-\alpha_i)\widehat{T_{i-1}}(x) - \beta_ix\widehat{T_i}(x) \end{aligned}
T0
(x)=T0(y0)T0(y0−ax)=1T1
(x)=T1(y0)T1(y0−ax)=y0y0−ax=T0
(x)−y0axT0
(x)......Ti+1
(x)=Ti+1(y0)Ti+1(y0−ax)=Ti+1(y0)2(y0−ax)Ti(y0−ax)−Ti−1(y0−ax)=Ti+1(y0)2(y0−ax)Ti(y0)Ti(y0)Ti(y0−ax)−Ti+1(y0)Ti−1(y0)Ti−1(y0)Ti−1(y0−ax)=Ti+1(y0)2(y0−ax)Ti(y0)Ti
(x)−Ti+1(y0)Ti−1(y0)Ti−1
(x)=Ti+1(y0)2y0Ti(y0)Ti
(x)−Ti+1(y0)Ti−1(y0)Ti−1
(x)−2aTi+1(y0)Ti(y0)xTi
(x)=(1+Ti+1(y0)Ti−1(y0))Ti
(x)−Ti+1(y0)Ti−1(y0)Ti−1
(x)−2aTi+1(y0)Ti(y0)xTi
(x)=αiTi
(x)+(1−αi)Ti−1
(x)−βixTi
(x)
其中系数
α
i
\alpha_i
αi和
β
i
\beta_i
βi的递归计算方式为
β
0
=
2
a
T
0
(
y
0
)
T
1
(
y
0
)
=
2
a
y
0
=
4
λ
ˉ
+
λ
‾
1
β
i
=
1
2
a
T
i
+
1
(
y
0
)
T
i
(
y
0
)
=
λ
ˉ
−
λ
‾
4
2
y
0
T
i
(
y
0
)
−
T
i
−
1
(
y
0
)
T
i
(
y
0
)
=
λ
ˉ
−
λ
‾
2
y
0
−
λ
ˉ
−
λ
‾
4
T
i
−
1
(
y
0
)
T
i
(
y
0
)
=
λ
ˉ
+
λ
‾
2
−
λ
ˉ
−
λ
‾
4
1
2
a
β
i
−
1
=
λ
ˉ
+
λ
‾
2
−
(
λ
ˉ
−
λ
‾
4
)
2
β
i
−
1
α
i
=
2
y
0
T
i
(
y
0
)
T
i
+
1
(
y
0
)
=
2
y
0
2
a
β
i
=
λ
ˉ
+
λ
‾
2
β
i
\beta_0=2a\frac{T_0(y_0)}{T_1(y_0)}=\frac{2a}{y_0}=\frac{4}{\bar{\lambda}+\underline{\lambda}}\\ \begin{aligned} \frac{1}{\beta_i}&=\frac{1}{2a}\frac{T_{i+1}(y_0)}{T_i(y_0)}=\frac{\bar{\lambda}-\underline{\lambda}}{4} \frac{2y_0T_i(y_0)-T_{i-1}(y_0)}{T_i(y_0)}=\frac{\bar{\lambda}-\underline{\lambda}}{2} y_0 - \frac{\bar{\lambda}-\underline{\lambda}}{4} \frac{T_{i-1}(y_0)}{T_i(y_0)}\\ &=\frac{\bar{\lambda}+\underline{\lambda}}{2} - \frac{\bar{\lambda}-\underline{\lambda}}{4} \frac{1}{2a} \beta_{i-1} = \frac{\bar{\lambda}+\underline{\lambda}}{2} - \left(\frac{\bar{\lambda} - \underline{\lambda}}{4} \right)^2 \beta_{i-1} \end{aligned} \\ \alpha_i=2y_0 \frac{T_i(y_0)}{T_{i+1}(y_0)}=\frac{2y_0}{2a}\beta_i=\frac{\bar{\lambda}+\underline{\lambda}}{2} \beta_i
β0=2aT1(y0)T0(y0)=y02a=λˉ+λ4βi1=2a1Ti(y0)Ti+1(y0)=4λˉ−λTi(y0)2y0Ti(y0)−Ti−1(y0)=2λˉ−λy0−4λˉ−λTi(y0)Ti−1(y0)=2λˉ+λ−4λˉ−λ2a1βi−1=2λˉ+λ−(4λˉ−λ)2βi−1αi=2y0Ti+1(y0)Ti(y0)=2a2y0βi=2λˉ+λβi
由此对应的两步Chebyshev迭代格式(具体的导出步骤,笔者也不确定)为:
x
(
i
+
1
)
=
α
i
x
(
i
)
+
(
1
−
α
i
)
x
(
i
−
1
)
+
β
i
r
(
i
)
\bm{x}^{(i+1)}=\alpha_i\bm{x}^{(i)} + (1-\alpha_i)\bm{x}^{(i-1)} + \beta_i\bm{r}^{(i)}
x(i+1)=αix(i)+(1−αi)x(i−1)+βir(i)
共轭梯度(Conjugate Gradient)方法
将原来的代数方程组求解问题
A
x
=
b
\bm{Ax}=\bm{b}
Ax=b转化为全空间无约束的优化问题,目标函数为
φ
=
1
2
(
A
x
,
x
)
−
(
b
,
x
)
\varphi=\frac{1}{2}(\bm{Ax},\bm{x})-(\bm{b}, \bm{x})
φ=21(Ax,x)−(b,x)
它的驻点
x
∗
\bm{x^*}
x∗满足
δ
φ
(
x
∗
)
=
A
x
∗
−
b
=
0
\delta\varphi(\bm{x^*})=\bm{Ax^*}-\bm{b}=0
δφ(x∗)=Ax∗−b=0,即为原方程组的解。目标函数有下界
φ
(
x
∗
)
=
φ
(
A
−
1
b
)
=
−
1
2
(
b
,
A
−
1
b
)
=
−
1
2
(
A
x
∗
,
x
∗
)
φ
(
x
)
−
φ
(
x
∗
)
=
1
2
(
A
(
x
−
x
∗
)
,
x
−
x
∗
)
=
1
2
∣
∣
x
−
x
∗
∣
∣
A
≥
0
\varphi(\bm{x^*})=\varphi(\bm{A}^{-1}\bm{b})=-\frac{1}{2}(\bm{b},\bm{A^{-1}b})=-\frac{1}{2}(\bm{Ax^*},\bm{x^*})\\ \varphi(\bm{x})-\varphi(\bm{x^*})=\frac{1}{2}\left(\bm{A}(\bm{x}-\bm{x^*}), \bm{x}-\bm{x^*}\right)=\frac{1}{2}||\bm{x}-\bm{x^*}||_{\bm{A}} \geq0
φ(x∗)=φ(A−1b)=−21(b,A−1b)=−21(Ax∗,x∗)φ(x)−φ(x∗)=21(A(x−x∗),x−x∗)=21∣∣x−x∗∣∣A≥0
其中由
A
\bm{A}
A内积诱导的
A
\bm{A}
A范数
∣
∣
x
∣
∣
A
2
=
(
A
x
,
x
)
=
(
x
,
x
)
A
||\bm{x}||_{\bm{A}}^2=(\bm{Ax},\bm{x})=(\bm{x},\bm{x})_{\bm{A}}
∣∣x∣∣A2=(Ax,x)=(x,x)A
CG和最速下降都属于子空间搜索方法。如果给定搜索方向
p
(
k
)
\bm{p}^{(k)}
p(k),对此做一维极小搜索,则可以得到该方向的步长
α
k
\alpha_k
αk:
α
k
=
arg min
α
φ
(
x
(
k
)
+
α
p
(
k
)
)
\alpha_k={\underset {\alpha} {\operatorname {arg\,min} }} \space \varphi({\bm{x}^{(k)}}+\alpha\bm{p}^{(k)})
αk=αargmin φ(x(k)+αp(k))
同之前的迭代法,误差和残量之间关系为 e ( k ) = x ( k ) − x ∗ = − A − 1 r ( k ) \bm{e}^{(k)}=\bm{x}^{(k)}-\bm{x^*}=-\bm{A}^{-1}\bm{r}^{(k)} e(k)=x(k)−x∗=−A−1r(k)。
最速下降法的收敛速度
简单地取
p
(
k
)
=
r
(
k
)
\bm{p}^{(k)}=\bm{r}^{(k)}
p(k)=r(k),即当前的残量方向。
∣
∣
e
(
k
+
1
)
∣
∣
A
2
=
∣
∣
e
(
k
)
+
α
k
r
(
k
)
∣
∣
A
2
=
(
A
e
(
k
)
+
α
k
A
r
(
k
)
,
e
(
k
)
+
α
k
r
(
k
)
)
=
∣
∣
e
(
k
)
∣
∣
A
2
+
∣
∣
α
k
r
(
k
)
∣
∣
A
2
−
2
(
r
(
k
)
,
r
(
k
)
)
=
∣
∣
e
(
k
)
∣
∣
A
2
−
α
k
(
r
(
k
)
,
r
(
k
)
)
=
∣
∣
e
(
k
)
∣
∣
A
2
−
(
r
(
k
)
,
r
(
k
)
)
(
r
(
k
)
,
r
(
k
)
)
(
r
(
k
)
,
r
(
k
)
)
A
=
∣
∣
e
(
k
)
∣
∣
A
2
(
1
−
1
(
r
(
k
)
,
r
(
k
)
)
A
(
r
(
k
)
,
r
(
k
)
)
(
r
(
k
)
,
r
(
k
)
)
A
−
1
(
r
(
k
)
,
r
(
k
)
)
)
≤
∣
∣
e
(
k
)
∣
∣
A
2
(
1
−
1
(
λ
1
+
λ
n
)
2
4
λ
1
λ
n
)
=
∣
∣
e
(
k
)
∣
∣
A
2
(
λ
1
−
λ
n
)
2
(
λ
1
+
λ
n
)
2
\begin{aligned} ||\bm{e}^{(k+1)}||_{\bm{A}}^2&=||\bm{e}^{(k)}+\alpha_k\bm{r}^{(k)}||_{\bm{A}}^2=\left(\bm{Ae^{(k)}}+\alpha_k\bm{A}\bm{r}^{(k)}, \bm{e}^{(k)}+\alpha_k\bm{r}^{(k)}\right)\\ &=||\bm{e}^{(k)}||_{\bm{A}}^2+||\alpha_k\bm{r}^{(k)}||_{\bm{A}}^2 - 2(\bm{r}^{(k)}, \bm{r}^{(k)})\\ &=||\bm{e}^{(k)}||_{\bm{A}}^2-\alpha_k(\bm{r}^{(k)}, \bm{r}^{(k)})=||\bm{e}^{(k)}||_{\bm{A}}^2-\frac{(\bm{r}^{(k)},\bm{r}^{(k)})(\bm{r}^{(k)},\bm{r}^{(k)})}{{(\bm{r}^{(k)},\bm{r}^{(k)})}_{\bm{A}}}\\ &=||\bm{e}^{(k)}||_{\bm{A}}^2\left( 1 - \frac{1}{ \frac{(\bm{r}^{(k)},\bm{r}^{(k)})_{\bm{A}}}{(\bm{r}^{(k)},\bm{r}^{(k)})} \frac{(\bm{r}^{(k)},\bm{r}^{(k)})_{\bm{A}^{-1}}}{(\bm{r}^{(k)},\bm{r}^{(k)})} } \right)\\ &\leq ||\bm{e}^{(k)}||_{\bm{A}}^2\left( 1 - \frac{1}{ \frac{(\lambda_1+\lambda_n)^2}{4\lambda_1\lambda_n} } \right)=||\bm{e}^{(k)}||_{\bm{A}}^2 \frac{(\lambda_1-\lambda_n)^2}{(\lambda_1+\lambda_n)^2} \end{aligned}
∣∣e(k+1)∣∣A2=∣∣e(k)+αkr(k)∣∣A2=(Ae(k)+αkAr(k),e(k)+αkr(k))=∣∣e(k)∣∣A2+∣∣αkr(k)∣∣A2−2(r(k),r(k))=∣∣e(k)∣∣A2−αk(r(k),r(k))=∣∣e(k)∣∣A2−(r(k),r(k))A(r(k),r(k))(r(k),r(k))=∣∣e(k)∣∣A2⎝⎛1−(r(k),r(k))(r(k),r(k))A(r(k),r(k))(r(k),r(k))A−11⎠⎞≤∣∣e(k)∣∣A2(1−4λ1λn(λ1+λn)21)=∣∣e(k)∣∣A2(λ1+λn)2(λ1−λn)2
所以收敛速度为
∣
∣
e
(
k
)
∣
∣
A
∣
∣
e
(
0
)
∣
∣
A
≤
(
λ
1
−
λ
n
λ
1
+
λ
n
)
k
\frac{||\bm{e}^{(k)}||_{\bm{A}}}{||\bm{e}^{(0)}||_{\bm{A}}} \leq \left(\frac{\lambda_1-\lambda_n}{\lambda_1+\lambda_n}\right)^k
∣∣e(0)∣∣A∣∣e(k)∣∣A≤(λ1+λnλ1−λn)k
CG方法的收敛速度
既然最速下降法是naive地选取局部残量方向作为搜索方向,自然的想法是能否同时优化2个目标(方向 p ( k ) \bm{p}^{(k)} p(k)和步长 α k \alpha_k αk)从而得到一些特殊的性质/好处。CG法希望在确定 p ( k ) \bm{p}^{(k)} p(k)方向之后所进行的一维极小搜索得到的新解 x ( k + 1 ) = x ( k ) + α k p ( k ) \bm{x}^{(k+1)}=\bm{x}^{(k)}+\alpha_k\bm{p}^{(k)} x(k+1)=x(k)+αkp(k),同时也是加入向量 p ( k ) \bm{p}^{(k)} p(k)后张成的Krylov子空间内的最小值。为达到这个好的性质所做的一些特别的推导在此不提,后续有时间再补上。我们只要知道第 k k k步得到的解 x ( k ) \bm{x}^{(k)} x(k)就是Krylov子空间 K k ( r ( 0 ) ) \bm{K_k}(\bm{r}^{(0)}) Kk(r(0))内的最小值,并根据此得出它的收敛速度。
取 x ( 0 ) = 0 \bm{x}^{(0)}=\bm{0} x(0)=0,则 r ( 0 ) = b \bm{r}^{(0)}=\bm{b} r(0)=b,令 y = A − 1 r ( 0 ) \bm{y}=\bm{A}^{-1}\bm{r}^{(0)} y=A−1r(0)误差之比
∣ ∣ e ( k ) ∣ ∣ A ∣ ∣ e ( 0 ) ∣ ∣ A = min P k − 1 ∈ P k − 1 ∣ ∣ P k − 1 ( A ) r ( 0 ) − A − 1 r ( 0 ) ∣ ∣ A ∣ ∣ A − 1 r ( 0 ) ∣ ∣ A ≤ min P k − 1 ∈ P k − 1 sup y ≠ 0 ∣ ∣ P k − 1 ( A ) A y − y ∣ ∣ A ∣ ∣ y ∣ ∣ A = min P k ∈ P k , P k ( 0 ) = 1 sup y ≠ 0 ∣ ∣ P k ( A ) y ∣ ∣ A ∣ ∣ y ∣ ∣ A \begin{aligned} \frac{||\bm{e}^{(k)}||_{\bm{A}}}{||\bm{e}^{(0)}||_{\bm{A}}}&= {\underset {P_{k-1}\in\mathbf{P_{k-1}}} {\operatorname {min} }} \frac{||P_{k-1}(\bm{A})\bm{r}^{(0)}-\bm{A}^{-1}\bm{r}^{(0)}||_{\bm{A}}}{||\bm{A}^{-1}\bm{r}^{(0)}||_{\bm{A}}}\\ &\leq {\underset {P_{k-1}\in\mathbf{P_{k-1}}} {\operatorname {min} }} {\underset {\bm{y}\neq\bm{0}} {\operatorname {sup} }} \frac{||P_{k-1}(\bm{A})\bm{Ay}-\bm{y}||_{\bm{A}}}{||\bm{y}||_{\bm{A}}}\\ &={\underset {P_{k}\in\mathbf{P_{k}}, P_k(0)=1} {\operatorname {min} }} {\underset {\bm{y}\neq\bm{0}} {\operatorname {sup} }} \frac{||P_{k}(\bm{A})\bm{y}||_{\bm{A}}}{||\bm{y}||_{\bm{A}}}\\ \end{aligned} ∣∣e(0)∣∣A∣∣e(k)∣∣A=Pk−1∈Pk−1min∣∣A−1r(0)∣∣A∣∣Pk−1(A)r(0)−A−1r(0)∣∣A≤Pk−1∈Pk−1miny=0sup∣∣y∣∣A∣∣Pk−1(A)Ay−y∣∣A=Pk∈Pk,Pk(0)=1miny=0sup∣∣y∣∣A∣∣Pk(A)y∣∣A
设 ( λ i , y i ) (\lambda_i, \bm{y}_i) (λi,yi)为 P k ( A ) P_k(\bm{A}) Pk(A)的特征值-向量对,则 y \bm{y} y在这组基底下表示为 y = ∑ i a i y i \bm{y}=\sum_ia_i\bm{y}_i y=∑iaiyi,上式化为
∣ ∣ e ( k ) ∣ ∣ A ∣ ∣ e ( 0 ) ∣ ∣ A ≤ min P k ∈ P k , P k ( 0 ) = 1 sup y ≠ 0 ∣ ∣ ∑ i P k ( λ i ) a i y i ∣ ∣ A ∣ ∣ ∑ i a i y i ∣ ∣ A = min P k ∈ P k , P k ( 0 ) = 1 max λ ∈ σ ( A ) ∣ P k ( λ ) ∣ ≤ min P k ∈ P k , P k ( 0 ) = 1 ∣ ∣ P k ( λ ) ∣ ∣ L ∞ ( λ n λ 1 , 1 ) \begin{aligned} \frac{||\bm{e}^{(k)}||_{\bm{A}}}{||\bm{e}^{(0)}||_{\bm{A}}} &\leq {\underset {P_{k}\in\mathbf{P_{k}}, P_k(0)=1} {\operatorname {min} }} {\underset {\bm{y}\neq\bm{0}} {\operatorname {sup} }} \frac{||\sum_iP_{k}(\lambda_i)a_i\bm{y}_i||_{\bm{A}}} { ||\sum_ia_i\bm{y}_i||_{\bm{A}} } \\ &= {\underset {P_{k}\in\mathbf{P_{k}}, P_k(0)=1} {\operatorname {min} }} {\underset {\lambda\in\sigma(\bm{A})} {\operatorname {max} }} | P_k(\lambda)| \\ &\leq {\underset {P_{k}\in\mathbf{P_{k}}, P_k(0)=1} {\operatorname {min} }} || P_k(\lambda) ||_{L\,\infty\,\left(\frac{\lambda_n}{\lambda_1},1\right)} \\ \end{aligned} ∣∣e(0)∣∣A∣∣e(k)∣∣A≤Pk∈Pk,Pk(0)=1miny=0sup∣∣∑iaiyi∣∣A∣∣∑iPk(λi)aiyi∣∣A=Pk∈Pk,Pk(0)=1minλ∈σ(A)max∣Pk(λ)∣≤Pk∈Pk,Pk(0)=1min∣∣Pk(λ)∣∣L∞(λ1λn,1)
该值即为定义在
[
λ
n
λ
1
,
1
]
[\frac{\lambda_n}{\lambda_1},1]
[λ1λn,1]上且满足
0
0
0点取值为
1
1
1的所有
k
k
k次多项式中,无穷范数最小的那个。这就是Chebyshev多项式,记
α
=
λ
n
λ
1
\alpha=\frac{\lambda_n}{\lambda_1}
α=λ1λn:
T
k
^
(
x
)
=
T
k
~
(
x
)
T
k
~
(
0
)
=
T
k
(
x
−
1
2
(
1
+
α
)
1
2
(
1
−
α
)
)
T
k
(
−
1
+
α
1
−
α
)
=
P
k
∗
(
x
)
\widehat{T_k}(x)=\frac{\widetilde{T_k}(x)}{\widetilde{T_k}(0)}=\frac{T_k\left(\frac{x-\frac{1}{2}(1+\alpha)}{\frac{1}{2}(1-\alpha)} \right)}{T_k\left(-\frac{1+\alpha}{1-\alpha} \right)}=P_k^*(x)
Tk
(x)=Tk
(0)Tk
(x)=Tk(−1−α1+α)Tk(21(1−α)x−21(1+α))=Pk∗(x)
利用其逼近性质,可知收敛速度为
∣
∣
e
(
k
)
∣
∣
A
∣
∣
e
(
0
)
∣
∣
A
≤
∣
∣
P
k
∗
(
x
)
∣
∣
L
∞
(
α
,
1
)
=
1
∣
T
k
(
1
+
α
1
−
α
)
∣
=
2
(
1
−
α
1
+
α
)
k
\frac{||\bm{e}^{(k)}||_{\bm{A}}}{||\bm{e}^{(0)}||_{\bm{A}}} \leq ||P_k^*(x)||_{L\,\infty\,\left(\alpha,1\right)}=\frac{1}{\left| T_k\left( \frac{1+\alpha}{1-\alpha} \right) \right|}=2\left( \frac{1-\sqrt{\alpha}}{1+\sqrt{\alpha}} \right)^k
∣∣e(0)∣∣A∣∣e(k)∣∣A≤∣∣Pk∗(x)∣∣L∞(α,1)=∣∣Tk(1−α1+α)∣∣1=2(1+α
1−α
)k
由于 0 < α < 1 0<\alpha<1 0<α<1,故 α \sqrt{\alpha} α 比 α \alpha α更靠近1,从而CG法有比最速下降更快的收敛速度。