Informal Notes: Chebyshev Polynomials, Krylov Subspaces, Chebyshev Iteration, and the Conjugate Gradient Method


The convergence rates of Chebyshev iteration and of the conjugate gradient method (for the latter, its error analysis) are both closely tied to Chebyshev polynomials. These notes organize the material so that the logic and the derivations fit together; they cover only the key points and make no attempt to be exhaustive.

Chebyshev Polynomials

The standard Chebyshev polynomials

The orthogonal polynomials associated with the weight function $\rho(y)=\frac{1}{\sqrt{1-y^2}}$ are the Chebyshev polynomials. The standard Chebyshev polynomial is defined piecewise:
$$
T_n(y)=\begin{cases}
\cos\bigl(n\arccos(y)\bigr), & y\in[-1,1]\\[4pt]
\dfrac{1}{2}\left[\left(y+\sqrt{y^2-1}\right)^n+\left(y-\sqrt{y^2-1}\right)^n\right], & y\in\mathbf{R}\setminus[-1,1]
\end{cases}
$$

Some basic properties: the degree-$n$ Chebyshev polynomial has $n$ zeros in the open interval $(-1,1)$ and $n+1$ extreme points in $[-1,1]$ at which its values alternate between $-1$ and $+1$. Each Chebyshev polynomial is either an odd or an even function, alternating with $n$.

A Chebyshev polynomial on an arbitrary interval $[a,b]$ is obtained from the standard one on $[-1,1]$ by translation and scaling. For $x\in[a,b]$, the change of variable $y=\frac{x-\frac{1}{2}(b+a)}{\frac{1}{2}(b-a)}$ gives $y\in[-1,1]$:
$$
\widetilde{T_n}(x)=T_n\left(\frac{x-\frac{1}{2}(b+a)}{\frac{1}{2}(b-a)}\right)
$$
Correspondingly, the weight function transforms as
$$
\widetilde{\rho}(x)=\rho\left(\frac{x-\frac{1}{2}(b+a)}{\frac{1}{2}(b-a)}\right)=\frac{1}{\sqrt{1-\left(\frac{x-\frac{1}{2}(b+a)}{\frac{1}{2}(b-a)}\right)^2}}
$$

Three-term recurrence

Writing $\theta=\arccos y$ and using the product-to-sum identity,
$$
\cos((n+1)\theta)+\cos((n-1)\theta)=2\cos(n\theta)\cos(\theta)
$$
and hence
$$
T_{n+1}(y)=2yT_n(y)-T_{n-1}(y)
$$
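The recurrence is also the most convenient way to evaluate $T_n$ numerically. The sketch below (my own illustration, not part of the original text) evaluates $T_n$ by the three-term recurrence in Python/NumPy and checks it against the $\cos(n\arccos y)$ form on $[-1,1]$.

```python
import numpy as np

def chebyshev_T(n, y):
    """Evaluate T_n(y) with the recurrence T_{k+1} = 2 y T_k - T_{k-1}."""
    y = np.asarray(y, dtype=float)
    T_prev, T_curr = np.ones_like(y), y.copy()
    if n == 0:
        return T_prev
    for _ in range(n - 1):
        T_prev, T_curr = T_curr, 2.0 * y * T_curr - T_prev
    return T_curr

# Sanity check against the trigonometric form on [-1, 1]
y = np.linspace(-1.0, 1.0, 101)
print(np.allclose(chebyshev_T(5, y), np.cos(5 * np.arccos(y))))  # True
```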

Approximation (minimax) property

For the Chebyshev polynomial defined on $x\in[a,b]$ as above, normalize it by its value at a point $c\notin[a,b]$ outside the interval:
$$
\widehat{T_n}(x)=\frac{\widetilde{T_n}(x)}{\widetilde{T_n}(c)}
$$
Clearly $\widehat{T_n}\in\Phi_{n,c}=\{f\in\mathbf{P}_n \mid f(c)=1\}$, the set of all polynomials of degree at most $n$ that take the value $1$ at $c$. The Chebyshev polynomial has the following approximation property:
$$
||\widehat{T_n}||_{C[a,b]}={\underset{f\in\Phi_{n,c}}{\operatorname{inf}}}\;||f||_{C[a,b]}
$$
That is, among all polynomials that equal $1$ at $c$, the normalized Chebyshev polynomial is the one of smallest norm. By the equivalence of norms on a finite-dimensional space, it suffices to argue with the sup (infinity) norm.

First, $\widehat{T_n}$ is well defined: the zeros of $T_n(y)$ all lie in the open interval $(-1,1)$, so after translation and scaling the zeros of $\widetilde{T_n}(x)$ all lie inside $[a,b]$; since $c\notin[a,b]$, we have $\widetilde{T_n}(c)\neq0$.

Argue by contradiction. Suppose $Y_n\in\Phi_{n,c}$ satisfies $||Y_n||_{C[a,b]}<||\widehat{T_n}||_{C[a,b]}$, and set $z_n(x)=\widehat{T_n}(x)-Y_n(x)$; then $c$ is one zero of $z_n$.

For the standard Chebyshev polynomial $T_n(y)$, the alternating extreme points on $[-1,1]$ are $\cos(\frac{k\pi}{n}),\ k=0,1,\dots,n$, so the alternating extreme points of $\widetilde{T_n}(x)$ on $[a,b]$ are $x_k=\frac{b+a}{2}+\frac{b-a}{2}\cos(\frac{k\pi}{n}),\ k=0,1,\dots,n$. Since $||Y_n||_{C[a,b]}<||\widehat{T_n}||_{C[a,b]}$, we have
$$
|Y_n(x_k)|\leq||Y_n||_{C[a,b]}<||\widehat{T_n}||_{C[a,b]}=|\widehat{T_n}(x_k)|\ ,\quad k=0,1,\dots,n
$$
Hence $\widehat{T_n}(x_k)$ and $\widehat{T_n}(x_k)-Y_n(x_k)$ have the same sign. But $\{x_k\}$ is a set of sign-alternating points of $\widehat{T_n}(x)$, so it is also a set of sign-alternating points of $\widehat{T_n}(x)-Y_n(x)$. Therefore between any two consecutive points $x_{k+1}$ and $x_k$ there is at least one zero of $z_n(x)$, giving at least $n$ distinct zeros of $z_n(x)$ in $(a,b)$.

Adding the zero at $c\notin[a,b]$ noted above, $z_n$ has at least $n+1$ zeros, contradicting the fact that it is a nonzero polynomial of degree at most $n$. This proves the claim.

Next we estimate the size of $||\widehat{T_n}||_{C[a,b]}={\underset{f\in\Phi_{n,c}}{\operatorname{inf}}}\;||f||_{C[a,b]}$.
$$
||\widehat{T_n}||_{C[a,b]}=\left|\left|\frac{\widetilde{T_n}(x)}{\widetilde{T_n}(c)}\right|\right|_{C[a,b]}=\frac{||\widetilde{T_n}||_{C[a,b]}}{|\widetilde{T_n}(c)|}=\frac{1}{|\widetilde{T_n}(c)|}
$$

$$
\widetilde{T_n}(c)=T_n\left(\frac{c-\frac{b+a}{2}}{\frac{b-a}{2}}\right)=T_n\left(\frac{-\frac{(b-c)+(a-c)}{2}}{\frac{(b-c)-(a-c)}{2}}\right)=(-1)^nT_n\left(\frac{(b-c)+(a-c)}{(b-c)-(a-c)}\right)
$$
Therefore
$$
|\widetilde{T_n}(c)|=T_n\left(\left|\frac{(b-c)+(a-c)}{(b-c)-(a-c)}\right|\right)=T_n\left(\frac{\lambda+1}{\lambda-1}\right)
$$
where $\lambda=\max\left(\frac{|a-c|}{|b-c|},\frac{|b-c|}{|a-c|}\right)>1$. Using the expression for the standard Chebyshev polynomial outside $[-1,1]$,
$$
T_n\left(\frac{\lambda+1}{\lambda-1}\right)\geq\frac{1}{2}\left[\frac{\lambda+1}{\lambda-1}+\sqrt{\left(\frac{\lambda+1}{\lambda-1}\right)^2-1}\right]^n=\frac{1}{2}\left(\frac{\sqrt{\lambda}+1}{\sqrt{\lambda}-1}\right)^n
$$
The resulting estimate is therefore
$$
||\widehat{T_n}||_{C[a,b]}=\frac{1}{|\widetilde{T_n}(c)|}=\frac{1}{T_n\left(\frac{\lambda+1}{\lambda-1}\right)}\leq2\left(\frac{\sqrt{\lambda}-1}{\sqrt{\lambda}+1}\right)^n
$$
and this estimate is fairly tight. It will be used below in the convergence analysis of Chebyshev iteration and of the conjugate gradient method.
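As a quick numerical check (my own addition, with arbitrarily chosen values of $n$, $[a,b]$, and $c$), the snippet below evaluates $||\widehat{T_n}||_{C[a,b]}=1/|\widetilde{T_n}(c)|$ directly and compares it with the bound $2\bigl(\frac{\sqrt{\lambda}-1}{\sqrt{\lambda}+1}\bigr)^n$; the two numbers come out very close, illustrating how tight the estimate is.

```python
import numpy as np

def normalized_cheb_norm(n, a, b, c):
    """sup-norm of T_n~(x)/T_n~(c) over [a, b]; since max|T_n| on [-1,1] is 1,
    this equals 1/|T_n~(c)|, with T_n~(c) evaluated by the closed form for |y|>1."""
    yc = (c - 0.5 * (b + a)) / (0.5 * (b - a))
    Tc = 0.5 * ((yc + np.sqrt(yc**2 - 1))**n + (yc - np.sqrt(yc**2 - 1))**n)
    return 1.0 / abs(Tc)

n, a, b, c = 8, 1.0, 10.0, 0.0               # example values (assumed)
lam = max(abs(a - c) / abs(b - c), abs(b - c) / abs(a - c))
bound = 2.0 * ((np.sqrt(lam) - 1.0) / (np.sqrt(lam) + 1.0))**n
print(normalized_cheb_norm(n, a, b, c), "<=", bound)
```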

Chebyshev Iteration

Chebyshev iteration is a linear nonstationary iteration: the matrix/coefficients it uses change from step to step. Moreover, any single-step stationary linear iteration can be expressed as the simplest Richardson iteration combined with preconditioning. We therefore start from Richardson iteration, introduce the notion of a Krylov subspace, and then modify the scheme to obtain Chebyshev iteration.

Richardson iteration and Krylov subspaces

Richardson iteration is the simplest iterative scheme: at every step the correction is simply taken to be the residual of the current iterate:
$$
\bm{x}^{(i+1)}=\bm{x}^{(i)}+\bm{r}^{(i)}=\bm{x}^{(i)}+\bm{b}-\bm{A}\bm{x}^{(i)}=(\bm{I}-\bm{A})\bm{x}^{(i)}+\bm{b}
$$
$$
\bm{r}^{(i+1)}=\bm{b}-\bm{A}\bm{x}^{(i+1)}=\bm{b}-\bm{A}\bm{x}^{(i)}-\bm{A}\bm{r}^{(i)}=(\bm{I}-\bm{A})\bm{r}^{(i)}=\dots=(\bm{I}-\bm{A})^{i+1}\bm{r}^{(0)}
$$
From the relation between residual and error, $\bm{e}^{(i)}=\bm{A}^{-1}\bm{r}^{(i)}$, it follows that $\bm{e}^{(i+1)}=(\bm{I}-\bm{A})\bm{e}^{(i)}$ as well. The iterate obtained at step $m$ is therefore
$$
\bm{x}^{(m)}=\bm{x}^{(0)}+\sum_{i=0}^{m-1}\bm{r}^{(i)}=\bm{x}^{(0)}+\sum_{i=0}^{m-1}(\bm{I}-\bm{A})^{i}\bm{r}^{(0)}\in\bm{x}^{(0)}+\bm{K_m}(\bm{r}^{(0)})
$$
where $\bm{K_m}(\bm{r}^{(0)})$ is the Krylov subspace generated by $\bm{r}^{(0)}$: its elements are linear combinations of $\bm{r}^{(0)},\bm{Ar}^{(0)},\bm{A}^2\bm{r}^{(0)},\dots$, obtained by repeatedly applying $\bm{A}$ to $\bm{r}^{(0)}$.
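For concreteness, here is a minimal sketch (my own, with an arbitrary toy matrix) of plain Richardson iteration; note that it only converges when $\rho(\bm{I}-\bm{A})<1$, which is why the toy matrix is rescaled.

```python
import numpy as np

def richardson(A, b, x0, num_steps):
    """Plain Richardson iteration: x <- x + (b - A x).
    After m steps the iterate lies in x0 + K_m(r0)."""
    x = x0.copy()
    for _ in range(num_steps):
        x = x + (b - A @ x)
    return x

# Toy SPD example (assumed), scaled so that the spectrum of A lies in (0, 1]
rng = np.random.default_rng(0)
M = rng.standard_normal((5, 5))
A = M @ M.T + 5.0 * np.eye(5)
A = A / np.linalg.norm(A, 2)
b = rng.standard_normal(5)
x = richardson(A, b, np.zeros(5), 200)
print(np.linalg.norm(b - A @ x))   # small residual
```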

Any single-step stationary iteration can be put in this form. Given a splitting $\bm{A}=\bm{M}-\bm{N}$ with $\bm{M}$ the preconditioning matrix, the iteration reads
$$
\bm{x}^{(i+1)}=\bm{M}^{-1}\bm{N}\bm{x}^{(i)}+\bm{M}^{-1}\bm{b}=\bm{M}^{-1}(\bm{M}-\bm{A})\bm{x}^{(i)}+\bm{M}^{-1}\bm{b}
$$
On the other hand, preconditioning $\bm{Ax}=\bm{b}$ to obtain $\bm{M}^{-1}\bm{Ax}=\bm{M}^{-1}\bm{b}$ and then applying Richardson iteration gives
$$
\bm{x}^{(i+1)}=\bm{x}^{(i)}+\bm{r}^{(i)}=\bm{x}^{(i)}+\bm{M}^{-1}\bm{b}-\bm{M}^{-1}\bm{A}\bm{x}^{(i)}
$$
The two are identical, so every single-step stationary iteration can be written as Richardson iteration (current iterate plus a correction) applied to a preconditioned system.

Single-step scheme

The original Richardson iteration simply takes $\bm{M}=\bm{I}$ as the preconditioner (the cheapest to apply, but the poorest approximation of $\bm{A}$). Now make the iteration nonstationary: at step $i$ use a step-dependent preconditioner $\bm{M}_i$, generated from a single matrix $\bm{M}$ via $\bm{M}_i=\tau_i^{-1}\bm{M}$. In the same Richardson spirit of easy computation, take $\bm{M}=\bm{I}$, so each step is $\bm{x}^{(i+1)}=\bm{x}^{(i)}+\tau_i\bm{r}^{(i)}$ and the residual recurrence becomes
$$
\bm{r}^{(i+1)}=(\bm{I}-\tau_i\bm{A})\bm{r}^{(i)}=(\bm{I}-\tau_i\bm{A})(\bm{I}-\tau_{i-1}\bm{A})\cdots(\bm{I}-\tau_0\bm{A})\bm{r}^{(0)}
$$
The same relation holds for the error $\bm{e}^{(i)}$, hence
$$
\frac{||\bm{e}^{(i)}||}{||\bm{e}^{(0)}||}\leq||(\bm{I}-\tau_{i-1}\bm{A})\cdots(\bm{I}-\tau_0\bm{A})||
$$
We want to choose the parameters $\tau_0,\tau_1,\dots$ so that the error is contracted as much as possible, i.e., to solve the optimization problem
$$
{\underset{\tau_0,\dots,\tau_{i-1}}{\operatorname{inf}}}\;||(\bm{I}-\tau_{i-1}\bm{A})\cdots(\bm{I}-\tau_0\bm{A})||
$$
Assume $\bm{A}$ is symmetric (the assumptions imposed along the way are collected later). Then there is an orthogonal matrix $\bm{P}$ (unitary if $\bm{A}$ is Hermitian) that diagonalizes $\bm{A}$, and in the 2-norm $||\bm{P}||_2=1$:
$$
\bm{A}=\bm{P}\,\mathrm{diag}(\lambda_1,\lambda_2,\dots,\lambda_n)\,\bm{P}^T=\bm{P}\bm{\Lambda}\bm{P}^T
$$
Then
$$
\begin{aligned}
||(\bm{I}-\tau_{i-1}\bm{A})\cdots(\bm{I}-\tau_0\bm{A})||_2&=||\bm{P}(\bm{I}-\tau_{i-1}\bm{\Lambda})\cdots(\bm{I}-\tau_0\bm{\Lambda})\bm{P}^T||_2\\
&=||(\bm{I}-\tau_{i-1}\bm{\Lambda})\cdots(\bm{I}-\tau_0\bm{\Lambda})||_2\\
&={\underset{\lambda\in\sigma(\bm{A})}{\operatorname{max}}}\;|(1-\tau_{i-1}\lambda)\cdots(1-\tau_0\lambda)|
\end{aligned}
$$
The spectrum of $\bm{A}$ can be estimated; suppose we have computed bounds $\sigma(\bm{A})\subset[\underline{\lambda},\bar{\lambda}]$. Then the expression above satisfies
$$
{\underset{\lambda\in\sigma(\bm{A})}{\operatorname{max}}}\;|(1-\tau_{i-1}\lambda)\cdots(1-\tau_0\lambda)|\leq||(1-\tau_{i-1}\lambda)\cdots(1-\tau_0\lambda)||_{\infty,[\underline{\lambda},\bar{\lambda}]}=||f(\lambda)||_{\infty,[\underline{\lambda},\bar{\lambda}]}
$$
The left-hand side only involves values at the eigenvalues; the right-hand side is the sup norm of $f(\lambda)=(1-\tau_{i-1}\lambda)\cdots(1-\tau_0\lambda)$ regarded as a polynomial in $\lambda$. This polynomial has $i$ simple real roots: $\frac{1}{\tau_0},\dots,\frac{1}{\tau_{i-1}}$.

By the earlier result, the normalized Chebyshev polynomial $\widehat{T_i}(\lambda)$ on $[\underline{\lambda},\bar{\lambda}]$ (normalized at $0$) has $i$ simple real roots in $(\underline{\lambda},\bar{\lambda})$, and $||\widehat{T_i}||_{\infty,[\underline{\lambda},\bar{\lambda}]}={\underset{P_i\in\mathbf{P}_i,\,P_i(0)=1}{\operatorname{inf}}}\;||P_i||_{\infty,[\underline{\lambda},\bar{\lambda}]}$.

Note that when $\bm{A}$ is symmetric positive definite, $0\notin[\underline{\lambda},\bar{\lambda}]$, so the infimum ${\underset{P_i\in\mathbf{P}_i,\,P_i(0)=1}{\operatorname{inf}}}\;||P_i||_{\infty,[\underline{\lambda},\bar{\lambda}]}$ is attained by a polynomial of exactly the form of $f(\lambda)$ in the original problem. In this case
$$
P_i^*(\lambda)=\widehat{T_i}(\lambda)=\frac{\widetilde{T_i}(\lambda)}{\widetilde{T_i}(0)}=\frac{T_i\left(\frac{\lambda-\frac{1}{2}(\bar{\lambda}+\underline{\lambda})}{\frac{1}{2}(\bar{\lambda}-\underline{\lambda})}\right)}{T_i\left(\frac{0-\frac{1}{2}(\bar{\lambda}+\underline{\lambda})}{\frac{1}{2}(\bar{\lambda}-\underline{\lambda})}\right)}=(1-\tau_{i-1}^{(i)}\lambda)\cdots(1-\tau_0^{(i)}\lambda)
$$
where the superscript in $\tau_j^{(i)}$ indicates dependence on the total number of steps $i$. In other words, the $i$ simple real roots of the optimal polynomial, i.e., the reciprocals $1/\tau_j^{(i)}$, are exactly the roots of the Chebyshev polynomial on $[\underline{\lambda},\bar{\lambda}]$. The single-step scheme then reads
$$
\begin{aligned}
\bm{x}^{(1)}&=\bm{x}^{(0)}+\tau_0^{(i)}\bm{r}^{(0)}\\
&\;\;\vdots\\
\bm{x}^{(j)}&=\bm{x}^{(j-1)}+\tau_{j-1}^{(i)}\bm{r}^{(j-1)}\\
&\;\;\vdots\\
\bm{x}^{(i)}&=\bm{x}^{(i-1)}+\tau_{i-1}^{(i)}\bm{r}^{(i-1)}
\end{aligned}
$$
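A minimal sketch of this single-step scheme (my own illustration, assuming $\bm{A}$ is SPD and that bounds `lam_min`, `lam_max` on its spectrum are available): the parameters are the reciprocals of the roots of the Chebyshev polynomial mapped to $[\underline{\lambda},\bar{\lambda}]$.

```python
import numpy as np

def chebyshev_single_sweep(A, b, x0, lam_min, lam_max, i):
    """One sweep of i steps: x <- x + tau_j r with 1/tau_j the j-th root of the
    degree-i Chebyshev polynomial mapped to [lam_min, lam_max]."""
    k = np.arange(i)
    roots = 0.5 * (lam_max + lam_min) + \
            0.5 * (lam_max - lam_min) * np.cos((2 * k + 1) * np.pi / (2 * i))
    x = x0.copy()
    for rho in roots:
        x = x + (b - A @ x) / rho        # tau_j = 1 / rho
    return x

# Example (assumed): a diagonal SPD matrix with known spectral bounds
A = np.diag(np.linspace(1.0, 10.0, 20))
b = np.ones(20)
x = chebyshev_single_sweep(A, b, np.zeros(20), 1.0, 10.0, 10)
print(np.linalg.norm(b - A @ x) / np.linalg.norm(b))
```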

Convergence rate

The convergence advantage of Chebyshev iteration should be assessed by comparing one sweep of $i$ Chebyshev steps against $i$ applications of a single-step stationary iteration.

By the approximation property of Chebyshev polynomials, the convergence rate of Chebyshev iteration is
$$
\begin{aligned}
\frac{||\bm{e}^{(i)}||}{||\bm{e}^{(0)}||}&\leq||(1-\tau_{i-1}\lambda)\cdots(1-\tau_0\lambda)||_{\infty,[\underline{\lambda},\bar{\lambda}]}\\
&=\frac{1}{|\widetilde{T_i}(0)|}=\frac{1}{\left|T_i\left(\frac{0-\frac{\bar{\lambda}+\underline{\lambda}}{2}}{\frac{\bar{\lambda}-\underline{\lambda}}{2}}\right)\right|}\\
&=\frac{1}{\left|T_i\left(\frac{\frac{\bar{\lambda}}{\underline{\lambda}}+1}{\frac{\bar{\lambda}}{\underline{\lambda}}-1}\right)\right|}=\frac{1}{\left|T_i\left(\frac{\alpha+1}{\alpha-1}\right)\right|}\\
&\leq2\left(\frac{\sqrt{\alpha}-1}{\sqrt{\alpha}+1}\right)^i
\end{aligned}
$$
where $\alpha=\frac{\bar{\lambda}}{\underline{\lambda}}>1$.

If instead we perform $i$ single-step stationary iterations, the best possible single-step contraction factor is
$$
||P_1^*||_{\infty,[\underline{\lambda},\bar{\lambda}]}=\frac{1}{\left|T_1\left(\frac{\alpha+1}{\alpha-1}\right)\right|}=\frac{1}{\left|\frac{\alpha+1}{\alpha-1}\right|}=\frac{\alpha-1}{\alpha+1}
$$
so after $i$ such steps the rate is $\frac{||\bm{e}^{(i)}||}{||\bm{e}^{(0)}||}\leq\left(\frac{\alpha-1}{\alpha+1}\right)^i$. Since $\frac{\sqrt{\alpha}-1}{\sqrt{\alpha}+1}<\frac{\alpha-1}{\alpha+1}$ for $\alpha>1$, this is clearly slower than Chebyshev iteration.
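A quick numerical comparison of the two bounds (with assumed example values for $\alpha$ and $i$) makes the gap concrete:

```python
import numpy as np

alpha, i = 100.0, 20     # assumed spectral ratio and number of steps
cheb = 2.0 * ((np.sqrt(alpha) - 1.0) / (np.sqrt(alpha) + 1.0))**i
stationary = ((alpha - 1.0) / (alpha + 1.0))**i
print(cheb, stationary)  # roughly 0.036 vs 0.67
```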

Two-step scheme

The drawback of the single-step Chebyshev scheme is that $\tau_j^{(i)}$ depends on the total number of steps $i$. Can the parameters be chosen without fixing $i$ in advance? This is exactly where the three-term recurrence of Chebyshev polynomials comes to the rescue.

For the standard Chebyshev polynomials, $T_{n+1}(y)=2yT_n(y)-T_{n-1}(y)$ with $T_1(y)=y$ and $T_0(y)=1$. But what we need are the Chebyshev polynomials on $[\underline{\lambda},\bar{\lambda}]$, so apply the change of variable:
$$
\widetilde{T_i}(x)=T_i\left(\frac{x-\frac{1}{2}(\bar{\lambda}+\underline{\lambda})}{\frac{1}{2}(\bar{\lambda}-\underline{\lambda})}\right)=T_i(ax-y_0)
$$
where $y_0=\frac{\bar{\lambda}+\underline{\lambda}}{\bar{\lambda}-\underline{\lambda}}$ and $a=\frac{2}{\bar{\lambda}-\underline{\lambda}}$. Normalizing by the value of $\widetilde{T_i}$ at $x=0$, $\widehat{T_i}(x)=\frac{\widetilde{T_i}(x)}{\widetilde{T_i}(0)}=\frac{T_i(ax-y_0)}{T_i(-y_0)}=\frac{T_i(y_0-ax)}{T_i(y_0)}$, the three-term recurrence becomes
$$
\widehat{T_0}(x)=\frac{T_0(y_0-ax)}{T_0(y_0)}=1,\qquad
\widehat{T_1}(x)=\frac{T_1(y_0-ax)}{T_1(y_0)}=\frac{y_0-ax}{y_0}=\widehat{T_0}(x)-\frac{a}{y_0}x\widehat{T_0}(x)
$$
$$
\begin{aligned}
\widehat{T_{i+1}}(x)&=\frac{T_{i+1}(y_0-ax)}{T_{i+1}(y_0)}=\frac{2(y_0-ax)T_i(y_0-ax)-T_{i-1}(y_0-ax)}{T_{i+1}(y_0)}\\
&=\frac{2(y_0-ax)T_i(y_0)}{T_{i+1}(y_0)}\,\frac{T_i(y_0-ax)}{T_i(y_0)}-\frac{T_{i-1}(y_0)}{T_{i+1}(y_0)}\,\frac{T_{i-1}(y_0-ax)}{T_{i-1}(y_0)}\\
&=\frac{2(y_0-ax)T_i(y_0)}{T_{i+1}(y_0)}\,\widehat{T_i}(x)-\frac{T_{i-1}(y_0)}{T_{i+1}(y_0)}\,\widehat{T_{i-1}}(x)\\
&=\frac{2y_0T_i(y_0)}{T_{i+1}(y_0)}\,\widehat{T_i}(x)-\frac{T_{i-1}(y_0)}{T_{i+1}(y_0)}\,\widehat{T_{i-1}}(x)-2a\frac{T_i(y_0)}{T_{i+1}(y_0)}\,x\widehat{T_i}(x)\\
&=\left(1+\frac{T_{i-1}(y_0)}{T_{i+1}(y_0)}\right)\widehat{T_i}(x)-\frac{T_{i-1}(y_0)}{T_{i+1}(y_0)}\widehat{T_{i-1}}(x)-2a\frac{T_i(y_0)}{T_{i+1}(y_0)}\,x\widehat{T_i}(x)\\
&=\alpha_i\widehat{T_i}(x)+(1-\alpha_i)\widehat{T_{i-1}}(x)-\beta_i x\widehat{T_i}(x)
\end{aligned}
$$
where the coefficients $\alpha_i$ and $\beta_i$ are computed recursively:
$$
\beta_0=2a\frac{T_0(y_0)}{T_1(y_0)}=\frac{2a}{y_0}=\frac{4}{\bar{\lambda}+\underline{\lambda}}
$$
$$
\begin{aligned}
\frac{1}{\beta_i}&=\frac{1}{2a}\frac{T_{i+1}(y_0)}{T_i(y_0)}=\frac{\bar{\lambda}-\underline{\lambda}}{4}\,\frac{2y_0T_i(y_0)-T_{i-1}(y_0)}{T_i(y_0)}=\frac{\bar{\lambda}-\underline{\lambda}}{2}y_0-\frac{\bar{\lambda}-\underline{\lambda}}{4}\,\frac{T_{i-1}(y_0)}{T_i(y_0)}\\
&=\frac{\bar{\lambda}+\underline{\lambda}}{2}-\frac{\bar{\lambda}-\underline{\lambda}}{4}\,\frac{1}{2a}\,\beta_{i-1}=\frac{\bar{\lambda}+\underline{\lambda}}{2}-\left(\frac{\bar{\lambda}-\underline{\lambda}}{4}\right)^2\beta_{i-1}
\end{aligned}
$$
$$
\alpha_i=2y_0\frac{T_i(y_0)}{T_{i+1}(y_0)}=\frac{2y_0}{2a}\beta_i=\frac{\bar{\lambda}+\underline{\lambda}}{2}\beta_i
$$

From this one obtains the corresponding two-step Chebyshev iteration (one way to see it: since $\bm{e}^{(i)}=\widehat{T_i}(\bm{A})\bm{e}^{(0)}$ and $\bm{A}\bm{e}^{(i)}=\bm{r}^{(i)}$ in the convention used above, apply the recurrence to the error; because $\alpha_i+(1-\alpha_i)=1$, the exact solution cancels and only the update below remains):
$$
\bm{x}^{(i+1)}=\alpha_i\bm{x}^{(i)}+(1-\alpha_i)\bm{x}^{(i-1)}+\beta_i\bm{r}^{(i)}
$$
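A minimal sketch of the two-step scheme (my own illustration, assuming $\bm{A}$ is SPD with spectrum inside `[lam_min, lam_max]`); the first step is a plain Richardson step with $\tau=2/(\bar{\lambda}+\underline{\lambda})$, after which $\beta_i$ and $\alpha_i$ follow the recursion above and no longer depend on the total number of steps.

```python
import numpy as np

def chebyshev_two_step(A, b, x0, lam_min, lam_max, num_steps):
    """Two-step (three-term) Chebyshev iteration for SPD A with spectrum in
    [lam_min, lam_max]; parameters do not depend on the total number of steps."""
    c = 0.5 * (lam_max + lam_min)        # (lam_max + lam_min) / 2
    d = 0.25 * (lam_max - lam_min)       # (lam_max - lam_min) / 4
    x_prev = x0.copy()
    x = x0 + (b - A @ x0) / c            # first step: tau = 2 / (lam_max + lam_min)
    beta = 2.0 / c                       # beta_0 = 4 / (lam_max + lam_min)
    for _ in range(1, num_steps):
        beta = 1.0 / (c - d * d * beta)  # 1/beta_i = c - d^2 * beta_{i-1}
        alpha = c * beta                 # alpha_i = c * beta_i
        x, x_prev = alpha * x + (1.0 - alpha) * x_prev + beta * (b - A @ x), x
    return x

# Toy SPD system (assumed example)
A = np.diag(np.linspace(1.0, 100.0, 50))
b = np.ones(50)
x = chebyshev_two_step(A, b, np.zeros(50), 1.0, 100.0, 60)
print(np.linalg.norm(b - A @ x) / np.linalg.norm(b))
```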

The Conjugate Gradient (CG) Method

Convert the linear system $\bm{Ax}=\bm{b}$ into an unconstrained optimization problem over the whole space, with objective
$$
\varphi(\bm{x})=\frac{1}{2}(\bm{Ax},\bm{x})-(\bm{b},\bm{x})
$$
Its stationary point $\bm{x}^*$ satisfies $\nabla\varphi(\bm{x}^*)=\bm{Ax}^*-\bm{b}=\bm{0}$, i.e., it is the solution of the original system. The objective is bounded below:
$$
\varphi(\bm{x}^*)=\varphi(\bm{A}^{-1}\bm{b})=-\frac{1}{2}(\bm{b},\bm{A}^{-1}\bm{b})=-\frac{1}{2}(\bm{Ax}^*,\bm{x}^*)
$$
$$
\varphi(\bm{x})-\varphi(\bm{x}^*)=\frac{1}{2}\left(\bm{A}(\bm{x}-\bm{x}^*),\ \bm{x}-\bm{x}^*\right)=\frac{1}{2}||\bm{x}-\bm{x}^*||_{\bm{A}}^2\geq0
$$
where the $\bm{A}$-norm induced by the $\bm{A}$-inner product is $||\bm{x}||_{\bm{A}}^2=(\bm{Ax},\bm{x})=(\bm{x},\bm{x})_{\bm{A}}$.

Both CG and steepest descent are subspace-search methods. Given a search direction $\bm{p}^{(k)}$, a one-dimensional minimization along it yields the step length $\alpha_k$:
$$
\alpha_k={\underset{\alpha}{\operatorname{arg\,min}}}\ \varphi\bigl(\bm{x}^{(k)}+\alpha\bm{p}^{(k)}\bigr)
$$

As with the earlier iterations, the error and residual are related by $\bm{e}^{(k)}=\bm{x}^{(k)}-\bm{x}^*=-\bm{A}^{-1}\bm{r}^{(k)}$.

Convergence rate of steepest descent

Simply take $\bm{p}^{(k)}=\bm{r}^{(k)}$, the current residual direction; for the quadratic objective the exact line search then gives $\alpha_k=\frac{(\bm{r}^{(k)},\bm{r}^{(k)})}{(\bm{Ar}^{(k)},\bm{r}^{(k)})}$. Then
$$
\begin{aligned}
||\bm{e}^{(k+1)}||_{\bm{A}}^2&=||\bm{e}^{(k)}+\alpha_k\bm{r}^{(k)}||_{\bm{A}}^2=\left(\bm{Ae}^{(k)}+\alpha_k\bm{A}\bm{r}^{(k)},\ \bm{e}^{(k)}+\alpha_k\bm{r}^{(k)}\right)\\
&=||\bm{e}^{(k)}||_{\bm{A}}^2+\alpha_k^2||\bm{r}^{(k)}||_{\bm{A}}^2-2\alpha_k(\bm{r}^{(k)},\bm{r}^{(k)})\\
&=||\bm{e}^{(k)}||_{\bm{A}}^2-\alpha_k(\bm{r}^{(k)},\bm{r}^{(k)})=||\bm{e}^{(k)}||_{\bm{A}}^2-\frac{(\bm{r}^{(k)},\bm{r}^{(k)})(\bm{r}^{(k)},\bm{r}^{(k)})}{(\bm{r}^{(k)},\bm{r}^{(k)})_{\bm{A}}}\\
&=||\bm{e}^{(k)}||_{\bm{A}}^2\left(1-\frac{1}{\frac{(\bm{r}^{(k)},\bm{r}^{(k)})_{\bm{A}}}{(\bm{r}^{(k)},\bm{r}^{(k)})}\,\frac{(\bm{r}^{(k)},\bm{r}^{(k)})_{\bm{A}^{-1}}}{(\bm{r}^{(k)},\bm{r}^{(k)})}}\right)\\
&\leq||\bm{e}^{(k)}||_{\bm{A}}^2\left(1-\frac{1}{\frac{(\lambda_1+\lambda_n)^2}{4\lambda_1\lambda_n}}\right)=||\bm{e}^{(k)}||_{\bm{A}}^2\,\frac{(\lambda_1-\lambda_n)^2}{(\lambda_1+\lambda_n)^2}
\end{aligned}
$$

where the last inequality follows from the Kantorovich inequality, with $\lambda_1$ and $\lambda_n$ the extreme eigenvalues of $\bm{A}$. The convergence rate is therefore
$$
\frac{||\bm{e}^{(k)}||_{\bm{A}}}{||\bm{e}^{(0)}||_{\bm{A}}}\leq\left(\frac{\lambda_1-\lambda_n}{\lambda_1+\lambda_n}\right)^k
$$
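For reference, a minimal steepest-descent sketch (my own, assuming $\bm{A}$ is SPD), using exactly the choices above: $\bm{p}^{(k)}=\bm{r}^{(k)}$ and the exact line-search step length.

```python
import numpy as np

def steepest_descent(A, b, x0, num_steps):
    """Steepest descent for SPD A: p = r, alpha = (r, r) / (A r, r)."""
    x = x0.copy()
    for _ in range(num_steps):
        r = b - A @ x
        rr = r @ r
        if rr == 0.0:                    # already exact
            break
        x = x + (rr / (r @ (A @ r))) * r
    return x

A = np.diag([1.0, 10.0])                 # toy SPD example (assumed)
x = steepest_descent(A, np.array([1.0, 1.0]), np.zeros(2), 50)
print(x)                                 # approaches [1.0, 0.1]
```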

Convergence rate of CG

Steepest descent naively takes the local residual as the search direction, so it is natural to ask whether optimizing both the direction $\bm{p}^{(k)}$ and the step length $\alpha_k$ simultaneously yields some special structure or benefit. CG chooses $\bm{p}^{(k)}$ so that the new iterate $\bm{x}^{(k+1)}=\bm{x}^{(k)}+\alpha_k\bm{p}^{(k)}$ produced by the one-dimensional minimization is at the same time the minimizer over the Krylov subspace enlarged by $\bm{p}^{(k)}$. The particular construction that achieves this is not derived here (to be filled in later); all we need is that the iterate $\bm{x}^{(k)}$ at step $k$ minimizes $\varphi$ over the Krylov subspace $\bm{K_k}(\bm{r}^{(0)})$, and the convergence rate follows from this fact.
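For completeness, here is the textbook form of CG as a sketch (a standard formulation, not derived in this post; it assumes $\bm{A}$ is SPD).

```python
import numpy as np

def conjugate_gradient(A, b, x0, num_steps):
    """Standard CG for SPD A. Each step does an exact line search along p and
    then builds a new direction that is A-conjugate to the previous ones."""
    x = x0.copy()
    r = b - A @ x
    p = r.copy()
    rs = r @ r
    for _ in range(num_steps):
        Ap = A @ p
        alpha = rs / (p @ Ap)            # exact line search along p
        x = x + alpha * p
        r = r - alpha * Ap
        rs_new = r @ r
        if rs_new < 1e-30:               # residual numerically zero: done
            break
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x

A = np.diag(np.linspace(1.0, 100.0, 50))  # toy SPD example (assumed)
b = np.ones(50)
x = conjugate_gradient(A, b, np.zeros(50), 25)
print(np.linalg.norm(b - A @ x) / np.linalg.norm(b))
```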

Take $\bm{x}^{(0)}=\bm{0}$, so $\bm{r}^{(0)}=\bm{b}$, and write $\bm{y}=\bm{A}^{-1}\bm{r}^{(0)}$. The error ratio satisfies

$$
\begin{aligned}
\frac{||\bm{e}^{(k)}||_{\bm{A}}}{||\bm{e}^{(0)}||_{\bm{A}}}&={\underset{P_{k-1}\in\mathbf{P}_{k-1}}{\operatorname{min}}}\;\frac{||P_{k-1}(\bm{A})\bm{r}^{(0)}-\bm{A}^{-1}\bm{r}^{(0)}||_{\bm{A}}}{||\bm{A}^{-1}\bm{r}^{(0)}||_{\bm{A}}}\\
&\leq{\underset{P_{k-1}\in\mathbf{P}_{k-1}}{\operatorname{min}}}\;{\underset{\bm{y}\neq\bm{0}}{\operatorname{sup}}}\;\frac{||P_{k-1}(\bm{A})\bm{Ay}-\bm{y}||_{\bm{A}}}{||\bm{y}||_{\bm{A}}}\\
&={\underset{P_k\in\mathbf{P}_k,\,P_k(0)=1}{\operatorname{min}}}\;{\underset{\bm{y}\neq\bm{0}}{\operatorname{sup}}}\;\frac{||P_k(\bm{A})\bm{y}||_{\bm{A}}}{||\bm{y}||_{\bm{A}}}
\end{aligned}
$$

Let $(\lambda_i,\bm{y}_i)$ be the eigenpairs of $\bm{A}$ (so that $P_k(\lambda_i)$ are the eigenvalues of $P_k(\bm{A})$), and expand $\bm{y}=\sum_i a_i\bm{y}_i$ in this eigenbasis. The bound above becomes

$$
\begin{aligned}
\frac{||\bm{e}^{(k)}||_{\bm{A}}}{||\bm{e}^{(0)}||_{\bm{A}}}&\leq{\underset{P_k\in\mathbf{P}_k,\,P_k(0)=1}{\operatorname{min}}}\;{\underset{\bm{y}\neq\bm{0}}{\operatorname{sup}}}\;\frac{||\sum_iP_k(\lambda_i)a_i\bm{y}_i||_{\bm{A}}}{||\sum_ia_i\bm{y}_i||_{\bm{A}}}\\
&={\underset{P_k\in\mathbf{P}_k,\,P_k(0)=1}{\operatorname{min}}}\;{\underset{\lambda\in\sigma(\bm{A})}{\operatorname{max}}}\;|P_k(\lambda)|\\
&\leq{\underset{P_k\in\mathbf{P}_k,\,P_k(0)=1}{\operatorname{min}}}\;||P_k(\lambda)||_{L^\infty\left(\frac{\lambda_n}{\lambda_1},\,1\right)}
\end{aligned}
$$

Here the spectrum has been rescaled by $\lambda_1$ (the largest eigenvalue), which does not change the minimax value since the normalization point $0$ is fixed. The quantity on the right is the smallest possible sup norm on $[\frac{\lambda_n}{\lambda_1},1]$ among all degree-$k$ polynomials that equal $1$ at $0$ — which is again attained by the normalized Chebyshev polynomial. Writing $\alpha=\frac{\lambda_n}{\lambda_1}$:
$$
\widehat{T_k}(x)=\frac{\widetilde{T_k}(x)}{\widetilde{T_k}(0)}=\frac{T_k\left(\frac{x-\frac{1}{2}(1+\alpha)}{\frac{1}{2}(1-\alpha)}\right)}{T_k\left(-\frac{1+\alpha}{1-\alpha}\right)}=P_k^*(x)
$$

Using the approximation property, the convergence rate is
$$
\frac{||\bm{e}^{(k)}||_{\bm{A}}}{||\bm{e}^{(0)}||_{\bm{A}}}\leq||P_k^*||_{L^\infty(\alpha,1)}=\frac{1}{\left|T_k\left(\frac{1+\alpha}{1-\alpha}\right)\right|}\leq2\left(\frac{1-\sqrt{\alpha}}{1+\sqrt{\alpha}}\right)^k
$$

Since $0<\alpha<1$, $\sqrt{\alpha}$ is closer to $1$ than $\alpha$ is, so $\frac{1-\sqrt{\alpha}}{1+\sqrt{\alpha}}<\frac{1-\alpha}{1+\alpha}=\frac{\lambda_1-\lambda_n}{\lambda_1+\lambda_n}$: CG converges faster than steepest descent.
