最优化学习笔记1——关于拟牛顿法推导

从2020年3月份到现在,一年的时间里断断续续自学最优化,结合豆瓣读书、知乎等网站上的推荐,翻阅了以下书和课程:

  • Stephen Boyd和Lieven Vandenberghe的《Convex Optimization》,这本书被公认为学习凸优化的必读经典著作,其中大量的篇幅讲应用,讲理论和算法的篇幅稍少一些。在B网站上有Boyd的视频课程,油管上也有,不过只有英文字幕,如果是零基础接触,直接看Boyd的视频稍显费劲,B站上也有凌青老师以《Convex Optimization》为教材的讲解,推荐作为入门观看;
  • Jorge Nocedal和Stephen J. Wright的《Numerical Optimization》,这本书更多地介绍最优化算法,在算法层面和《Convex Optimization》正好形成互补;
  • B站上崔雪婷老师的最优化课程(貌似是因为疫情原因录制的),综合了《Convex Optimization》、《Numerical Optimization》等教材的内容,推导非常详细,推荐观看,崔老师自己也写了一本书《最优化基础理论与方法》,书很薄,适合快速入门(貌似B站上的视频已经被删掉了)。

关于最优化的书和课程非常多,其他书和课程包括但不限于:

  • Dimitri P. Bertsekas的《Nonlinear Programming》(作者比较苦口婆心,内容详尽)
  • Yurii Nesterov的《Lectures On Convex Optimization》,被誉为思路很巧妙,不过目前没有细读,所以也没有体会到
  • Aharon Ben-Taly和Arkadi Nemirovski的《Lectures On Modern Convex Optimization》,主要作为课程的配套讲义
  • B站上方述诚老师的非线性规划课(只有讲义,没有对应的教材,方述诚老师另有关于线性锥优化方面的专著)
  • David G. Luenberger和Yinyu Ye的《Linear and Nonlinear Programming》
  • Mokhtar S. Bazaraa,Hanif D. Sherali和C. M. Shetty的《Nonlinear Programming Theory and Algorithms》
  • 袁亚湘和孙文瑜的《最优化理论与方法》,毕竟是院士的著作,国产为数不多的关于最优化的好书

侧重于应用方面的有:

  • Andreas Antoniou和Wu-Sheng Lu的《Practical Optimization: Algorithms and Engineering Applications》,侧重介绍具有工程实用性的算法
  • Lorenz T. Biegler的《Nonlinear programming: Concepts, Algorithms, and Applications to Chemical Processes》,虽然书中是以化工过程的优化举例,但是也同样适用于其他动态优化(或者最优控制)问题

值得一提的是,最优化和控制学科里面的最优控制联系紧密,是最优控制中直接法的数学理论基石,除了上述Lorenz T. Biegler的著作涉及最优控制外,侧重于最优控制求解的还有:

  • John T. Betts的《Practical methods for optimal control and estimation using nonlinear programming》
  • S´ebastien Gros和Moritz Diehl的《Numerical Optimal Control》(与课程对应的讲义,*可以看到视频,可惜没有配套字幕)。

个人觉得,上述这些著作全部看完不必要也不现实,应该先选择1至2本入门,然后根据自己的研究需要选读。

扯完前面的废话,谈谈拟牛顿法,主要参考《Numerical Optimization》中的内容,根据第3章的描述,拟牛顿法最开始主要用于克服牛顿法计算无约束优化问题中的搜索方向的不足(有无约束和拟牛顿的思想没有太大关系)。以线搜索为例(对于信赖域法类似),我们想要
min ⁡ x f ( x ) (1) \min_x f(x)\tag{1} xmin​f(x)(1)

的方式是通过下述迭代过程:
x k + 1 = x k + α k p k (2) x_{k+1}=x_k+\alpha _kp_k\tag{2} xk+1​=xk​+αk​pk​(2)

式中: α k \alpha _k αk​为步长, p k p_k pk​为搜索方向。

这里不考虑步长 α k \alpha _k αk​的选取,下面主要考虑搜索方向 p k p_k pk​的选取。首先看牛顿法,需要求 f ( x ) f(x) f(x)对应的Hessian矩阵 ∇ 2 f \nabla ^2f ∇2f,给出的搜索方向为
p k = − ∇ 2 f k − 1 ∇ f k (3) p_k=-\nabla ^2f_{k}^{-1}\nabla f_k\tag{3} pk​=−∇2fk−1​∇fk​(3)

式中: ∇ 2 f k − 1 \nabla ^2f_{k}^{-1} ∇2fk−1​为 ∇ 2 f k \nabla ^2f_k ∇2fk​的逆矩阵,考虑到 ∇ 2 f k − 1 \nabla ^2f_{k}^{-1} ∇2fk−1​并不容易计算,拟牛顿法提出用下式代替牛顿法的搜索方向:
p k = − B k − 1 ∇ f k (4) p_k=-B_{k}^{-1}\nabla f_k\tag{4} pk​=−Bk−1​∇fk​(4)

式中: B k B_k Bk​一般选为对称正定矩阵,若能够直接获得 B k B_k Bk​的计算方式,那么就免去了求Hessian矩阵的繁琐工作。那么如何得到 B k B_k Bk​呢?

1. 拟牛顿法中的割线方程推导

考虑到牛顿法其实可以从对目标函数的二阶近似获得,即考虑在当前解 x k x_k xk​附近对目标函数的二阶近似:
m k ( p ) = f k + ∇ f k T p + 1 2 p T ∇ 2 f k p (5) m_k(p)=f_k+\nabla f_{k}^{\mathrm{T}}p+\frac{1}{2}p^{\mathrm{T}}\nabla ^2f_kp\tag{5} mk​(p)=fk​+∇fkT​p+21​pT∇2fk​p(5)
式中: p = x − x k p=x-x_k p=x−xk​。选择 p p p使得式(5)中的目标函数取极值,就得到了牛顿法的公式,那么一个自然的想法是,式(4)可看作是使得下述目标函数取极值:
m k ( p ) = f k + ∇ f k T p + 1 2 p T B k p (6) m_k(p)=f_k+\nabla f_{k}^{\mathrm{T}}p+\frac{1}{2}p^{\mathrm{T}}B_kp\tag{6} mk​(p)=fk​+∇fkT​p+21​pTBk​p(6)

即用对称正定矩阵 B k B_k Bk​代替Hessian矩阵 ∇ 2 f k \nabla ^2f_k ∇2fk​。由于直接获得 B k B_k Bk​的表达式不太容易,转而考虑迭代计算 B k B_k Bk​,即需要推导 B k + 1 B_{k+1} Bk+1​和 B k B_k Bk​的关系,根据式(6),到了 x k + 1 x_{k+1} xk+1​附近时,有
m k + 1 ( p ) = f k + 1 + ∇ f k + 1 T p + 1 2 p T B k + 1 p (7) m_{k+1}(p)=f_{k+1}+\nabla f_{k+1}^{\mathrm{T}}p+\frac{1}{2}p^{\mathrm{T}}B_{k+1}p\tag{7} mk+1​(p)=fk+1​+∇fk+1T​p+21​pTBk+1​p(7)

式中: p = x − x k + 1 p=x-x_{k+1} p=x−xk+1​。根据式(7),若 p = 0 p=0 p=0,即停留在 x k + 1 x_{k+1} xk+1​时,有 ∇ m k + 1 ( 0 ) = ∇ f k + 1 \nabla m_{k+1}(0)=\nabla f_{k+1} ∇mk+1​(0)=∇fk+1​,于是想到后退到 x k x_k xk​(对应 p = − α k p k p=-\alpha _kp_k p=−αk​pk​)时, m k + 1 m_{k+1} mk+1​的梯度应与 ∇ f k \nabla f_k ∇fk​相等(即 m k + 1 m_{k+1} mk+1​应能保证从 x k x_k xk​到 x k + 1 x_{k+1} xk+1​的过程中对 ∇ f \nabla f ∇f足够近似),从而有
∇ m k + 1 ( − α k p k ) = ∇ f k + 1 − α k B k + 1 p k = ∇ f k (8) \nabla m_{k+1}(-\alpha _kp_k)=\nabla f_{k+1}-\alpha _kB_{k+1}p_k=\nabla f_k\tag{8} ∇mk+1​(−αk​pk​)=∇fk+1​−αk​Bk+1​pk​=∇fk​(8)
也即
B k + 1 α k p k = ∇ f k + 1 − ∇ f k (9) B_{k+1}\alpha _kp_k=\nabla f_{k+1}-\nabla f_k\tag{9} Bk+1​αk​pk​=∇fk+1​−∇fk​(9)

定义 s k = x k + 1 − x k = α k p k s_k=x_{k+1}-x_k=\alpha _kp_k sk​=xk+1​−xk​=αk​pk​, y k = ∇ f k + 1 − ∇ f k y_k=\nabla f_{k+1}-\nabla f_k yk​=∇fk+1​−∇fk​,则式(9)可写为
B k + 1 s k = y k (10) B_{k+1}s_k=y_k\tag{10} Bk+1​sk​=yk​(10)

式(10)又称为割线方程(secant equation)。令 H k = B k − 1 H_k=B_{k}^{-1} Hk​=Bk−1​,则有互补形式的割线方程:
H k + 1 y k = s k (11) H_{k+1}y_k=s_k\tag{11} Hk+1​yk​=sk​(11)

另一方面,式(8)也可这么理解(参考《最优化理论与方法》),即直接考虑以 x x x为自变量(而不是引入额外的变量 p p p):
f ( x ) ≈ f ( x k + 1 ) + ∇ f k + 1 ( x − x k + 1 ) + 1 2 ( x − x k + 1 ) T B k + 1 ( x − x k + 1 ) (12) f(x)\approx f(x_{k+1})+\nabla f_{k+1}(x-x_{k+1})+\frac{1}{2}(x-x_{k+1})^{\mathrm{T}}B_{k+1}(x-x_{k+1})\tag{12} f(x)≈f(xk+1​)+∇fk+1​(x−xk+1​)+21​(x−xk+1​)TBk+1​(x−xk+1​)(12)

对上式两边求导,有
∇ f ( x ) ≈ ∇ f k + 1 + B k + 1 ( x − x k + 1 ) (13) \nabla f(x)\approx \nabla f_{k+1}+B_{k+1}(x-x_{k+1})\tag{13} ∇f(x)≈∇fk+1​+Bk+1​(x−xk+1​)(13)

上式中令 x = x k x=x_k x=xk​即可得到式(8)。

2.拟牛顿法中的矩阵递推

拟牛顿法中的矩阵递推关系具有比较强的构造痕迹,典型的方法包括秩一和秩二校正。在介绍具体内容之前,首先引入矩阵求逆的Sherman–Morrison公式:对于非奇异方阵 A A A,设 A ˉ = A + a b T \bar{A}=A+ab^{\mathrm{T}} Aˉ=A+abT,其中 a , b ∈ R n a,b\in \mathbb{R} ^n a,b∈Rn,若 A ˉ \bar{A} Aˉ非奇异,则有
A ˉ − 1 = A − 1 − A − 1 a b T A − 1 1 + b T A − 1 a (14) \bar{A}^{-1}=A^{-1}-\frac{A^{-1}ab^{\mathrm{T}}A^{-1}}{1+b^{\mathrm{T}}A^{-1}a}\tag{14} Aˉ−1=A−1−1+bTA−1aA−1abTA−1​(14)

2.1 秩一校正

为了保证对称性,首先考虑构造如下关系(秩一校正):
B k + 1 = B k + σ v v T (15) B_{k+1}=B_k+\sigma vv^{\mathrm{T}}\tag{15} Bk+1​=Bk​+σvvT(15)

式中: σ \sigma σ取值为 1 1 1或者 − 1 -1 −1,且选择 σ \sigma σ和 v v v使得 B k + 1 B_{k+1} Bk+1​满足割线方程(10),于是有
y k = B k s k + [ σ v T s k ] v ⇒ [ σ v T s k ] v = y k − B k s k (16) y_k=B_ks_k+\left[ \sigma v^{\mathrm{T}}s_k \right] v\Rightarrow \left[ \sigma v^{\mathrm{T}}s_k \right] v=y_k-B_ks_k\tag{16} yk​=Bk​sk​+[σvTsk​]v⇒[σvTsk​]v=yk​−Bk​sk​(16)

由于 σ v T s k \sigma v^{\mathrm{T}}s_k σvTsk​为标量,因此 v v v必定为 y k − B k s k y_k-B_ks_k yk​−Bk​sk​的倍数,即存在标量 δ \delta δ使得 v = δ ( y k − B k s k ) v=\delta\left( y_k-B_ks_k \right) v=δ(yk​−Bk​sk​),则有
y k − B k s k = σ δ 2 [ s k T ( y k − B k s k ) ] ( y k − B k s k ) (17) y_k-B_ks_k=\sigma \delta ^2\left[ s_{k}^{\mathrm{T}}\left( y_k-B_ks_k \right) \right] \left( y_k-B_ks_k \right)\tag{17} yk​−Bk​sk​=σδ2[skT​(yk​−Bk​sk​)](yk​−Bk​sk​)(17)

为确保上式成立,可选取
σ = s i g n [ s k T ( y k − B k s k ) ] , δ = ± ∣ s k T ( y k − B k s k ) ∣ − 1 2 (18) \sigma =\mathrm{sign}\left[ s_{k}^{\mathrm{T}}\left( y_k-B_ks_k \right) \right] , \delta =\pm \left| s_{k}^{\mathrm{T}}\left( y_k-B_ks_k \right) \right|^{-\frac{1}{2}}\tag{18} σ=sign[skT​(yk​−Bk​sk​)],δ=±∣∣​skT​(yk​−Bk​sk​)∣∣​−21​(18)

则 v = ± ∣ s k T ( y k − B k s k ) ∣ − 1 2 ( y k − B k s k ) v=\pm \left| s_{k}^{\mathrm{T}}\left( y_k-B_ks_k \right) \right|^{-\frac{1}{2}}\left( y_k-B_ks_k \right) v=±∣∣​skT​(yk​−Bk​sk​)∣∣​−21​(yk​−Bk​sk​),结合式(15)和(18)可得
B k + 1 = B k + σ v v T = B k + s i g n [ s k T ( y k − B k s k ) ] × ∣ s k T ( y k − B k s k ) ∣ − 1 ( y k − B k s k ) ( y k − B k s k ) T = B k + ( y k − B k s k ) ( y k − B k s k ) T ( y k − B k s k ) T s k (19) \begin{aligned} B_{k+1}&=B_k+\sigma vv^{\mathrm{T}}\\ &=B_k+\mathrm{sign}\left[ s_{k}^{\mathrm{T}}\left( y_k-B_ks_k \right) \right] \times \left| s_{k}^{\mathrm{T}}\left( y_k-B_ks_k \right) \right|^{-1}\left( y_k-B_ks_k \right) \left( y_k-B_ks_k \right) ^{\mathrm{T}}\\ &=B_k+\frac{\left( y_k-B_ks_k \right) \left( y_k-B_ks_k \right) ^{\mathrm{T}}}{\left( y_k-B_ks_k \right) ^{\mathrm{T}}s_k} \end{aligned}\tag{19} Bk+1​​=Bk​+σvvT=Bk​+sign[skT​(yk​−Bk​sk​)]×∣∣​skT​(yk​−Bk​sk​)∣∣​−1(yk​−Bk​sk​)(yk​−Bk​sk​)T=Bk​+(yk​−Bk​sk​)Tsk​(yk​−Bk​sk​)(yk​−Bk​sk​)T​​(19)

根据式(19),取 a = y k − B k s k ( y k − B k s k ) T s k a=\frac{y_k-B_ks_k}{\left( y_k-B_ks_k \right) ^{\mathrm{T}}s_k} a=(yk​−Bk​sk​)Tsk​yk​−Bk​sk​​,    b = y k − B k s k \,\,b=y_k-B_ks_k b=yk​−Bk​sk​,且考虑到 H k H_k Hk​和 B k B_k Bk​为互逆关系,且均为对称矩阵,则由Sherman–Morrison公式(14)有
H k + 1 = H k − H k y k − B k s k ( y k − B k s k ) T s k ( y k − B k s k ) T H k 1 + ( y k − B k s k ) T H k y k − B k s k ( y k − B k s k ) T s k = H k − ( H k y k − s k ) ( H k y k − s k ) T ( y k − B k s k ) T s k + ( y k − B k s k ) T ( H k y k − s k ) = H k + ( s k − H k y k ) ( s k − H k y k ) T − ( y k − B k s k ) T s k + ( s k − H k y k ) T ( y k − B k s k ) (20) \begin{aligned} H_{k+1}&=H_k-\frac{H_k\frac{y_k-B_ks_k}{\left( y_k-B_ks_k \right) ^{\mathrm{T}}s_k}\left( y_k-B_ks_k \right) ^{\mathrm{T}}H_k}{1+\left( y_k-B_ks_k \right) ^{\mathrm{T}}H_k\frac{y_k-B_ks_k}{\left( y_k-B_ks_k \right) ^{\mathrm{T}}s_k}}\\ &=H_k-\frac{\left( H_ky_k-s_k \right) \left( H_ky_k-s_k \right) ^{\mathrm{T}}}{\left( y_k-B_ks_k \right) ^{\mathrm{T}}s_k+\left( y_k-B_ks_k \right) ^{\mathrm{T}}\left( H_ky_k-s_k \right)}\\ &=H_k+\frac{\left( s_k-H_ky_k \right) \left( s_k-H_ky_k \right) ^{\mathrm{T}}}{-\left( y_k-B_ks_k \right) ^{\mathrm{T}}s_k+\left( s_k-H_ky_k \right) ^{\mathrm{T}}\left( y_k-B_ks_k \right)} \end{aligned}\tag{20} Hk+1​​=Hk​−1+(yk​−Bk​sk​)THk​(yk​−Bk​sk​)Tsk​yk​−Bk​sk​​Hk​(yk​−Bk​sk​)Tsk​yk​−Bk​sk​​(yk​−Bk​sk​)THk​​=Hk​−(yk​−Bk​sk​)Tsk​+(yk​−Bk​sk​)T(Hk​yk​−sk​)(Hk​yk​−sk​)(Hk​yk​−sk​)T​=Hk​+−(yk​−Bk​sk​)Tsk​+(sk​−Hk​yk​)T(yk​−Bk​sk​)(sk​−Hk​yk​)(sk​−Hk​yk​)T​​(20)

考虑到
− ( y k − B k s k ) T s k + ( s k − H k y k ) T ( y k − B k s k ) = − y k T s k + s k T B k s k + ( s k − H k y k ) T y k − ( s k − H k y k ) T B k s k = − y k T s k + s k T B k s k + ( s k − H k y k ) T y k − s k T B k s k + y k T s k = ( s k − H k y k ) T y k (21) \begin{aligned} & -\left( y_k-B_ks_k \right) ^{\mathrm{T}}s_k+\left( s_k-H_ky_k \right) ^{\mathrm{T}}\left( y_k-B_ks_k \right) \\ =&-y_{k}^{\mathrm{T}}s_k+s_{k}^{\mathrm{T}}B_ks_k+\left( s_k-H_ky_k \right) ^{\mathrm{T}}y_k-\left( s_k-H_ky_k \right) ^{\mathrm{T}}B_ks_k\\ =&-y_{k}^{\mathrm{T}}s_k+s_{k}^{\mathrm{T}}B_ks_k+\left( s_k-H_ky_k \right) ^{\mathrm{T}}y_k-s_{k}^{\mathrm{T}}B_ks_k+y_{k}^{\mathrm{T}}s_k\\ =&\left( s_k-H_ky_k \right) ^{\mathrm{T}}y_k \end{aligned}\tag{21} ===​−(yk​−Bk​sk​)Tsk​+(sk​−Hk​yk​)T(yk​−Bk​sk​)−ykT​sk​+skT​Bk​sk​+(sk​−Hk​yk​)Tyk​−(sk​−Hk​yk​)TBk​sk​−ykT​sk​+skT​Bk​sk​+(sk​−Hk​yk​)Tyk​−skT​Bk​sk​+ykT​sk​(sk​−Hk​yk​)Tyk​​(21)
于是有

H k + 1 = H k + ( s k − H k y k ) ( s k − H k y k ) T ( s k − H k y k ) T y k (22) H_{k+1}=H_k+\frac{\left( s_k-H_ky_k \right) \left( s_k-H_ky_k \right) ^{\mathrm{T}}}{\left( s_k-H_ky_k \right) ^{\mathrm{T}}y_k}\tag{22} Hk+1​=Hk​+(sk​−Hk​yk​)Tyk​(sk​−Hk​yk​)(sk​−Hk​yk​)T​(22)

值得一提的是,如果利用式(11)对 H k H_k Hk​进行类似式(15)-(18)的推导(相当于 H k H_k Hk​和 B k B_k Bk​、 y k y_k yk​和 s k s_k sk​互换),也可得到式(22),式(22)与(19)正好构成了互补关系。

2.2 秩二校正(DFP和BFGS)

考虑构造如下关系:
B k + 1 = B k + σ 1 u u T + σ 2 v v T (23) B_{k+1}=B_k+\sigma _1uu^{\mathrm{T}}+\sigma _2vv^{\mathrm{T}}\tag{23} Bk+1​=Bk​+σ1​uuT+σ2​vvT(23)

式中: σ 1 \sigma _1 σ1​和 σ 2 \sigma _2 σ2​为标量, u u u和 v v v为 n n n维向量,则由割线方程(10)有

y k = B k s k + [ σ 1 u T s k ] u + [ σ 2 v T s k ] v (24) y_k=B_ks_k+\left[ \sigma _1u^{\mathrm{T}}s_k \right] u+\left[ \sigma _2v^{\mathrm{T}}s_k \right] v\tag{24} yk​=Bk​sk​+[σ1​uTsk​]u+[σ2​vTsk​]v(24)

选取 u = y k u=y_k u=yk​, [ σ 1 u T s k ] = 1 \left[ \sigma _1u^{\mathrm{T}}s_k \right] =1 [σ1​uTsk​]=1, v = B k s k v=B_ks_k v=Bk​sk​, [ σ 2 v T s k ] = − 1 \left[ \sigma _2v^{\mathrm{T}}s_k \right] =-1 [σ2​vTsk​]=−1可使得式(24)成立,于是 σ 1 = 1 y k T s k \sigma _1=\frac{1}{y_{k}^{\mathrm{T}}s_k} σ1​=ykT​sk​1​, σ 2 = − 1 s k T B k s k \sigma _2=-\frac{1}{s_{k}^{\mathrm{T}}B_ks_k} σ2​=−skT​Bk​sk​1​,则有
B k + 1 = B k + σ 1 u u T + σ 2 v v T = B k + y k y k T y k T s k − B k s k s k T B k s k T B k s k (25) \begin{aligned} B_{k+1}&=B_k+\sigma _1uu^{\mathrm{T}}+\sigma _2vv^{\mathrm{T}}\\ &=B_k+\frac{y_ky_{k}^{\mathrm{T}}}{y_{k}^{\mathrm{T}}s_k}-\frac{B_ks_ks_{k}^{\mathrm{T}}B_k}{s_{k}^{\mathrm{T}}B_ks_k} \end{aligned} \tag{25} Bk+1​​=Bk​+σ1​uuT+σ2​vvT=Bk​+ykT​sk​yk​ykT​​−skT​Bk​sk​Bk​sk​skT​Bk​​​(25)

式(25)即为关于 B k B_k Bk​的BFGS校正,进一步,令 a = y k y k T s k a=\frac{y_k}{y_{k}^{\mathrm{T}}s_k} a=ykT​sk​yk​​, b = y k b=y_k b=yk​,则由Sherman–Morrison公式(14)有
( B k + y k y k T y k T s k ) − 1 = B k − 1 − B k − 1 y k y k T y k T s k B k − 1 1 + y k T B k − 1 y k y k T s k = B k − 1 − B k − 1 y k y k T B k − 1 y k T s k + y k T B k − 1 y k = H k − H k y k y k T H k y k T s k + y k T H k y k (26) \begin{aligned} \left( B_k+\frac{y_ky_{k}^{\mathrm{T}}}{y_{k}^{\mathrm{T}}s_k} \right) ^{-1}&=B_{k}^{-1}-\frac{B_{k}^{-1}\frac{y_ky_{k}^{\mathrm{T}}}{y_{k}^{\mathrm{T}}s_k}B_{k}^{-1}}{1+y_{k}^{\mathrm{T}}B_{k}^{-1}\frac{y_k}{y_{k}^{\mathrm{T}}s_k}}\\ &=B_{k}^{-1}-\frac{B_{k}^{-1}y_ky_{k}^{\mathrm{T}}B_{k}^{-1}}{y_{k}^{\mathrm{T}}s_k+y_{k}^{\mathrm{T}}B_{k}^{-1}y_k}\\ &=H_k-\frac{H_ky_ky_{k}^{\mathrm{T}}H_k}{y_{k}^{\mathrm{T}}s_k+y_{k}^{\mathrm{T}}H_ky_k} \end{aligned}\tag{26} (Bk​+ykT​sk​yk​ykT​​)−1​=Bk−1​−1+ykT​Bk−1​ykT​sk​yk​​Bk−1​ykT​sk​yk​ykT​​Bk−1​​=Bk−1​−ykT​sk​+ykT​Bk−1​yk​Bk−1​yk​ykT​Bk−1​​=Hk​−ykT​sk​+ykT​Hk​yk​Hk​yk​ykT​Hk​​​(26)

再一次,令 a = − B k s k s k T B k s k a=-\frac{B_ks_k}{s_{k}^{\mathrm{T}}B_ks_k} a=−skT​Bk​sk​Bk​sk​​, b = B k s k b=B_ks_k b=Bk​sk​,则由Sherman–Morrison公式(14)有
( B k + y k y k T y k T s k − B k s k s k T B k s k T B k s k ) − 1 = ( B k + y k y k T y k T s k ) − 1 + ( B k + y k y k T y k T s k ) − 1 B k s k s k T B k s k T B k s k ( B k + y k y k T y k T s k ) − 1 1 − s k T B k ( B k + y k y k T y k T s k ) − 1 B k s k s k T B k s k = ( B k + y k y k T y k T s k ) − 1 + ( B k + y k y k T y k T s k ) − 1 B k s k s k T B k ( B k + y k y k T y k T s k ) − 1 s k T B k s k − s k T B k ( B k + y k y k T y k T s k ) − 1 B k s k = H k − H k y k y k T H k y k T s k + y k T H k y k + ( H k − H k y k y k T H k y k T s k + y k T H k y k ) B k s k s k T B k ( H k − H k y k y k T H k y k T s k + y k T H k y k ) s k T B k s k − s k T B k ( H k − H k y k y k T H k y k T s k + y k T H k y k ) B k s k = H k − H k y k y k T H k y k T s k + y k T H k y k + ( I − H k y k y k T y k T s k + y k T H k y k ) s k s k T ( I − y k y k T H k y k T s k + y k T H k y k ) s k T B k s k − s k T ( I − y k y k T H k y k T s k + y k T H k y k ) B k s k = H k − H k y k y k T H k y k T s k + y k T H k y k + ( I − H k y k y k T y k T s k + y k T H k y k ) s k s k T ( I − y k y k T H k y k T s k + y k T H k y k ) s k T y k y k T y k T s k + y k T H k y k s k (27) \begin{aligned} &\left( B_k+\frac{y_ky_{k}^{\mathrm{T}}}{y_{k}^{\mathrm{T}}s_k}-\frac{B_ks_ks_{k}^{\mathrm{T}}B_k}{s_{k}^{\mathrm{T}}B_ks_k} \right) ^{-1}\\ =&\left( B_k+\frac{y_ky_{k}^{\mathrm{T}}}{y_{k}^{\mathrm{T}}s_k} \right) ^{-1}+\frac{\left( B_k+\frac{y_ky_{k}^{\mathrm{T}}}{y_{k}^{\mathrm{T}}s_k} \right) ^{-1}\frac{B_ks_ks_{k}^{\mathrm{T}}B_k}{s_{k}^{\mathrm{T}}B_ks_k}\left( B_k+\frac{y_ky_{k}^{\mathrm{T}}}{y_{k}^{\mathrm{T}}s_k} \right) ^{-1}}{1-s_{k}^{\mathrm{T}}B_k\left( B_k+\frac{y_ky_{k}^{\mathrm{T}}}{y_{k}^{\mathrm{T}}s_k} \right) ^{-1}\frac{B_ks_k}{s_{k}^{\mathrm{T}}B_ks_k}}\\ =&\left( B_k+\frac{y_ky_{k}^{\mathrm{T}}}{y_{k}^{\mathrm{T}}s_k} \right) ^{-1}+\frac{\left( B_k+\frac{y_ky_{k}^{\mathrm{T}}}{y_{k}^{\mathrm{T}}s_k} \right) ^{-1}B_ks_ks_{k}^{\mathrm{T}}B_k\left( B_k+\frac{y_ky_{k}^{\mathrm{T}}}{y_{k}^{\mathrm{T}}s_k} \right) ^{-1}}{s_{k}^{\mathrm{T}}B_ks_k-s_{k}^{\mathrm{T}}B_k\left( B_k+\frac{y_ky_{k}^{\mathrm{T}}}{y_{k}^{\mathrm{T}}s_k} \right) ^{-1}B_ks_k}\\ =&H_k-\frac{H_ky_ky_{k}^{\mathrm{T}}H_k}{y_{k}^{\mathrm{T}}s_k+y_{k}^{\mathrm{T}}H_ky_k}+\frac{\left( H_k-\frac{H_ky_ky_{k}^{\mathrm{T}}H_k}{y_{k}^{\mathrm{T}}s_k+y_{k}^{\mathrm{T}}H_ky_k} \right) B_ks_ks_{k}^{\mathrm{T}}B_k\left( H_k-\frac{H_ky_ky_{k}^{\mathrm{T}}H_k}{y_{k}^{\mathrm{T}}s_k+y_{k}^{\mathrm{T}}H_ky_k} \right)}{s_{k}^{\mathrm{T}}B_ks_k-s_{k}^{\mathrm{T}}B_k\left( H_k-\frac{H_ky_ky_{k}^{\mathrm{T}}H_k}{y_{k}^{\mathrm{T}}s_k+y_{k}^{\mathrm{T}}H_ky_k} \right) B_ks_k}\\ =&H_k-\frac{H_ky_ky_{k}^{\mathrm{T}}H_k}{y_{k}^{\mathrm{T}}s_k+y_{k}^{\mathrm{T}}H_ky_k}+\frac{\left( I-\frac{H_ky_ky_{k}^{\mathrm{T}}}{y_{k}^{\mathrm{T}}s_k+y_{k}^{\mathrm{T}}H_ky_k} \right) s_ks_{k}^{\mathrm{T}}\left( I-\frac{y_ky_{k}^{\mathrm{T}}H_k}{y_{k}^{\mathrm{T}}s_k+y_{k}^{\mathrm{T}}H_ky_k} \right)}{s_{k}^{\mathrm{T}}B_ks_k-s_{k}^{\mathrm{T}}\left( I-\frac{y_ky_{k}^{\mathrm{T}}H_k}{y_{k}^{\mathrm{T}}s_k+y_{k}^{\mathrm{T}}H_ky_k} \right) B_ks_k}\\ =&H_k-\frac{H_ky_ky_{k}^{\mathrm{T}}H_k}{y_{k}^{\mathrm{T}}s_k+y_{k}^{\mathrm{T}}H_ky_k}+\frac{\left( I-\frac{H_ky_ky_{k}^{\mathrm{T}}}{y_{k}^{\mathrm{T}}s_k+y_{k}^{\mathrm{T}}H_ky_k} \right) s_ks_{k}^{\mathrm{T}}\left( I-\frac{y_ky_{k}^{\mathrm{T}}H_k}{y_{k}^{\mathrm{T}}s_k+y_{k}^{\mathrm{T}}H_ky_k} \right)}{s_{k}^{\mathrm{T}}\frac{y_ky_{k}^{\mathrm{T}}}{y_{k}^{\mathrm{T}}s_k+y_{k}^{\mathrm{T}}H_ky_k}s_k} \end{aligned}\tag{27} =====​(Bk​+ykT​sk​yk​ykT​​−skT​Bk​sk​Bk​sk​skT​Bk​​)−1(Bk​+ykT​sk​yk​ykT​​)−1+1−skT​Bk​(Bk​+ykT​sk​yk​ykT​​)−1skT​Bk​sk​Bk​sk​​(Bk​+ykT​sk​yk​ykT​​)−1skT​Bk​sk​Bk​sk​skT​Bk​​(Bk​+ykT​sk​yk​ykT​​)−1​(Bk​+ykT​sk​yk​ykT​​)−1+skT​Bk​sk​−skT​Bk​(Bk​+ykT​sk​yk​ykT​​)−1Bk​sk​(Bk​+ykT​sk​yk​ykT​​)−1Bk​sk​skT​Bk​(Bk​+ykT​sk​yk​ykT​​)−1​Hk​−ykT​sk​+ykT​Hk​yk​Hk​yk​ykT​Hk​​+skT​Bk​sk​−skT​Bk​(Hk​−ykT​sk​+ykT​Hk​yk​Hk​yk​ykT​Hk​​)Bk​sk​(Hk​−ykT​sk​+ykT​Hk​yk​Hk​yk​ykT​Hk​​)Bk​sk​skT​Bk​(Hk​−ykT​sk​+ykT​Hk​yk​Hk​yk​ykT​Hk​​)​Hk​−ykT​sk​+ykT​Hk​yk​Hk​yk​ykT​Hk​​+skT​Bk​sk​−skT​(I−ykT​sk​+ykT​Hk​yk​yk​ykT​Hk​​)Bk​sk​(I−ykT​sk​+ykT​Hk​yk​Hk​yk​ykT​​)sk​skT​(I−ykT​sk​+ykT​Hk​yk​yk​ykT​Hk​​)​Hk​−ykT​sk​+ykT​Hk​yk​Hk​yk​ykT​Hk​​+skT​ykT​sk​+ykT​Hk​yk​yk​ykT​​sk​(I−ykT​sk​+ykT​Hk​yk​Hk​yk​ykT​​)sk​skT​(I−ykT​sk​+ykT​Hk​yk​yk​ykT​Hk​​)​​(27)

对于式(27)的最后一项,有
( I − H k y k y k T y k T s k + y k T H k y k ) s k s k T ( I − y k y k T H k y k T s k + y k T H k y k ) s k T y k y k T y k T s k + y k T H k y k s k = ( y k T s k + y k T H k y k − H k y k y k T ) s k s k T ( y k T s k + y k T H k y k − y k y k T H k ) s k T y k y k T s k ( y k T s k + y k T H k y k ) = ( y k T s k + y k T H k y k ) s k s k T ( y k T s k + y k T H k y k − y k y k T H k ) s k T y k y k T s k ( y k T s k + y k T H k y k ) − H k y k y k T s k s k T ( y k T s k + y k T H k y k − y k y k T H k ) s k T y k y k T s k ( y k T s k + y k T H k y k ) = s k s k T ( y k T s k + y k T H k y k − y k y k T H k ) s k T y k y k T s k − H k y k y k T s k s k T s k T y k y k T s k + H k y k y k T s k s k T y k y k T H k s k T y k y k T s k ( y k T s k + y k T H k y k ) = s k s k T s k T y k + s k s k T y k T H k y k s k T y k y k T s k − s k y k T H k s k T y k − H k y k s k T s k T y k + H k y k y k T H k y k T s k + y k T H k y k (28) \begin{aligned} &\frac{\left( I-\frac{H_ky_ky_{k}^{\mathrm{T}}}{y_{k}^{\mathrm{T}}s_k+y_{k}^{\mathrm{T}}H_ky_k} \right) s_ks_{k}^{\mathrm{T}}\left( I-\frac{y_ky_{k}^{\mathrm{T}}H_k}{y_{k}^{\mathrm{T}}s_k+y_{k}^{\mathrm{T}}H_ky_k} \right)}{s_{k}^{\mathrm{T}}\frac{y_ky_{k}^{\mathrm{T}}}{y_{k}^{\mathrm{T}}s_k+y_{k}^{\mathrm{T}}H_ky_k}s_k}\\ =&\frac{\left( y_{k}^{\mathrm{T}}s_k+y_{k}^{\mathrm{T}}H_ky_k-H_ky_ky_{k}^{\mathrm{T}} \right) s_ks_{k}^{\mathrm{T}}\left( y_{k}^{\mathrm{T}}s_k+y_{k}^{\mathrm{T}}H_ky_k-y_ky_{k}^{\mathrm{T}}H_k \right)}{s_{k}^{\mathrm{T}}y_ky_{k}^{\mathrm{T}}s_k\left( y_{k}^{\mathrm{T}}s_k+y_{k}^{\mathrm{T}}H_ky_k \right)}\\ =&\frac{\left( y_{k}^{\mathrm{T}}s_k+y_{k}^{\mathrm{T}}H_ky_k \right) s_ks_{k}^{\mathrm{T}}\left( y_{k}^{\mathrm{T}}s_k+y_{k}^{\mathrm{T}}H_ky_k-y_ky_{k}^{\mathrm{T}}H_k \right)}{s_{k}^{\mathrm{T}}y_ky_{k}^{\mathrm{T}}s_k\left( y_{k}^{\mathrm{T}}s_k+y_{k}^{\mathrm{T}}H_ky_k \right)}-\frac{H_ky_ky_{k}^{\mathrm{T}}s_ks_{k}^{\mathrm{T}}\left( y_{k}^{\mathrm{T}}s_k+y_{k}^{\mathrm{T}}H_ky_k-y_ky_{k}^{\mathrm{T}}H_k \right)}{s_{k}^{\mathrm{T}}y_ky_{k}^{\mathrm{T}}s_k\left( y_{k}^{\mathrm{T}}s_k+y_{k}^{\mathrm{T}}H_ky_k \right)}\\ =&\frac{s_ks_{k}^{\mathrm{T}}\left( y_{k}^{\mathrm{T}}s_k+y_{k}^{\mathrm{T}}H_ky_k-y_ky_{k}^{\mathrm{T}}H_k \right)}{s_{k}^{\mathrm{T}}y_ky_{k}^{\mathrm{T}}s_k}-\frac{H_ky_ky_{k}^{\mathrm{T}}s_ks_{k}^{\mathrm{T}}}{s_{k}^{\mathrm{T}}y_ky_{k}^{\mathrm{T}}s_k}+\frac{H_ky_ky_{k}^{\mathrm{T}}s_ks_{k}^{\mathrm{T}}y_ky_{k}^{\mathrm{T}}H_k}{s_{k}^{\mathrm{T}}y_ky_{k}^{\mathrm{T}}s_k\left( y_{k}^{\mathrm{T}}s_k+y_{k}^{\mathrm{T}}H_ky_k \right)}\\ =&\frac{s_ks_{k}^{\mathrm{T}}}{s_{k}^{\mathrm{T}}y_k}+\frac{s_ks_{k}^{\mathrm{T}}y_{k}^{\mathrm{T}}H_ky_k}{s_{k}^{\mathrm{T}}y_ky_{k}^{\mathrm{T}}s_k}-\frac{s_ky_{k}^{\mathrm{T}}H_k}{s_{k}^{\mathrm{T}}y_k}-\frac{H_ky_ks_{k}^{\mathrm{T}}}{s_{k}^{\mathrm{T}}y_k}+\frac{H_ky_ky_{k}^{\mathrm{T}}H_k}{y_{k}^{\mathrm{T}}s_k+y_{k}^{\mathrm{T}}H_ky_k} \end{aligned}\tag{28} ====​skT​ykT​sk​+ykT​Hk​yk​yk​ykT​​sk​(I−ykT​sk​+ykT​Hk​yk​Hk​yk​ykT​​)sk​skT​(I−ykT​sk​+ykT​Hk​yk​yk​ykT​Hk​​)​skT​yk​ykT​sk​(ykT​sk​+ykT​Hk​yk​)(ykT​sk​+ykT​Hk​yk​−Hk​yk​ykT​)sk​skT​(ykT​sk​+ykT​Hk​yk​−yk​ykT​Hk​)​skT​yk​ykT​sk​(ykT​sk​+ykT​Hk​yk​)(ykT​sk​+ykT​Hk​yk​)sk​skT​(ykT​sk​+ykT​Hk​yk​−yk​ykT​Hk​)​−skT​yk​ykT​sk​(ykT​sk​+ykT​Hk​yk​)Hk​yk​ykT​sk​skT​(ykT​sk​+ykT​Hk​yk​−yk​ykT​Hk​)​skT​yk​ykT​sk​sk​skT​(ykT​sk​+ykT​Hk​yk​−yk​ykT​Hk​)​−skT​yk​ykT​sk​Hk​yk​ykT​sk​skT​​+skT​yk​ykT​sk​(ykT​sk​+ykT​Hk​yk​)Hk​yk​ykT​sk​skT​yk​ykT​Hk​​skT​yk​sk​skT​​+skT​yk​ykT​sk​sk​skT​ykT​Hk​yk​​−skT​yk​sk​ykT​Hk​​−skT​yk​Hk​yk​skT​​+ykT​sk​+ykT​Hk​yk​Hk​yk​ykT​Hk​​​(28)

结合式(27)和(28)可得
H k + 1 = ( B k + y k y k T y k T s k − B k s k s k T B k s k T B k s k ) − 1 = H k + s k s k T s k T y k + s k s k T y k T H k y k s k T y k y k T s k − s k y k T H k s k T y k − H k y k s k T s k T y k = ( I − s k y k T s k T y k ) H k + s k y k T H k y k s k T ( s k T y k ) 2 − H k y k s k T s k T y k + s k s k T s k T y k = ( I − s k y k T s k T y k ) H k − ( I − s k y k T s k T y k ) H k y k s k T s k T y k + s k s k T s k T y k = ( I − s k y k T s k T y k ) H k ( I − y k s k T s k T y k ) + s k s k T s k T y k (29) \begin{aligned} H_{k+1}&=\left( B_k+\frac{y_ky_{k}^{\mathrm{T}}}{y_{k}^{\mathrm{T}}s_k}-\frac{B_ks_ks_{k}^{\mathrm{T}}B_k}{s_{k}^{\mathrm{T}}B_ks_k} \right) ^{-1}\\ &=H_k+\frac{s_ks_{k}^{\mathrm{T}}}{s_{k}^{\mathrm{T}}y_k}+\frac{s_ks_{k}^{\mathrm{T}}y_{k}^{\mathrm{T}}H_ky_k}{s_{k}^{\mathrm{T}}y_ky_{k}^{\mathrm{T}}s_k}-\frac{s_ky_{k}^{\mathrm{T}}H_k}{s_{k}^{\mathrm{T}}y_k}-\frac{H_ky_ks_{k}^{\mathrm{T}}}{s_{k}^{\mathrm{T}}y_k}\\ &=\left( I-\frac{s_ky_{k}^{\mathrm{T}}}{s_{k}^{\mathrm{T}}y_k} \right) H_k+\frac{s_ky_{k}^{\mathrm{T}}H_ky_ks_{k}^{\mathrm{T}}}{\left( s_{k}^{\mathrm{T}}y_k \right) ^2}-\frac{H_ky_ks_{k}^{\mathrm{T}}}{s_{k}^{\mathrm{T}}y_k}+\frac{s_ks_{k}^{\mathrm{T}}}{s_{k}^{\mathrm{T}}y_k}\\ &=\left( I-\frac{s_ky_{k}^{\mathrm{T}}}{s_{k}^{\mathrm{T}}y_k} \right) H_k-\left( I-\frac{s_ky_{k}^{\mathrm{T}}}{s_{k}^{\mathrm{T}}y_k} \right) H_k\frac{y_ks_{k}^{\mathrm{T}}}{s_{k}^{\mathrm{T}}y_k}+\frac{s_ks_{k}^{\mathrm{T}}}{s_{k}^{\mathrm{T}}y_k}\\ &=\left( I-\frac{s_ky_{k}^{\mathrm{T}}}{s_{k}^{\mathrm{T}}y_k} \right) H_k\left( I-\frac{y_ks_{k}^{\mathrm{T}}}{s_{k}^{\mathrm{T}}y_k} \right) +\frac{s_ks_{k}^{\mathrm{T}}}{s_{k}^{\mathrm{T}}y_k} \end{aligned}\tag{29} Hk+1​​=(Bk​+ykT​sk​yk​ykT​​−skT​Bk​sk​Bk​sk​skT​Bk​​)−1=Hk​+skT​yk​sk​skT​​+skT​yk​ykT​sk​sk​skT​ykT​Hk​yk​​−skT​yk​sk​ykT​Hk​​−skT​yk​Hk​yk​skT​​=(I−skT​yk​sk​ykT​​)Hk​+(skT​yk​)2sk​ykT​Hk​yk​skT​​−skT​yk​Hk​yk​skT​​+skT​yk​sk​skT​​=(I−skT​yk​sk​ykT​​)Hk​−(I−skT​yk​sk​ykT​​)Hk​skT​yk​yk​skT​​+skT​yk​sk​skT​​=(I−skT​yk​sk​ykT​​)Hk​(I−skT​yk​yk​skT​​)+skT​yk​sk​skT​​​(29)

式(29)即为关于 H k H_k Hk​的BFGS校正。
若一开始选择构造如下关系:
H k + 1 = H k + σ 1 u u T + σ 2 v v T (30) H_{k+1}=H_k+\sigma _1uu^{\mathrm{T}}+\sigma _2vv^{\mathrm{T}}\tag{30} Hk+1​=Hk​+σ1​uuT+σ2​vvT(30)

则由割线方程(11)有
s k = H k y k + [ σ 1 u T y k ] u + [ σ 2 v T y k ] v (31) s_k=H_ky_k+\left[ \sigma _1u^{\mathrm{T}}y_k \right] u+\left[ \sigma _2v^{\mathrm{T}}y_k \right] v\tag{31} sk​=Hk​yk​+[σ1​uTyk​]u+[σ2​vTyk​]v(31)

选取 u = s k u=s_k u=sk​, [ σ 1 u T y k ] = 1 \left[ \sigma _1u^{\mathrm{T}}y_k \right] =1 [σ1​uTyk​]=1, v = H k y k v=H_ky_k v=Hk​yk​, [ σ 2 v T y k ] = − 1 \left[ \sigma _2v^{\mathrm{T}}y_k \right] =-1 [σ2​vTyk​]=−1可使得式(31)成立,于是 σ 1 = 1 s k T y k \sigma _1=\frac{1}{s_{k}^{\mathrm{T}}y_k} σ1​=skT​yk​1​, σ 2 = − 1 y k T H k y k \sigma _2=-\frac{1}{y_{k}^{\mathrm{T}}H_ky_k} σ2​=−ykT​Hk​yk​1​,则有
H k + 1 = H k + σ 1 u u T + σ 2 v v T = H k + s k s k T s k T y k − H k y k y k T H k y k T H k y k (32) \begin{aligned} H_{k+1}&=H_k+\sigma _1uu^{\mathrm{T}}+\sigma _2vv^{\mathrm{T}}\\ &=H_k+\frac{s_ks_{k}^{\mathrm{T}}}{s_{k}^{\mathrm{T}}y_k}-\frac{H_ky_ky_{k}^{\mathrm{T}}H_k}{y_{k}^{\mathrm{T}}H_ky_k} \end{aligned}\tag{32} Hk+1​​=Hk​+σ1​uuT+σ2​vvT=Hk​+skT​yk​sk​skT​​−ykT​Hk​yk​Hk​yk​ykT​Hk​​​(32)

式(32)即为关于 H k H_k Hk​的DFP校正。
类似(27)-(29)的推导,对式(32)利用Sherman–Morrison公式(14)可得
   B k + 1 = ( I − y k s k T s k T y k ) B k ( I − s k y k T s k T y k ) + y k y k T s k T y k (33) \,\, B_{k+1}=\left( I-\frac{y_ks_{k}^{\mathrm{T}}}{s_{k}^{\mathrm{T}}y_k} \right) B_k\left( I-\frac{s_ky_{k}^{\mathrm{T}}}{s_{k}^{\mathrm{T}}y_k} \right) +\frac{y_ky_{k}^{\mathrm{T}}}{s_{k}^{\mathrm{T}}y_k}\tag{33} Bk+1​=(I−skT​yk​yk​skT​​)Bk​(I−skT​yk​sk​ykT​​)+skT​yk​yk​ykT​​(33)

式(33)即为关于 B k B_k Bk​的DFP校正。
对比式(25)、(29)、(32)和(33)可知,将 H k H_k Hk​和 B k B_k Bk​、 y k y_k yk​和 s k s_k sk​互换,则式(25)和(32)、(29)和(33)可互相推导得到。根据《最优化理论与方法》中的描述,给出任一个拟牛顿校正 H k + 1 H_{k+1} Hk+1​,通过交换 H ↔ B H\leftrightarrow B H↔B, s ↔ y s\leftrightarrow y s↔y,可以得到其关于 B B B的对偶校正 B k + 1 B_{k+1} Bk+1​,再利用Sherman–Morrison公式,就可得到关于 H H H的对偶校正 H k + 1 D H_{k+1}^{D} Hk+1D​。DFP和BFGS校正正好满足这种对偶关系:

最优化学习笔记1——关于拟牛顿法推导

图1 DFP和BFGS的对偶关系 \text{图1 DFP和BFGS的对偶关系} 图1 DFP和BFGS的对偶关系

值得指出的是,对于秩一校正,对式(19)进行交换操作可得到式(22),再求逆又得到了式(19),因此秩一校正是自对偶的。

2.3 DFP校正的最优性

根据《Numerical Optimization》第6章的描述,DFP校正(33)是下述优化问题的最优解:
min ⁡ B ∥ B − B k ∥ W s u b j e c t t o B = B T , B s k = y k (34) \min_B \left\| B-B_k \right\| _W\\\mathrm{subject} \mathrm{to} B=B^{\mathrm{T}}, Bs_k=y_k\tag{34} Bmin​∥B−Bk​∥W​subjecttoB=BT,Bsk​=yk​(34)

式中: B k B_k Bk​为对称正定阵,对于方阵 A A A, ∥ A ∥ W ≡ ∥ W 1 / 2 A W 1 / 2 ∥ F \left\| A \right\| _W\equiv \left\| W^{1/2}AW^{1/2} \right\| _F ∥A∥W​≡∥∥​W1/2AW1/2∥∥​F​, ∥ A ∥ F 2 ≡ ∑ i = 1 n ∑ j = 1 n a i j 2 \left\| A \right\| _{F}^{2}\equiv \sum_{i=1}^n{\sum_{j=1}^n{a_{ij}^{2}}} ∥A∥F2​≡∑i=1n​∑j=1n​aij2​为矩阵的F-范数, W = G ˉ k − 1 W=\bar{G}_{k}^{-1} W=Gˉk−1​, G ˉ k = ∫ 0 1 ∇ 2 f ( x k + τ α k p k ) d τ \bar{G}_k=\int_0^1{\nabla ^2f\left( x_k+\tau \alpha _kp_k \right) \mathrm{d}\tau} Gˉk​=∫01​∇2f(xk​+ταk​pk​)dτ为平均Hessian阵(故 W W W为对称阵),根据Taylor展开 ∇ f ( x + p ) = ∇ f ( x ) + ∫ 0 1 ∇ 2 f ( x + t p ) p d t \nabla f\left( x+p \right) =\nabla f\left( x \right) +\int_0^1{\nabla ^2f(x+tp)p\mathrm{d}t} ∇f(x+p)=∇f(x)+∫01​∇2f(x+tp)pdt可知
y k = G ˉ k α k p k = G ˉ k s k (35) y_k=\bar{G}_k\alpha _kp_k=\bar{G}_ks_k\tag{35} yk​=Gˉk​αk​pk​=Gˉk​sk​(35)

书中没有给出详细证明,在math.stackexchange.com网站上有高手进行了推导,这里进行整理总结。
首先,原优化问题(34)的目标函数为 min ⁡ B ∥ W 1 / 2 B W 1 / 2 − W 1 / 2 B k W 1 / 2 ∥ F \displaystyle{\min_B} \left\| W^{1/2}BW^{1/2}-W^{1/2}B_kW^{1/2} \right\| _F Bmin​∥∥∥​W1/2BW1/2−W1/2Bk​W1/2∥∥∥​F​,记 B ^ = W 1 / 2 B W 1 / 2 \hat{B}=W^{1/2}BW^{1/2} B^=W1/2BW1/2, B ^ k = W 1 / 2 B k W 1 / 2 \hat{B}_k=W^{1/2}B_kW^{1/2} B^k​=W1/2Bk​W1/2,由式(35)可知
W y k = s k ⇔ W − 1 / 2 W y k = W − 1 / 2 s k ⇔ W 1 / 2 y k = W − 1 / 2 s k (36) Wy_k=s_k\Leftrightarrow W^{-1/2}Wy_k=W^{-1/2}s_k\Leftrightarrow W^{1/2}y_k=W^{-1/2}s_k\tag{36} Wyk​=sk​⇔W−1/2Wyk​=W−1/2sk​⇔W1/2yk​=W−1/2sk​(36)

记 y ^ k = W 1 / 2 y k \hat{y}_k=W^{1/2}y_k y^​k​=W1/2yk​, s ^ k = W − 1 / 2 s k \hat{s}_k=W^{-1/2}s_k s^k​=W−1/2sk​,则 y ^ k = s ^ k \hat{y}_k=\hat{s}_k y^​k​=s^k​,又由于
B s k = y k ⇔ W 1 / 2 B W 1 / 2 W − 1 / 2 s k = W 1 / 2 y k ⇔ B ^ s ^ k = y ^ k ⇔ B ^ s ^ k = s ^ k (37) Bs_k=y_k\Leftrightarrow W^{1/2}BW^{1/2}W^{-1/2}s_k=W^{1/2}y_k\Leftrightarrow \hat{B}\hat{s}_k=\hat{y}_k\Leftrightarrow \hat{B}\hat{s}_k=\hat{s}_k\tag{37} Bsk​=yk​⇔W1/2BW1/2W−1/2sk​=W1/2yk​⇔B^s^k​=y^​k​⇔B^s^k​=s^k​(37)

则原优化问题(34)可写为
min ⁡ B ^ ∥ B ^ − B ^ k ∥ F s u b j e c t t o B ^ s ^ k = s ^ k (38) \min_{\hat{B}} \left\| \hat{B}-\hat{B}_k \right\| _F\\\mathrm{subject} \mathrm{to} \hat{B}\hat{s}_k=\hat{s}_k\tag{38} B^min​∥∥∥​B^−B^k​∥∥∥​F​subjecttoB^s^k​=s^k​(38)

可见 s ^ k \hat{s}_k s^k​为 B ^ \hat{B} B^的特征向量,由于 B ^ \hat{B} B^为对称矩阵(因此为正规矩阵,必酉相似于对角阵),构造酉矩阵 U = [ u ∣ u ⊥ ] U=\left[ u|u_{\bot} \right] U=[u∣u⊥​],其中 u = s ^ k ∥ s ^ k ∥ u=\frac{\hat{s}_k}{\left\| \hat{s}_k \right\|} u=∥s^k​∥s^k​​, u ⊥ u_{\bot} u⊥​为 u u u的正交补,则 u T B ^ u = u T u = 1 u^{\mathrm{T}}\hat{B}u=u^{\mathrm{T}}u=1 uTB^u=uTu=1, u ⊥ T B ^ u = u ⊥ T u = 0 u_{\bot}^{\mathrm{T}}\hat{B}u=u_{\bot}^{\mathrm{T}}u=0 u⊥T​B^u=u⊥T​u=0,从而有
U T B ^ k U − U T B ^ U = [ u T u ⊥ T ] B ^ k [ u u ⊥ ] − [ u T u ⊥ T ] B ^ [ u u ⊥ ] = [ u T B ^ k u u T B ^ k u ⊥ u ⊥ T B ^ k u u ⊥ T B ^ k u ⊥ ] − [ 1 0 0 u ⊥ T B ^ u ⊥ ] (39) \begin{aligned} U^{\mathrm{T}}\hat{B}_kU-U^{\mathrm{T}}\hat{B}U=&\left[ \begin{array}{c} u^{\mathrm{T}}\\ u_{\bot}^{\mathrm{T}}\\\end{array} \right] \hat{B}_k\left[ \begin{matrix} u& u_{\bot}\\\end{matrix} \right] -\left[ \begin{array}{c} u^{\mathrm{T}}\\ u_{\bot}^{\mathrm{T}}\\\end{array} \right] \hat{B}\left[ \begin{matrix} u& u_{\bot}\\\end{matrix} \right] \\ =&\left[ \begin{matrix} u^{\mathrm{T}}\hat{B}_ku& u^{\mathrm{T}}\hat{B}_ku_{\bot}\\ u_{\bot}^{\mathrm{T}}\hat{B}_ku& u_{\bot}^{\mathrm{T}}\hat{B}_ku_{\bot}\\\end{matrix} \right] -\left[ \begin{matrix} 1& 0\\ 0& u_{\bot}^{\mathrm{T}}\hat{B}u_{\bot}\\\end{matrix} \right] \end{aligned}\tag{39} UTB^k​U−UTB^U==​[uTu⊥T​​]B^k​[u​u⊥​​]−[uTu⊥T​​]B^[u​u⊥​​][uTB^k​uu⊥T​B^k​u​uTB^k​u⊥​u⊥T​B^k​u⊥​​]−[10​0u⊥T​B^u⊥​​]​(39)

考虑到酉相似矩阵的具有相同的F-范数,因此
∥ B ^ − B ^ k ∥ F 2 = ∥ U T ( B ^ k − B ^ ) U ∥ F 2 = ∥ [ u T B ^ k u − 1 u T B ^ k u ⊥ u ⊥ T B ^ k u u ⊥ T B ^ k u ⊥ − u ⊥ T B ^ u ⊥ ] ∥ F 2 = ( u T B ^ k u − 1 ) 2 + ∥ u T B ^ k u ⊥ ∥ F 2 + ∥ u ⊥ T B ^ k u ∥ F 2 + ∥ u ⊥ T B ^ k u ⊥ − u ⊥ T B ^ u ⊥ ∥ F 2 (40) \begin{aligned} \left\| \hat{B}-\hat{B}_k \right\| _{F}^{2}=&\left\| U^{\mathrm{T}}(\hat{B}_k-\hat{B})U \right\| _{F}^{2}\\ =&\left\| \left[ \begin{matrix} u^{\mathrm{T}}\hat{B}_ku-1& u^{\mathrm{T}}\hat{B}_ku_{\bot}\\ u_{\bot}^{\mathrm{T}}\hat{B}_ku& u_{\bot}^{\mathrm{T}}\hat{B}_ku_{\bot}-u_{\bot}^{\mathrm{T}}\hat{B}u_{\bot}\\\end{matrix} \right] \right\| _{F}^{2}\\ =&(u^{\mathrm{T}}\hat{B}_ku-1)^2+\left\| u^{\mathrm{T}}\hat{B}_ku_{\bot} \right\| _{F}^{2}+\left\| u_{\bot}^{\mathrm{T}}\hat{B}_ku \right\| _{F}^{2}+\left\| u_{\bot}^{\mathrm{T}}\hat{B}_ku_{\bot}-u_{\bot}^{\mathrm{T}}\hat{B}u_{\bot} \right\| _{F}^{2} \end{aligned}\tag{40} ∥∥∥​B^−B^k​∥∥∥​F2​===​∥∥∥​UT(B^k​−B^)U∥∥∥​F2​∥∥∥∥​[uTB^k​u−1u⊥T​B^k​u​uTB^k​u⊥​u⊥T​B^k​u⊥​−u⊥T​B^u⊥​​]∥∥∥∥​F2​(uTB^k​u−1)2+∥∥∥​uTB^k​u⊥​∥∥∥​F2​+∥∥∥​u⊥T​B^k​u∥∥∥​F2​+∥∥∥​u⊥T​B^k​u⊥​−u⊥T​B^u⊥​∥∥∥​F2​​(40)

式(40)右端只有最后一项与 B ^ \hat{B} B^有关(也即与 B B B有关),因此最优解必定满足 u ⊥ T B ^ u ⊥ = u ⊥ T B ^ k u ⊥ u_{\bot}^{\mathrm{T}}\hat{B}u_{\bot}=u_{\bot}^{\mathrm{T}}\hat{B}_ku_{\bot} u⊥T​B^u⊥​=u⊥T​B^k​u⊥​,根据式(39)可知
B ^ = U [ 1 0 0 u ⊥ T B ^ k u ⊥ ] U T = [ u u ⊥ ] [ 1 0 0 u ⊥ T B ^ k u ⊥ ] [ u T u ⊥ T ] = u u T + u ⊥ u ⊥ T B ^ k u ⊥ u ⊥ T = u u T + ( I − u u T ) B ^ k ( I − u u T ) (41) \begin{aligned} \hat{B}=&U\left[ \begin{matrix} 1& 0\\ 0& u_{\bot}^{\mathrm{T}}\hat{B}_ku_{\bot}\\\end{matrix} \right] U^{\mathrm{T}}\\ =&\left[ \begin{matrix} u& u_{\bot}\\\end{matrix} \right] \left[ \begin{matrix} 1& 0\\ 0& u_{\bot}^{\mathrm{T}}\hat{B}_ku_{\bot}\\\end{matrix} \right] \left[ \begin{array}{c} u^{\mathrm{T}}\\ u_{\bot}^{\mathrm{T}}\\\end{array} \right] \\ =&uu^{\mathrm{T}}+u_{\bot}u_{\bot}^{\mathrm{T}}\hat{B}_ku_{\bot}u_{\bot}^{\mathrm{T}}\\ =&uu^{\mathrm{T}}+(I-uu^{\mathrm{T}})\hat{B}_k(I-uu^{\mathrm{T}}) \end{aligned}\tag{41} B^====​U[10​0u⊥T​B^k​u⊥​​]UT[u​u⊥​​][10​0u⊥T​B^k​u⊥​​][uTu⊥T​​]uuT+u⊥​u⊥T​B^k​u⊥​u⊥T​uuT+(I−uuT)B^k​(I−uuT)​(41)

式(41)中用到了如下关系式:
I = U U T = [ u u ⊥ ] [ u T u ⊥ T ] = u u T + u ⊥ u ⊥ T ⇔ u ⊥ u ⊥ T = I − u u T (42) \begin{aligned} I=&UU^{\mathrm{T}}\\ =&\left[ \begin{matrix} u& u_{\bot}\\\end{matrix} \right] \left[ \begin{array}{c} u^{\mathrm{T}}\\ u_{\bot}^{\mathrm{T}}\\\end{array} \right] \\ =&uu^{\mathrm{T}}+u_{\bot}u_{\bot}^{\mathrm{T}}\\ \Leftrightarrow& u_{\bot}u_{\bot}^{\mathrm{T}}=I-uu^{\mathrm{T}} \end{aligned}\tag{42} I===⇔​UUT[u​u⊥​​][uTu⊥T​​]uuT+u⊥​u⊥T​u⊥​u⊥T​=I−uuT​(42)

根据 B ^ \hat{B} B^的定义和式(41)可知
B = W − 1 / 2 B ^ W − 1 / 2 = W − 1 / 2 u u T W − 1 / 2 + ( I − W − 1 / 2 u u T W 1 / 2 ) B k ( I − W 1 / 2 u u T W − 1 / 2 ) (43) \begin{aligned} B=&W^{-1/2}\hat{B}W^{-1/2}\\ =&W^{-1/2}uu^{\mathrm{T}}W^{-1/2}+(I-W^{-1/2}uu^{\mathrm{T}}W^{1/2})B_k(I-W^{1/2}uu^{\mathrm{T}}W^{-1/2}) \end{aligned}\tag{43} B==​W−1/2B^W−1/2W−1/2uuTW−1/2+(I−W−1/2uuTW1/2)Bk​(I−W1/2uuTW−1/2)​(43)

考虑到
∥ s ^ k ∥ 2 = s ^ k T s ^ k = y ^ k T s ^ k = y k T s k ⇓ u u T = s ^ k s ^ k T ∥ s ^ k ∥ 2 = y ^ k y ^ k T y k T s k = W 1 / 2 y k y k T y k T s k W 1 / 2 ⇓ { W − 1 / 2 u u T W − 1 / 2 = y k y k T y k T s k W − 1 / 2 u u T W 1 / 2 = ( W − 1 / 2 u u T W − 1 / 2 ) W = y k y k T y k T s k W = y k s k T y k T s k W 1 / 2 u u T W − 1 / 2 = W ( W − 1 / 2 u u T W − 1 / 2 ) = W y k y k T y k T s k = s k y k T y k T s k (44) \left\| \hat{s}_k \right\| ^2=\hat{s}_{k}^{\mathrm{T}}\hat{s}_k=\hat{y}_{k}^{\mathrm{T}}\hat{s}_k=y_{k}^{\mathrm{T}}s_k\\\Downarrow \\uu^{\mathrm{T}}=\frac{\hat{s}_k\hat{s}_{k}^{\mathrm{T}}}{\left\| \hat{s}_k \right\| ^2}=\frac{\hat{y}_k\hat{y}_{k}^{\mathrm{T}}}{y_{k}^{\mathrm{T}}s_k}=W^{1/2}\frac{y_ky_{k}^{\mathrm{T}}}{y_{k}^{\mathrm{T}}s_k}W^{1/2}\\\Downarrow \\\left\{ \begin{array}{c} W^{-1/2}uu^{\mathrm{T}}W^{-1/2}=\frac{y_ky_{k}^{\mathrm{T}}}{y_{k}^{\mathrm{T}}s_k}\\ W^{-1/2}uu^{\mathrm{T}}W^{1/2}=\left( W^{-1/2}uu^{\mathrm{T}}W^{-1/2} \right) W=\frac{y_ky_{k}^{\mathrm{T}}}{y_{k}^{\mathrm{T}}s_k}W=\frac{y_ks_{k}^{\mathrm{T}}}{y_{k}^{\mathrm{T}}s_k}\\ W^{1/2}uu^{\mathrm{T}}W^{-1/2}=W\left( W^{-1/2}uu^{\mathrm{T}}W^{-1/2} \right) =W\frac{y_ky_{k}^{\mathrm{T}}}{y_{k}^{\mathrm{T}}s_k}=\frac{s_ky_{k}^{\mathrm{T}}}{y_{k}^{\mathrm{T}}s_k}\\\end{array} \right.\tag{44} ∥s^k​∥2=s^kT​s^k​=y^​kT​s^k​=ykT​sk​⇓uuT=∥s^k​∥2s^k​s^kT​​=ykT​sk​y^​k​y^​kT​​=W1/2ykT​sk​yk​ykT​​W1/2⇓⎩⎪⎪⎪⎨⎪⎪⎪⎧​W−1/2uuTW−1/2=ykT​sk​yk​ykT​​W−1/2uuTW1/2=(W−1/2uuTW−1/2)W=ykT​sk​yk​ykT​​W=ykT​sk​yk​skT​​W1/2uuTW−1/2=W(W−1/2uuTW−1/2)=WykT​sk​yk​ykT​​=ykT​sk​sk​ykT​​​(44)

将式(44)代入式(43)即可得到DFP校正(33)。

上一篇:数据预处理方法总结


下一篇:机器视觉——案例分析基础(五)(图像的噪声产生与去噪)