Rosen的梯度投影法和既约梯度法
目录基本概念
投影矩阵
设 \(\mathbf{M}\) 是 \(m \times n(m \leq n)\) 行满秩矩阵, \(\mathbf{M}^{T}\) 的 \(m\) 个列向量生成的子空间记作 \(V_{\mathbf{M}}, V_{\mathbf{M}}\) 的正交子空间用 \(V_{\mathbf{M}}^{\perp}\)表示,则 \(V_{\mathrm{M}}^{\perp}=\{\mathbf{x} \mid \mathbf{M} \mathbf{x}=\mathbf{0}\}\) 是 \(\mathbf{M}\) 的零空间。对于任意 \(\mathbf{x} \in \mathbb{R}^{n},\) 可惟一分解为
\[\mathbf{x}=\mathbf{p}+\mathbf{q}, \mathbf{p} \in V, \mathbf{q} \in V_{\mathbf{M}}^{\perp}\tag{1} \]\(\mathbf{p}, \mathbf{q}\) 分别是 \(\mathbf{x}\) 在 \(V_{\mathbf{M}}\) 和 \(V_{\mathbf{M}}^{\perp}\) 上的投影,记
\[\mathbf{p}=\mathbf{P} \mathbf{x}, \mathbf{q}=\mathbf{Q} \mathbf{x}\tag{2} \]则 \(\mathbf{P}, \mathbf{Q}\) 分别是 \(\mathbb{R}^{n}\) 到 \(V_{\mathbf{M}}\) 和 \(\mathbb{R}^{n}\) 到 \(V_{\mathbf{M}}^{\perp}\) 上的投影矩阵。
\[\mathbf{R}^{N}=\mathbf{P} \oplus \mathbf{Q} \]因 \(\mathbf{p}\) 可表示为 \(\mathbf{M}^{T}\) 的 \(m\) 个列向量的线性组合,即存在系数向量 \(\mathbf{\eta} \in \mathbb{R}^{n},\) 使
\[\mathbf{p}=\mathbf{M}^{T} \mathbf{\eta}\tag{3} \]将(3)式代入(1)式,然后两端左乘 \(\mathbf{M},\) 并注意到 \(\mathbf{M} \mathbf{p}=\mathbf{0},\) 则
\[\mathbf{M x}=\mathbf{M} \mathbf{M}^{T} \mathbf{\eta} \]因 \(\mathbf{M}\) 行满秩,有
\[\mathbf{\eta}=\left(\mathbf{M} \mathbf{M}^{T}\right)^{-1} \mathbf{M} \mathbf{x} \]将其代入(3)式得
\[\mathbf{p}=\mathbf{M}^{T}\left(\mathbf{M} \mathbf{M}^{T}\right)^{-1} \mathbf{M x} \]将上式代入(1)式得
\[\mathbf{q}=\left[\mathbf{I}-\mathbf{M}^{T}\left(\mathbf{M} \mathbf{M}^{T}\right)^{-1} \mathbf{M}\right] \mathbf{x} \]再根据式(2)得
\[\begin{aligned} \mathbf{P} &=\mathbf{M}^{T}\left(\mathbf{M} \mathbf{M}^{T}\right)^{-1} \mathbf{M} \\ \mathbf{Q} &=\mathbf{I}-\mathbf{M}^{T}\left(\mathbf{M} \mathbf{M}^{T}\right)^{-1} \mathbf{M} \end{aligned} \]容易验证,投影矩阵 P,Q是幂等的对称矩阵。反之,一个幂等的对称矩阵必是一个投影矩阵,这就
是下面的定理。
\(\large\color{magenta}{\boxed{\color{brown}{定义} }}\)假设 \(P \in R^{n \times n},\) 如果 \(P^{T}=P, P P=P,\) 则 \(P\) 是投影矩阵
\(\large\color{magenta}{\boxed{\color{brown}{引理1} }}\)如果 \(P \in R^{n \times n}\) 是投影矩阵,则
(1)则 \(P\) 是半正定矩阵 ;
\[x^{T} P x=x^{T} P^{2} x=x^{T} P^{T} P x=(P x)^{T}(P x) \geq 0 \](2) 当且仅当\(I-P\)是投影矩阵,其中\(I\)是 \(n\) 阶的单位矩阵 ;
\[\begin{array}{l} (I-P)^{T}=I^{T}-P^{T}=I-P \\ (I-P)^{2}=I-2 P+P^{2}=I-2 P+P=I-P \end{array} \](3) 令 \(Q=I-P\), \(\forall x \in R^{n}\), 则 \(L=\left\{P x: x \in R^{n}\right\}\) 与 \(L^{\perp}=\left\{Q x: x \in R^{n}\right\}\) 是相互
正交的线性子空间,并且 \(\forall x \in R^{n}\) 可唯一的表示为
梯度投影法
考虑线性约束的非线性问题
\[\begin{aligned} (P) ~ ~ ~ \min &~ ~ ~ f(x)\\\text{s.t.} &~ ~ ~ Ax \geq b\\ &~ ~ ~ Ex = e\\ &~ ~ ~x ∈ R^n \end{aligned}\tag{4} \]\(\begin{aligned} &A \in R^{m \times n}, E \in R^{I \times n}, b \in R^{m}, e \in R^{I} \end{aligned}\)
\(\mathrm{f}: \mathrm{R}^{\mathrm{n}} \rightarrow \mathrm{R}^{1}\)有一阶偏导数 .记(4)的可行域为\(S\).
基本思想
首先,要求迭代点都是在容许集内;
其次,在每一迭代点处为了产生一个下降容许方向,希望能使用目标函数在该点的负梯度方向来生成。
具体来说:若有一容许点x,在该点处的最速下降方向就是一个容许方向的话,那干脆直接取该方向作为x点处的寻优方向;但当迭代点在可行域边界上时 ,若它的最速下降方向不是容许方向的话,则一个直观的想法就是把这个方向投影到可行域内,将它\(\large\color{#70f3ff}{\boxed{\color{green}{改造}}}\)成一个容许方向!
下降容许方向的确定
在容许点 \(\mathrm{x}^{\mathrm{k}}\) 处将A, b分解, 使得\(A_1\mathrm{x}^{\mathrm{k}}=\mathrm{b}_1, \mathrm{A}_{2} \mathrm{x}^{\mathrm{k}}>\mathrm{b}_2\) .
\(d\neq 0\) 在 \(x^{k}\) 处的可行方向当且仅当 \(d\) 满足条件
\[A_1d≥0~~,~~ Ed =0\tag{5} \]记 \(N=\left(\begin{array}{l}A_1 \\ E\end{array}\right), \quad L_{N}=\left\{d \in R^{n} \mid N d=0\right\}\) 为 \(N\) 的零空间,由(5)知,当\(d \in L_{N}\) \ \(\{ 0 \}\)时,d 是 在 \(\boldsymbol{x}^{k}\) 处的可行方向。由假设条件知 \(\boldsymbol{x}^{k}\) 是 \(S\) 的正则点,因此 \(N\) 行满秩, \(Q=I-N^{\mathrm{T}}\left(\mathrm{NN}^{\mathrm{T}}\right)^{-1} \mathrm{~N},\) 为 \(N\) 的投影
矩阵,我们通过投影矩阵 \(Q\) 将 \(-\nabla f\left(x^{k}\right)\) 投影到 \(L_{N}\) 上由此构造可行下降方向。
记 \(N=\left(\begin{array}{l}A_1 \\ E\end{array}\right)\) 不妨设N行满秩 \((\) 否则可直接去掉多余行).设N有r行, \(\quad\)
\(\large\color{magenta}{\boxed{\color{brown}{定理} }}\) \(Q=I-N^{\mathrm{T}}\left(\mathrm{NN}^{\mathrm{T}}\right)^{-1} \mathrm{~N},\) 若 \(Q \nabla \mathrm{f}\left(\mathrm{x}^{(\mathrm{k})}\right) \neq 0,\)则 \(\mathrm{d}=-Q \nabla \mathrm{f}\left(\mathrm{x}^{(\mathrm{k})}\right)\) 是下降容许方向。
【证明】
易验证 \(\mathrm{Q}^{\mathrm{T}} \mathrm{Q}=\left[\mathrm{I}-\mathrm{N}^{\mathrm{T}}\left(\mathrm{NN}^{\top}\right)^{-1} \mathrm{~N}\right]^{\mathrm{T}}\left[\mathrm{I}-\mathrm{N}^{\mathrm{T}}\left(\mathrm{NN}^{\top}\right)^{-1} \mathrm{~N}\right]\)
\[\begin{aligned} &=\mathrm{I}-2 \mathrm{~N}^{\mathrm{T}}\left(\mathrm{NN}^{\top}\right)^{-1} \mathrm{~N}+\mathrm{N}^{\mathrm{T}}\left(\mathrm{NN}^{\top}\right)^{-1} \mathrm{~N} \cdot \mathrm{N}^{\mathrm{T}}\left(\mathrm{NN}^{\top}\right)^{-1} \mathrm{~N} \\ &=\mathrm{I}-2 \mathrm{~N}^{\mathrm{T}}\left(\mathrm{NN}^{\top}\right)^{-1} \mathrm{~N}+\mathrm{N}^{\mathrm{T}}\left(\mathrm{NN}^{\top}\right)^{-1} \mathrm{~N} \\ &=\mathrm{I}-\mathrm{N}^{\mathrm{T}}\left(\mathrm{NN}^{\top}\right)^{-1} \mathrm{~N}=\mathrm{Q} \end{aligned} \]下降 \(: \quad \nabla \mathrm{f}\left(\mathrm{x}^{\mathrm{k}}\right)^{\mathrm{T}} \mathrm{d}=-\nabla \mathrm{f}\left(\mathrm{x}^{\mathrm{k}}\right)^{\mathrm{T}} \mathrm{Q} \nabla \mathrm{f}\left(\mathrm{x}^{\mathrm{k}}\right)=-{\| \mathrm{Q} \nabla \mathrm{f}\left(\mathrm{x}^{\mathrm{k}}\right) \mathrm{\|}_{2}}^{2}<0\) 。
可行: 只要验证 \(A_1d\geq 0, \mathrm{Ed}=0,\)
\[\begin{aligned} \mathrm{Nd} &=-\mathrm{NQ} \nabla \mathrm{f}\left(\mathrm{x}^{\mathrm{k}}\right) \\ &=-\mathrm{N}\left[\mathrm{I}-\mathrm{N}^{\mathrm{T}}\left(\mathrm{NN}^{\top}\right)^{-1} \mathrm{~N}\right] \nabla \mathrm{f}\left(\mathrm{x}^{\mathrm{k}}\right) \\ &=-[\mathrm{N}-\mathrm{N}] \nabla \mathrm{f}\left(\mathrm{x}^{\mathrm{k}}\right)=0 \end{aligned} \]\(Q \nabla \mathbf{f}\left(\mathbf{x}^{\mathbf{k}}\right)=0\) 的情况
\[\begin{aligned} \mathrm{Q} \nabla \mathrm{f}\left(\mathrm{x}^{\mathrm{k}}\right) &=\left[\mathrm{I}-\mathrm{N}^{\mathrm{T}}\left(\mathrm{NN}^{\top}\right)^{-1} \mathrm{~N}\right] \nabla \mathrm{f}\left(\mathrm{x}^{\mathrm{k}}\right) \\ &=\nabla \mathrm{f}\left(\mathrm{x}^{\mathrm{k}}\right)-\mathrm{N}^{\mathrm{T}}\left(\mathrm{NN}^{\top}\right)^{-1} \mathrm{~N} \nabla \mathrm{f}\left(\mathrm{x}^{\mathrm{k}}\right) \\ &=\nabla \mathrm{f}\left(\mathrm{x}^{\mathrm{k}}\right)-\mathrm{N}^{\mathrm{T}} \mathrm{q}=0 \quad \end{aligned} \]记 \(\mathrm{q}=\left(\mathrm{NN}^{\top}\right)^{-1} \mathrm{~N} \nabla \mathrm{f}\left(\mathrm{x}^{\mathrm{k}}\right)\) ,对应于\(N\)可将\(q\)也加以分解
\[\boldsymbol{q}=\left(\begin{array}{l} \boldsymbol{y} \\ z \end{array}\right)\left(\boldsymbol{y} \text { 对应于 } \boldsymbol{A}_1, z\text { 对应于 } \boldsymbol{E}\right) \]这样上式就成为 \(\quad \nabla f\left(x^{(k)}\right)-\left(\begin{array}{c}A_1\\ E\end{array}\right)^{T}\left(\begin{array}{l}y \\ z\end{array}\right)=0\)
\[\text { 即 } \nabla f\left(x^{(k)}\right)-A_1^{ T} y-E^{T} z=0 \]Kuhn-Tucker定理
混合约束问题
\[\begin{align} P : minimize \ \ &f(x) \\ \mathrm{subject\,\, to}\qquad &g_i(x) \leq 0 \quad i=1, \cdots, m \\ &h_i(x) = 0 \quad i=1, \cdots, k\\ & x \in R^n \end{align} \tag{1} \]设\(x^*\)为混合约束问题的一个极小点, \(\mathrm{I}=\left\{\mathrm{i|g}_{\mathrm{i}}\left(\mathrm{x}^{*}\right)=0, \mathrm{i}=1: \mathrm{m}\right\}\) .函数 \(\mathrm{f}(\mathrm{x}), \mathrm{g}_{1}(\mathrm{x}) \ldots, \mathrm{g}_{\mathrm{m}}(\mathrm{x}), \mathrm{h}_{1}(\mathrm{x}), \ldots, \mathrm{h}_{k}(\mathrm{x})\) 在 \(\mathrm{x} ^*\) 处都有一阶偏导数,当 \(\nabla \mathrm{h}_{1}\left(\mathrm{x}^{*}\right), \ldots, \nabla \mathrm{h}_{k}\left(\mathrm{x}^{*}\right), \nabla \mathrm{g}_{\mathrm{i}}\left(\mathrm{x}^{*}\right)(\mathrm{i} \in \mathrm{I})\) 线性无关, \(,\) 则存在实数 \(\mu_{1}, \ldots, \mu_{\mathrm{m}}, \lambda_{1}, \ldots, \lambda_{l}\) 使得
\[\begin{array}{l} \nabla \mathrm{f}\left(\mathrm{x}^{*}\right)+\left[\mu_{1} \nabla \mathrm{g}_{1}\left(\mathrm{x}^{*}\right)+\mu_{2} \nabla \mathrm{g}_{2}\left(\mathrm{x}^{*}\right)+\ldots+\mu_{\mathrm{m}} \nabla \mathrm{g}_{\mathrm{m}}\left(\mathrm{x}^{*}\right)\right] \\ +\left[\lambda_{1} \nabla \mathrm{h}_{1}\left(\mathrm{x}^{*}\right)+\lambda_{2} \nabla \mathrm{h}_{2}\left(\mathrm{x}^{*}\right)+\ldots+\lambda_{k} \nabla \mathrm{h}_{l}\left(\mathrm{x}^{*}\right)\right]=0 \\ \end{array} \]\(\mu_{\mathrm{i}} \mathrm{g}_{\mathrm{i}}\left(\mathrm{x}^{*}\right)=0, \mathrm{i}=1: \mathrm{m},\mu_{i} \geq 0, i=1: m\)\(.\left(\right.\) 即 \(\lambda_{\mathrm{j}}\) 可正可负, \(\quad\) 但 \(\mu_{\mathrm{i}}\) 必非负 \()\)
\(\large\color{magenta}{\boxed{\color{brown}{定理} }}\) 从而, 若y \(\geqslant 0\) 的话, 则上式就是K-T条件, 从而知 \(\mathrm{X}^{(\mathrm{k})}\) 为KKT,点。
但若 \(\mathrm{y} \leqslant 0\), 即y中有负分量比如\(y _{\mathrm{j0}}\) (可能会有多个, 任取一个) ,\(y _{\mathrm{j0}}=min\{y_j\}<0\),这时将\(A_1\) 中与 \(y _{\mathrm{j0}}\) 相对应的那个行整个删去,仍记删去该行后的矩阵为N,用这个新的矩阵N构造Q,从而构造d,这时这个必为下降容许方向 。
直线搜索确定步长\(\alpha_{max}\)
\[\begin{aligned} (P\alpha_{max}) ~ ~ ~ \min &~ ~ ~ f(x^{k}+\alpha d^k)\\\text{s.t.} &~ ~ ~ 0\leq \alpha \leq \alpha_{max} \end{aligned} \]其中
\[\alpha_{max} = \left\{\begin{matrix} min\left\{ \frac{\hat{b_i}}{\hat{d_i}}, \hat{d_i}<0\right\} ,&当 \hat{d}\ngeqslant0\\ +\infty ,&当\hat{d}≥0 \end{matrix}\right. \]设得\(\bar\alpha\), 则得新的迭代点 \(\mathrm{x}^{(\mathrm{k}+1)}=\mathrm{x}^{(\mathrm{k})}+\mathrm{\bar\alpha} \mathrm{~d}\)
投影梯度算法步骤:
-
取初始可行,点 \(\mathrm{x}^{0}, \quad\) 精度 \(\epsilon>0\), 置 \(\mathrm{k}=0\)
-
在x \(^{\mathrm{k}}\) 处将A, b分解
-
构造\(N\),从而构造\(Q\)(若\(N\)空的话,就取\(Q=I\))
-
计算 \(d^{k}=-Q \nabla f\left(x^{k}\right)\)
-
若 \(\mathrm{d}^{\mathrm{k}}=0,\) 或者\(\mathrm{d}^{\mathrm{k}}\leq \epsilon\) , 则 当\(N\)空时停止, 当\(N\)非空时计算 \(\mathrm{q}=\left(\mathrm{NN}^{\mathrm{T}}\right)^{-1} \mathrm{~N} \nabla \mathrm{f}\left(\mathrm{x}^{\mathrm{k}}\right)\) 并相应分解为两块y \(, \mathrm{z}\) 。若 \(\mathrm{y} \geqslant 0\),停止迭代。否则, 修正\(A_1\) , 将\(A_1\) 中与 \(y _{\mathrm{j0}}\) 相对应的那个行整个删去,转 3.
-
若 \(\mathrm{d}^{(\mathrm{k})} \neq 0,\) 则确定步长,求解
\[\begin{aligned} (P\alpha_{max}) ~ ~ ~ \min &~ ~ ~ f(x^{k}+\alpha d^k)\\\text{s.t.} &~ ~ ~ 0\leq \alpha \leq \alpha_{max} \end{aligned} \]
设得\(\alpha _k\), 则得新的迭代点 \(\mathrm{x}^{(\mathrm{k}+1)}=\mathrm{x}^{(\mathrm{k})}+\mathrm{\alpha}_k \mathrm{~d}\),置 \(\mathrm{k}=\mathrm{k}+1,\) 转2
例子
\[\begin{array}{l} \min \quad &f(x)=x_{1}^{2}+4 x_{2}^{2} \\ \text { s. } t . \quad &x_{1}+x_{2} \geq 1 \\ &15 x_{1}+10 x_{2} \geq 12 \\ &x_{1} \quad \geqslant 0 \\ &x_{2} \geq 0 \end{array} \]取初始容许点 \(\mathrm{x}^{0}=(0,2)^{\mathrm{T}}\)
【解】
\[\nabla f(x)=\left(\begin{array}{l} 2 x_{1} \\ 8 x_{2} \end{array}\right), A=\left(\begin{array}{cc} 1 & 1 \\ 15 & 10 \\ 1 & 0 \\ 0 & 1 \end{array}\right), b=\left(\begin{array}{l} 1 \\ 12 \\ 0 \\ 0 \end{array}\right) \]第一次迭代:\(x^{0} \rightarrow x^{1}\)
(1) 下降容许方向的确定
要先生成寻优方向\(d ^{0},\) 先求解投影矩阵 \(Q\) ,在 \(\mathrm{x}^{0}\) 处 :
\[\mathrm{A}_1=(1 \quad 0), \mathrm{b}_1=(0) \quad \boldsymbol{A}_2=\left(\begin{array}{ll} 1 & 1 \\ 15 & 10 \\ 0 & 1 \end{array}\right), \boldsymbol{b}_2=\left(\begin{array}{l} 1 \\ 12 \\ 0 \end{array}\right) \]从而\(N=(1,0)\)
\[\begin{array}{l} Q=I-N^{T}\left(N N^{T}\right)^{-1} N=\left(\begin{array}{ll} 0 & 0 \\ 0 & 1 \end{array}\right) \\ d^{0}=-Q \nabla f\left(x^{0}\right)=\left(\begin{array}{l} 0 \\ -16 \end{array}\right) \neq 0 \end{array} \](2)精确线搜索:
计算\(\hat{b} =b_2-A_2x^{0} = \begin{bmatrix} 1 \\ 12 \\ 0 \end{bmatrix}- \begin{bmatrix} 1 & 1 \\ 15 & 10 \\ 0 & 1 \end{bmatrix}\begin{bmatrix} 0\\2 \end{bmatrix} =\begin{bmatrix} -1\\-8\\-2 \end{bmatrix} ,~~ \hat{d}= A_2 d^0=\begin{bmatrix} 1 & 1 \\ 15 & 10 \\ 0 & 1 \end{bmatrix}\begin{bmatrix} 0\\-16 \end{bmatrix} =\begin{bmatrix} -16\\-160\\-16 \end{bmatrix}\)
由于\(\hat{d}\ngeqslant0\),计算
\[\alpha_{max} = min\left\{ \frac{{-1}}{{-16}},\frac{{-8}}{{-160}},\frac{{-2}}{{-16}}\right\} =\frac{{1}}{{20}} \]而\(x^{0}+\alpha d^0=\begin{pmatrix}0 \\ 2 \end{pmatrix} +\alpha \begin{pmatrix}0 \\ -16 \end{pmatrix} =\begin{pmatrix}0 \\2-16\alpha \end{pmatrix}\)
\[\begin{aligned} (P\alpha_{max}) ~ ~ ~ \min &~ ~ ~ f(x^{0}+\alpha d^0)=4(2-16\alpha )^2\\\text{s.t.} &~ ~ ~ 0\leq \alpha \leq \frac{{1}}{{20}} \end{aligned} \]令\(f(x^{0}+\alpha d^0)'=0, 0\leq \alpha \leq \frac{{1}}{{20}}\)解得 \(\alpha_0= \frac{{1}}{{20}}\).
则 $x{1}=x{0}+\alpha_1d^0=\begin{pmatrix}0
\ 1.2 \end{pmatrix} $
第二次迭代: \(x^{1} \rightarrow x^{2}\)
(1) 下降容许方向的确定
在 \(\mathrm{x}^{1}\) 处2, 3 约束有效 ,则
\[\quad A_1=\left(\begin{array}{cc}15 & 10 \\ 1 & 0\end{array}\right), \boldsymbol{b}_1=\left(\begin{array}{l}12 \\ 0\end{array}\right) {A}_2=\left(\begin{array}{ll} 1 & 1 \\ 0 & 1 \end{array}\right), \boldsymbol{b}_2=\left(\begin{array}{l} 1 \\ 0 \end{array}\right) \]从而
\[\begin{array}{} N=\left(\begin{array}{ll}15 & 10 \\ 1 & 0\end{array}\right) \\ Q=I-N^{T}\left(N N^{T}\right)^{-1} N \\ = \left(\begin{array}{ll}1 & 0 \\ 0 & 1\end{array}\right) -\left(\begin{array}{ll}15 & 1 \\ 10 & 0\end{array}\right) \left [ \left(\begin{array}{ll}15 & 10 \\ 1 & 0\end{array}\right) \left(\begin{array}{ll}15 & 1 \\ 10 & 0\end{array}\right) \right ]^{-1} \left(\begin{array}{ll}15 & 10 \\ 1 & 0\end{array}\right)=0 \\ \text { 从而 } d=-Q \nabla f\left(x^{(1)}\right)=0 \end{array} \]计算
\[\boldsymbol{q}=\left(\boldsymbol{N} \boldsymbol{N}^{\boldsymbol{T}}\right)^{-1} \boldsymbol{N} \nabla \boldsymbol{f}\left(\boldsymbol{x}^{(1)}\right)=\left(\begin{array}{l} 0.96 \\ -14.4 \end{array}\right) \]其中第二个分量小于0,故删去\(A_1\)中对应的行,从而有新的 \(\mathrm{A}_1=(15 \quad 10),\) 故有新的 \(\mathrm{N}=(15 \quad 10)\)
故有新的 \(Q=\boldsymbol{I}-\boldsymbol{N}^{T}\left(\boldsymbol{N N}^{T}\right)^{-1} \boldsymbol{N}\)
\(=\left(\begin{array}{cc}1 & 0 \\ 0 & 1\end{array}\right)-\left(\begin{array}{c}15 \\ 10\end{array}\right)\left[(15 \quad 10)\left(\begin{array}{c}15 \\ 10\end{array}\right)\right]^{-1}(15 \quad 10)\)
\(=\left(\begin{array}{cc}0.3046 & -0.4635 \\ -0.4635 & 0.6923\end{array}\right)\)
从而有寻优方向 \(d^{1}=-Q \nabla f\left(x^{1}\right)=\left(\begin{array}{l}4.4307 \\ -6.6462\end{array}\right)\)
\(\mathrm{x}^{1}\) 沿着该方向作线性搜索可得新的点 \(\mathrm{x}^{2}=(0.4,0.6)^{\mathrm{T}}\)
既约梯度法
既约梯度法首先由Wolfe提出,解决具有线性约束的非线性规划问题。1969年由Abadie与Carpentier推广至非线性约束;
基本思想:借鉴求解线性规划的单纯形算法,选择某些变量为基变量,其它的作为非基变量,将基变量用非基变量表示,并从目标函数中消去基变量,得到以非基变量为自变量的简化的目标函数,进而利用此函数的负梯度构造下降可行方向。
既约梯度:简化目标函数关于非基变量的梯度。
考虑以下问题
\[\begin{aligned} &\min f(x)\\ &\begin{array}{l} \text { s.t. } \quad A x=b, \quad x \geq 0 \\ \end{array} \end{aligned} \]其中\(A_{m \times n}, r(A)=m, b_{m \times 1} , m \leq n\)
思想:降维
\[\begin{array}{ll} \min &f\left(x_{B}, x_{N}\right) \\ \text { s.t. } &B x_{B}+N x_{N}=b\\ \quad& x_{B}, x_{N} \geq 0 \end{array} ~~~~~~~~~~ \Leftrightarrow ~~~~~~~~~~ \begin{array}{} \min &F\left(x_{N}\right)\\ \text { s.t. }& x_{B}, x_{N} \geq 0 \end{array} \]B可逆方阵
\[A=(B ~~~N), \quad x=\left(\begin{array}{l} x_{B} \\ x_{N} \end{array}\right) \quad \begin{array}{l} x_{B}=B^{-1} b-B^{-1} N x_{N} \\ \Rightarrow F\left(x_{N}\right)=f\left(x_{B}\left(x_{N}\right), x_{N}\right) \end{array} \]\(f(x)\) 的既约梯度为
\[\begin{array}{l} r\left(x_{N}\right)=\nabla F\left(x_{N}\right) \\ =\nabla_{x_{N}} f(x)-\left(B^{-1} N\right)^{T} \nabla_{x_{B}} f(x) \end{array} \]确定下降可行方向 \(d^{(k)}\)
- 下降方向\(d^{(k)}\) :\(∇f(x)^Td^{(k)}< 0\)
- 可行方向 \(d^{(k)}\): \(,Ad=0,d_j\leq 0\) ,如果\(x_j = 0\)
\(d^{(k)}=\left(\begin{array}{l}d_{B}^{(k)} \\ d_{N}^{(k)}\end{array}\right), \quad d_{B}^{(k)}\) 和 \(d_{N}^{(k)}\) 分别对应基变量和非基变量。
定义 \(d_{N}^{(k)},\) 使得
\[\begin{aligned} d_{N_{j}}^{(k)} &=\left\{\begin{array}{ll} -x_{N_{j}}^{(k)} r_{j}\left(x_{N}^{(k)}\right) & \text { 当 } r_{j}\left(x_{N}^{(k)}\right)>0 \\ -r_{j}\left(x_{N}^{(k)}\right) & \text { 当 } r_{j}\left(x_{N}^{(k)}\right) \leq 0 \end{array}\right. \end{aligned} \]为得到可行方向, \(应 有 A d^{(k)}=0,\) 即 \(\quad B d_{B}^{(k)}+N d_{N}^{(k)}=0\)
\[\text { 取 } \quad d_{B}^{(k)}=-B^{-1} N d_{N}^{(k)} \Rightarrow \quad d^{(k)}=\left(\begin{array}{c} -B^{-1} N d_{N}^{(k)} \\ d_{N}^{(k)} \end{array}\right) \]\(\large\color{magenta}{\boxed{\color{brown}{定理} }}\) 设 \(x\) 是可行解, \(A=(B, N)\) 是 \(m \times n\) 矩阵, \(B\) 为 \(m\) 阶可逆矩阵, \(x=\left(x_{B}^{T}, x_{N}^{T}\right)^{T}, x_{B}>0,\) 函数 \(f\)
在点x处可微,又设
其中
\[d_{N_{j}}=\left\{\begin{array}{ll} -x_{N_{j}} r_{j}\left(x_{N}\right) & r_{j}\left(x_{N}\right)>0 \\ -r_{j}\left(x_{N}\right) & r_{j}\left(x_{N}\right) \leq 0 \end{array}\right. \]如果 \(d \neq 0,\) 则 \(d\) 是下降可行方向, 而且 \(d=0\) 的充要条件是x为KKT点。
直线搜索确定步长\(\alpha_{max}\)
\[\begin{array}{l} x^{k} \in S, A x^{k}=b, x_{B}^{k}>0, x_{N}^{k} \geq 0, A d^{k}=0 \\ x^{k}+\alpha d^{k} \in S \\ x^{k+1} \geq 0, x_{j}^{k+1}=x_{j}^{k}+\alpha d_{j}^{k} \geq 0, j=1, \cdots, n \end{array} \](1) 如果 \(d_{j}^{k} \geq 0,\) 则 \(x_{j}^{k+1} \geq 0, \forall \alpha>0\).
(2) 如果 \(d_{j}^{k}<0,\) 则 \(\alpha \leq \frac{x_{j}^{k}}{-d_{j}^{k}}\).
让 \(\alpha_{\max }=\left\{\begin{array}{ll}\infty, & d^{k} \geq 0, \\ \min \left\{\frac{x_{j}^{k}}{-d_{j}^{k}} \mid d_{j}^{k}<0\right\}, & \text { else. }\end{array}\right.\)
于是精确线搜索为:
\[\begin{aligned} (P\alpha_{max}) ~ ~ ~ \min &~ ~ ~ f(x^{k}+\alpha d^k)\\\text{s.t.} &~ ~ ~ 0\leq \alpha \leq \alpha_{max} \end{aligned}\tag 7 \]例子
\[\begin{array}{l} \min \quad f(x)=2 x_{1}^{2}+x_{2}^{2} \\ \text { s.t. } \quad x_{1}-x_{2}+x_{3} \quad=2 \\ -2 x_{1}+x_{2}+x_{4}=1 \\ x_{1}, x_{2}, x_{3}, x_{4} \geq 0 \end{array} \]解:
\[\nabla f(x)=\left(\begin{array}{c} 4 x_{1} \\ 2 x_{2} \\ 0 \\ 0 \end{array}\right), A=\left(\begin{array}{rrrr} 1 & -1 & 1 & 0 \\ -2 & 1 & 0 & 1 \end{array}\right), b=\left(\begin{array}{l} 2 \\ 1 \end{array}\right) \]第一次迭代:
(1) \(x^{1}=\left(\begin{array}{c}1 \\ 3 \\ 4 \\ 0\end{array}\right)\) \(\in\) \(S\),
\(\nabla f\left(x^{1}\right)=\left(\begin{array}{c}4 \\ 6 \\ 0 \\ 0\end{array}\right)\)
\(x_{B}=\left(\begin{array}{c}x_{2} \\ x_{3}\end{array}\right)=\left(\begin{array}{l}3 \\ 4\end{array}\right), x_{N}=\left(\begin{array}{c}x_{1} \\ x_{4}\end{array}\right)=\left(\begin{array}{l}1 \\ 0\end{array}\right)\)
\(B=\left(\begin{array}{rr}-1 & 1 \\ 1 & 0\end{array}\right), B^{-1}=\left(\begin{array}{ll}0 & 1 \\ 1 & 1\end{array}\right), N=\left(\begin{array}{rr}1 & 0 \\ -2 & 1\end{array}\right)\)
直线搜索确定步长\(\alpha_{max}\)
\[\begin{array}{ll} \min & f\left(x^{1}+\alpha d^{1}\right)=2(1-16 \alpha)^{2}+(3-38 \alpha)^{2} \\ \text { s.t. } & 0 \leq \alpha \leq \frac{1}{16} \end{array} \]得到 \(\alpha_{1}=\frac{1}{16}\)
第二次迭代:
\(x^{2}=\left(\begin{array}{c}0 \\ \frac{5}{8} \\ \frac{21}{8} \\ \frac{3}{8}\end{array}\right), \nabla f\left(x^{2}\right)=\left(\begin{array}{c}0 \\ \frac{5}{4} \\ 0 \\ 0\end{array}\right)\)
\(x_{B}=\left(\begin{array}{c}x_{2} \\ x_{3}\end{array}\right)=\left(\begin{array}{r}\frac{5}{8} \\ \frac{21}{8}\end{array}\right), x_{N}=\left(\begin{array}{c}x_{1} \\ x_{4}\end{array}\right)=\left(\begin{array}{c}0 \\ \frac{3}{8}\end{array}\right)\)
\(B=\left(\begin{array}{rr}-1 & 1 \\ 1 & 0\end{array}\right), B^{-1}=\left(\begin{array}{ll}0 & 1 \\ 1 & 1\end{array}\right), N=\left(\begin{array}{rr}1 & 0 \\ -2 & 1\end{array}\right)\)
\(\begin{aligned}\left(r_{N}^{2}\right)^{T} &=\nabla_{N} f\left(x^{2}\right)^{T}-\nabla_{B} f\left(x^{2}\right)^{T} B^{-1} N \\ &=\left(\begin{array}{l}0 \\ 0\end{array}\right)^{T}-\left(\begin{array}{c}\frac{5}{4} \\ 0\end{array}\right)^{T}\left(\begin{array}{ll}0 & 1 \\ 1 & 1\end{array}\right)\left(\begin{array}{rr}1 & 0 \\ -2 & 1\end{array}\right) \\ &=\left(\begin{array}{r}\frac{5}{2} \\ -\frac{5}{4}\end{array}\right)^{T} \end{aligned}\)
\(\begin{aligned} d_{N}^{2} &=\left(\begin{array}{c}d_{1}^{2} \\ d_{4}^{2}\end{array}\right)=\left(\begin{array}{c}0 \\ \frac{5}{4}\end{array}\right) \\ d_{B}^{2} &=\left(\begin{array}{c}d_{2}^{2} \\ d_{3}^{2}\end{array}\right)=-\left(\begin{array}{cc}0 & 1 \\ 1 & 1\end{array}\right)\left(\begin{array}{rc}1 & 0 \\ -2 & 1\end{array}\right)\left(\begin{array}{c}0 \\ \frac{5}{4}\end{array}\right) \\ &=\left(\begin{array}{r}-\frac{5}{4} \\ \frac{5}{4}\end{array}\right) \\ d^{2} &=\left(\begin{array}{r}0 \\ -\frac{5}{4} \\ -\frac{5}{4} \\ -\frac{5}{4}\end{array}\right) \end{aligned}\)
\(\alpha_{\max }=\min \left\{\frac{5 / 8}{5 / 4}, \frac{21 / 8}{5 / 4}\right\}=\frac{1}{2}\)
\(x^{2}+\alpha d^{2}=\left(\begin{array}{c}0 \\ \frac{5}{8} \\ \frac{21}{8} \\ \frac{3}{8}\end{array}\right)+\alpha\left(\begin{array}{r}0 \\ -\frac{5}{4} \\ -\frac{5}{4} \\ -\frac{5}{4}\end{array}\right)=\left(\begin{array}{c}\frac{5}{8}-\frac{5}{4} \alpha \\ \frac{21}{8}-\frac{5}{4} \alpha \\ \frac{3}{8}+\frac{5}{4} \alpha\end{array}\right)\)
\(f\left(x^{2}+\alpha d^{2}\right)=\left(\frac{5}{8}-\frac{5}{4} \alpha\right)^{2}\)
直线搜索确定步长\(\alpha_{max}\)
\[\begin{array}{ll} \min & f\left(x^{2}+\alpha d^{2}\right)=\left(\frac{5}{8}-\frac{5}{4} \alpha\right)^{2} \\ \text { s.t. } & 0 \leq \alpha \leq \frac{1}{2} \end{array} \]\(\alpha_{2}=\frac{1}{2}\).
第三次迭代:
\(x^{3}=\left(\begin{array}{l}0 \\ 0 \\ 2 \\ 1\end{array}\right), \nabla f\left(x^{3}\right)=\left(\begin{array}{l}0 \\ 0 \\ 0 \\ 0\end{array}\right), I_{B}=\{3,4\}, I_{N}=\{1,2\}\)
\(x_{B}=\left(\begin{array}{c}x_{3} \\ x_{4}\end{array}\right)=\left(\begin{array}{c}2 \\ 1\end{array}\right), x_{N}=\left(\begin{array}{c}x_{1} \\ x_{2}\end{array}\right)=\left(\begin{array}{l}0 \\ 0\end{array}\right)\)
\(B=\left(\begin{array}{ll}1 & 0 \\ 0 & 1\end{array}\right), B^{-1}=\left(\begin{array}{ll}1 & 0 \\ 0 & 1\end{array}\right), N=\left(\begin{array}{rr}1 & -1 \\ -2 & 1\end{array}\right)\)
停止迭代,
\[\bar{x}=x^{3}=\left(\begin{array}{l} 0 \\ 0 \\ 2 \\ 1 \end{array}\right) \]参考资料
MIT凸优化课程
Stephen Boyd的Convex Optimization
《凸优化》,Stephen Boyd等著,王书宁等译
“最优化理论与算法”北京邮电大学数学系