Rosen的梯度投影法和既约梯度法
文章目录
基本概念
投影矩阵
设
M
\mathbf{M}
M 是
m
×
n
(
m
≤
n
)
m \times n(m \leq n)
m×n(m≤n) 行满秩矩阵,
M
T
\mathbf{M}^{T}
MT 的
m
m
m 个列向量生成的子空间记作
V
M
,
V
M
V_{\mathbf{M}}, V_{\mathbf{M}}
VM,VM 的正交子空间用
V
M
⊥
V_{\mathbf{M}}^{\perp}
VM⊥表示,则
V
M
⊥
=
{
x
∣
M
x
=
0
}
V_{\mathrm{M}}^{\perp}=\{\mathbf{x} \mid \mathbf{M} \mathbf{x}=\mathbf{0}\}
VM⊥={x∣Mx=0} 是
M
\mathbf{M}
M 的零空间。对于任意
x
∈
R
n
,
\mathbf{x} \in \mathbb{R}^{n},
x∈Rn, 可惟一分解为
x
=
p
+
q
,
p
∈
V
,
q
∈
V
M
⊥
(1)
\mathbf{x}=\mathbf{p}+\mathbf{q}, \mathbf{p} \in V, \mathbf{q} \in V_{\mathbf{M}}^{\perp}\tag{1}
x=p+q,p∈V,q∈VM⊥(1)
p
,
q
\mathbf{p}, \mathbf{q}
p,q 分别是
x
\mathbf{x}
x 在
V
M
V_{\mathbf{M}}
VM 和
V
M
⊥
V_{\mathbf{M}}^{\perp}
VM⊥ 上的投影,记
p
=
P
x
,
q
=
Q
x
(2)
\mathbf{p}=\mathbf{P} \mathbf{x}, \mathbf{q}=\mathbf{Q} \mathbf{x}\tag{2}
p=Px,q=Qx(2)
[外链图片转存失败,源站可能有防盗链机制,建议将图片保存下来直接上传(img-T7KXyep8-1622729384402)(…/…/…/AppData/Roaming/Typora/typora-user-images/image-20201220092346176.png)]
则
P
,
Q
\mathbf{P}, \mathbf{Q}
P,Q 分别是
R
n
\mathbb{R}^{n}
Rn 到
V
M
V_{\mathbf{M}}
VM 和
R
n
\mathbb{R}^{n}
Rn 到
V
M
⊥
V_{\mathbf{M}}^{\perp}
VM⊥ 上的投影矩阵。
R
N
=
P
⊕
Q
\mathbf{R}^{N}=\mathbf{P} \oplus \mathbf{Q}
RN=P⊕Q
因
p
\mathbf{p}
p 可表示为
M
T
\mathbf{M}^{T}
MT 的
m
m
m 个列向量的线性组合,即存在系数向量
η
∈
R
n
,
\mathbf{\eta} \in \mathbb{R}^{n},
η∈Rn, 使
p
=
M
T
η
(3)
\mathbf{p}=\mathbf{M}^{T} \mathbf{\eta}\tag{3}
p=MTη(3)
将(3)式代入(1)式,然后两端左乘
M
,
\mathbf{M},
M, 并注意到
M
p
=
0
,
\mathbf{M} \mathbf{p}=\mathbf{0},
Mp=0, 则
M
x
=
M
M
T
η
\mathbf{M x}=\mathbf{M} \mathbf{M}^{T} \mathbf{\eta}
Mx=MMTη
因
M
\mathbf{M}
M 行满秩,有
η
=
(
M
M
T
)
−
1
M
x
\mathbf{\eta}=\left(\mathbf{M} \mathbf{M}^{T}\right)^{-1} \mathbf{M} \mathbf{x}
η=(MMT)−1Mx
将其代入(3)式得
p
=
M
T
(
M
M
T
)
−
1
M
x
\mathbf{p}=\mathbf{M}^{T}\left(\mathbf{M} \mathbf{M}^{T}\right)^{-1} \mathbf{M x}
p=MT(MMT)−1Mx
将上式代入(1)式得
q
=
[
I
−
M
T
(
M
M
T
)
−
1
M
]
x
\mathbf{q}=\left[\mathbf{I}-\mathbf{M}^{T}\left(\mathbf{M} \mathbf{M}^{T}\right)^{-1} \mathbf{M}\right] \mathbf{x}
q=[I−MT(MMT)−1M]x
再根据式(2)得
P
=
M
T
(
M
M
T
)
−
1
M
Q
=
I
−
M
T
(
M
M
T
)
−
1
M
\begin{aligned} \mathbf{P} &=\mathbf{M}^{T}\left(\mathbf{M} \mathbf{M}^{T}\right)^{-1} \mathbf{M} \\ \mathbf{Q} &=\mathbf{I}-\mathbf{M}^{T}\left(\mathbf{M} \mathbf{M}^{T}\right)^{-1} \mathbf{M} \end{aligned}
PQ=MT(MMT)−1M=I−MT(MMT)−1M
容易验证,投影矩阵 P,Q是幂等的对称矩阵。反之,一个幂等的对称矩阵必是一个投影矩阵,这就
是下面的定理。
定 义 \large\color{magenta}{\boxed{\color{brown}{定义} }} 定义假设 P ∈ R n × n , P \in R^{n \times n}, P∈Rn×n, 如果 P T = P , P P = P , P^{T}=P, P P=P, PT=P,PP=P, 则 P P P 是投影矩阵
引 理 1 \large\color{magenta}{\boxed{\color{brown}{引理1} }} 引理1如果 P ∈ R n × n P \in R^{n \times n} P∈Rn×n 是投影矩阵,则
(1)则
P
P
P 是半正定矩阵 ;
x
T
P
x
=
x
T
P
2
x
=
x
T
P
T
P
x
=
(
P
x
)
T
(
P
x
)
≥
0
x^{T} P x=x^{T} P^{2} x=x^{T} P^{T} P x=(P x)^{T}(P x) \geq 0
xTPx=xTP2x=xTPTPx=(Px)T(Px)≥0
(2) 当且仅当
I
−
P
I-P
I−P是投影矩阵,其中
I
I
I是
n
n
n 阶的单位矩阵 ;
(
I
−
P
)
T
=
I
T
−
P
T
=
I
−
P
(
I
−
P
)
2
=
I
−
2
P
+
P
2
=
I
−
2
P
+
P
=
I
−
P
\begin{array}{l} (I-P)^{T}=I^{T}-P^{T}=I-P \\ (I-P)^{2}=I-2 P+P^{2}=I-2 P+P=I-P \end{array}
(I−P)T=IT−PT=I−P(I−P)2=I−2P+P2=I−2P+P=I−P
(3) 令
Q
=
I
−
P
Q=I-P
Q=I−P,
∀
x
∈
R
n
\forall x \in R^{n}
∀x∈Rn, 则
L
=
{
P
x
:
x
∈
R
n
}
L=\left\{P x: x \in R^{n}\right\}
L={Px:x∈Rn} 与
L
⊥
=
{
Q
x
:
x
∈
R
n
}
L^{\perp}=\left\{Q x: x \in R^{n}\right\}
L⊥={Qx:x∈Rn} 是相互
正交的线性子空间,并且
∀
x
∈
R
n
\forall x \in R^{n}
∀x∈Rn 可唯一的表示为
x
=
p
+
q
,
p
∈
L
,
q
∈
L
⊥
x=p+q, p \in L, q \in L^{\perp}
x=p+q,p∈L,q∈L⊥
梯度投影法
考虑线性约束的非线性问题
(
P
)
min
f
(
x
)
s.t.
A
x
≥
b
E
x
=
e
x
∈
R
n
(4)
\begin{aligned} (P) ~ ~ ~ \min &~ ~ ~ f(x)\\\text{s.t.} &~ ~ ~ Ax \geq b\\ &~ ~ ~ Ex = e\\ &~ ~ ~x ∈ R^n \end{aligned}\tag{4}
(P) mins.t. f(x) Ax≥b Ex=e x∈Rn(4)
A
∈
R
m
×
n
,
E
∈
R
I
×
n
,
b
∈
R
m
,
e
∈
R
I
\begin{aligned} &A \in R^{m \times n}, E \in R^{I \times n}, b \in R^{m}, e \in R^{I} \end{aligned}
A∈Rm×n,E∈RI×n,b∈Rm,e∈RI
f : R n → R 1 \mathrm{f}: \mathrm{R}^{\mathrm{n}} \rightarrow \mathrm{R}^{1} f:Rn→R1有一阶偏导数 .记(4)的可行域为 S S S.
基本思想
首先,要求迭代点都是在容许集内;
其次,在每一迭代点处为了产生一个下降容许方向,希望能使用目标函数在该点的负梯度方向来生成。
具体来说:若有一容许点x,在该点处的最速下降方向就是一个容许方向的话,那干脆直接取该方向作为x点处的寻优方向;但当迭代点在可行域边界上时 ,若它的最速下降方向不是容许方向的话,则一个直观的想法就是把这个方向投影到可行域内,将它 改 造 \large\color{#70f3ff}{\boxed{\color{green}{改造}}} 改造成一个容许方向!
下降容许方向的确定
在容许点 x k \mathrm{x}^{\mathrm{k}} xk 处将A, b分解, 使得 A 1 x k = b 1 , A 2 x k > b 2 A_1\mathrm{x}^{\mathrm{k}}=\mathrm{b}_1, \mathrm{A}_{2} \mathrm{x}^{\mathrm{k}}>\mathrm{b}_2 A1xk=b1,A2xk>b2 .
d
≠
0
d\neq 0
d=0 在
x
k
x^{k}
xk 处的可行方向当且仅当
d
d
d 满足条件
A
1
d
≥
0
,
E
d
=
0
(5)
A_1d≥0~~,~~ Ed =0\tag{5}
A1d≥0 , Ed=0(5)
记
N
=
(
A
1
E
)
,
L
N
=
{
d
∈
R
n
∣
N
d
=
0
}
N=\left(\begin{array}{l}A_1 \\ E\end{array}\right), \quad L_{N}=\left\{d \in R^{n} \mid N d=0\right\}
N=(A1E),LN={d∈Rn∣Nd=0} 为
N
N
N 的零空间,由(5)知,当
d
∈
L
N
d \in L_{N}
d∈LN \
{
0
}
\{ 0 \}
{0}时,d 是 在
x
k
\boldsymbol{x}^{k}
xk 处的可行方向。由假设条件知
x
k
\boldsymbol{x}^{k}
xk 是
S
S
S 的正则点,因此
N
N
N 行满秩,
Q
=
I
−
N
T
(
N
N
T
)
−
1
N
,
Q=I-N^{\mathrm{T}}\left(\mathrm{NN}^{\mathrm{T}}\right)^{-1} \mathrm{~N},
Q=I−NT(NNT)−1 N, 为
N
N
N 的投影
矩阵,我们通过投影矩阵
Q
Q
Q 将
−
∇
f
(
x
k
)
-\nabla f\left(x^{k}\right)
−∇f(xk) 投影到
L
N
L_{N}
LN 上由此构造可行下降方向。
记 N = ( A 1 E ) N=\left(\begin{array}{l}A_1 \\ E\end{array}\right) N=(A1E) 不妨设N行满秩 ( ( ( 否则可直接去掉多余行).设N有r行, \quad
定 理 \large\color{magenta}{\boxed{\color{brown}{定理} }} 定理 Q = I − N T ( N N T ) − 1 N , Q=I-N^{\mathrm{T}}\left(\mathrm{NN}^{\mathrm{T}}\right)^{-1} \mathrm{~N}, Q=I−NT(NNT)−1 N, 若 Q ∇ f ( x ( k ) ) ≠ 0 , Q \nabla \mathrm{f}\left(\mathrm{x}^{(\mathrm{k})}\right) \neq 0, Q∇f(x(k))=0,则 d = − Q ∇ f ( x ( k ) ) \mathrm{d}=-Q \nabla \mathrm{f}\left(\mathrm{x}^{(\mathrm{k})}\right) d=−Q∇f(x(k)) 是下降容许方向。
【证明】
易验证
Q
T
Q
=
[
I
−
N
T
(
N
N
⊤
)
−
1
N
]
T
[
I
−
N
T
(
N
N
⊤
)
−
1
N
]
\mathrm{Q}^{\mathrm{T}} \mathrm{Q}=\left[\mathrm{I}-\mathrm{N}^{\mathrm{T}}\left(\mathrm{NN}^{\top}\right)^{-1} \mathrm{~N}\right]^{\mathrm{T}}\left[\mathrm{I}-\mathrm{N}^{\mathrm{T}}\left(\mathrm{NN}^{\top}\right)^{-1} \mathrm{~N}\right]
QTQ=[I−NT(NN⊤)−1 N]T[I−NT(NN⊤)−1 N]
=
I
−
2
N
T
(
N
N
⊤
)
−
1
N
+
N
T
(
N
N
⊤
)
−
1
N
⋅
N
T
(
N
N
⊤
)
−
1
N
=
I
−
2
N
T
(
N
N
⊤
)
−
1
N
+
N
T
(
N
N
⊤
)
−
1
N
=
I
−
N
T
(
N
N
⊤
)
−
1
N
=
Q
\begin{aligned} &=\mathrm{I}-2 \mathrm{~N}^{\mathrm{T}}\left(\mathrm{NN}^{\top}\right)^{-1} \mathrm{~N}+\mathrm{N}^{\mathrm{T}}\left(\mathrm{NN}^{\top}\right)^{-1} \mathrm{~N} \cdot \mathrm{N}^{\mathrm{T}}\left(\mathrm{NN}^{\top}\right)^{-1} \mathrm{~N} \\ &=\mathrm{I}-2 \mathrm{~N}^{\mathrm{T}}\left(\mathrm{NN}^{\top}\right)^{-1} \mathrm{~N}+\mathrm{N}^{\mathrm{T}}\left(\mathrm{NN}^{\top}\right)^{-1} \mathrm{~N} \\ &=\mathrm{I}-\mathrm{N}^{\mathrm{T}}\left(\mathrm{NN}^{\top}\right)^{-1} \mathrm{~N}=\mathrm{Q} \end{aligned}
=I−2 NT(NN⊤)−1 N+NT(NN⊤)−1 N⋅NT(NN⊤)−1 N=I−2 NT(NN⊤)−1 N+NT(NN⊤)−1 N=I−NT(NN⊤)−1 N=Q
下降
:
∇
f
(
x
k
)
T
d
=
−
∇
f
(
x
k
)
T
Q
∇
f
(
x
k
)
=
−
∥
Q
∇
f
(
x
k
)
∥
2
2
<
0
: \quad \nabla \mathrm{f}\left(\mathrm{x}^{\mathrm{k}}\right)^{\mathrm{T}} \mathrm{d}=-\nabla \mathrm{f}\left(\mathrm{x}^{\mathrm{k}}\right)^{\mathrm{T}} \mathrm{Q} \nabla \mathrm{f}\left(\mathrm{x}^{\mathrm{k}}\right)=-{\| \mathrm{Q} \nabla \mathrm{f}\left(\mathrm{x}^{\mathrm{k}}\right) \mathrm{\|}_{2}}^{2}<0
:∇f(xk)Td=−∇f(xk)TQ∇f(xk)=−∥Q∇f(xk)∥22<0 。
可行: 只要验证
A
1
d
≥
0
,
E
d
=
0
,
A_1d\geq 0, \mathrm{Ed}=0,
A1d≥0,Ed=0,
N
d
=
−
N
Q
∇
f
(
x
k
)
=
−
N
[
I
−
N
T
(
N
N
⊤
)
−
1
N
]
∇
f
(
x
k
)
=
−
[
N
−
N
]
∇
f
(
x
k
)
=
0
\begin{aligned} \mathrm{Nd} &=-\mathrm{NQ} \nabla \mathrm{f}\left(\mathrm{x}^{\mathrm{k}}\right) \\ &=-\mathrm{N}\left[\mathrm{I}-\mathrm{N}^{\mathrm{T}}\left(\mathrm{NN}^{\top}\right)^{-1} \mathrm{~N}\right] \nabla \mathrm{f}\left(\mathrm{x}^{\mathrm{k}}\right) \\ &=-[\mathrm{N}-\mathrm{N}] \nabla \mathrm{f}\left(\mathrm{x}^{\mathrm{k}}\right)=0 \end{aligned}
Nd=−NQ∇f(xk)=−N[I−NT(NN⊤)−1 N]∇f(xk)=−[N−N]∇f(xk)=0
Q
∇
f
(
x
k
)
=
0
Q \nabla \mathbf{f}\left(\mathbf{x}^{\mathbf{k}}\right)=0
Q∇f(xk)=0 的情况
Q
∇
f
(
x
k
)
=
[
I
−
N
T
(
N
N
⊤
)
−
1
N
]
∇
f
(
x
k
)
=
∇
f
(
x
k
)
−
N
T
(
N
N
⊤
)
−
1
N
∇
f
(
x
k
)
=
∇
f
(
x
k
)
−
N
T
q
=
0
\begin{aligned} \mathrm{Q} \nabla \mathrm{f}\left(\mathrm{x}^{\mathrm{k}}\right) &=\left[\mathrm{I}-\mathrm{N}^{\mathrm{T}}\left(\mathrm{NN}^{\top}\right)^{-1} \mathrm{~N}\right] \nabla \mathrm{f}\left(\mathrm{x}^{\mathrm{k}}\right) \\ &=\nabla \mathrm{f}\left(\mathrm{x}^{\mathrm{k}}\right)-\mathrm{N}^{\mathrm{T}}\left(\mathrm{NN}^{\top}\right)^{-1} \mathrm{~N} \nabla \mathrm{f}\left(\mathrm{x}^{\mathrm{k}}\right) \\ &=\nabla \mathrm{f}\left(\mathrm{x}^{\mathrm{k}}\right)-\mathrm{N}^{\mathrm{T}} \mathrm{q}=0 \quad \end{aligned}
Q∇f(xk)=[I−NT(NN⊤)−1 N]∇f(xk)=∇f(xk)−NT(NN⊤)−1 N∇f(xk)=∇f(xk)−NTq=0
记
q
=
(
N
N
⊤
)
−
1
N
∇
f
(
x
k
)
\mathrm{q}=\left(\mathrm{NN}^{\top}\right)^{-1} \mathrm{~N} \nabla \mathrm{f}\left(\mathrm{x}^{\mathrm{k}}\right)
q=(NN⊤)−1 N∇f(xk) ,对应于
N
N
N可将
q
q
q也加以分解
q
=
(
y
z
)
(
y
对应于
A
1
,
z
对应于
E
)
\boldsymbol{q}=\left(\begin{array}{l} \boldsymbol{y} \\ z \end{array}\right)\left(\boldsymbol{y} \text { 对应于 } \boldsymbol{A}_1, z\text { 对应于 } \boldsymbol{E}\right)
q=(yz)(y 对应于 A1,z 对应于 E)
这样上式就成为
∇
f
(
x
(
k
)
)
−
(
A
1
E
)
T
(
y
z
)
=
0
\quad \nabla f\left(x^{(k)}\right)-\left(\begin{array}{c}A_1\\ E\end{array}\right)^{T}\left(\begin{array}{l}y \\ z\end{array}\right)=0
∇f(x(k))−(A1E)T(yz)=0
即
∇
f
(
x
(
k
)
)
−
A
1
T
y
−
E
T
z
=
0
\text { 即 } \nabla f\left(x^{(k)}\right)-A_1^{ T} y-E^{T} z=0
即 ∇f(x(k))−A1Ty−ETz=0
Kuhn-Tucker定理
混合约束问题
KaTeX parse error: No such environment: align at position 8: \begin{̲a̲l̲i̲g̲n̲}̲ P : minimize …
设
x
∗
x^*
x∗为混合约束问题的一个极小点,
I
=
{
i
∣
g
i
(
x
∗
)
=
0
,
i
=
1
:
m
}
\mathrm{I}=\left\{\mathrm{i|g}_{\mathrm{i}}\left(\mathrm{x}^{*}\right)=0, \mathrm{i}=1: \mathrm{m}\right\}
I={i∣gi(x∗)=0,i=1:m} .函数
f
(
x
)
,
g
1
(
x
)
…
,
g
m
(
x
)
,
h
1
(
x
)
,
…
,
h
k
(
x
)
\mathrm{f}(\mathrm{x}), \mathrm{g}_{1}(\mathrm{x}) \ldots, \mathrm{g}_{\mathrm{m}}(\mathrm{x}), \mathrm{h}_{1}(\mathrm{x}), \ldots, \mathrm{h}_{k}(\mathrm{x})
f(x),g1(x)…,gm(x),h1(x),…,hk(x) 在
x
∗
\mathrm{x} ^*
x∗ 处都有一阶偏导数,当
∇
h
1
(
x
∗
)
,
…
,
∇
h
k
(
x
∗
)
,
∇
g
i
(
x
∗
)
(
i
∈
I
)
\nabla \mathrm{h}_{1}\left(\mathrm{x}^{*}\right), \ldots, \nabla \mathrm{h}_{k}\left(\mathrm{x}^{*}\right), \nabla \mathrm{g}_{\mathrm{i}}\left(\mathrm{x}^{*}\right)(\mathrm{i} \in \mathrm{I})
∇h1(x∗),…,∇hk(x∗),∇gi(x∗)(i∈I) 线性无关,
,
,
, 则存在实数
μ
1
,
…
,
μ
m
,
λ
1
,
…
,
λ
l
\mu_{1}, \ldots, \mu_{\mathrm{m}}, \lambda_{1}, \ldots, \lambda_{l}
μ1,…,μm,λ1,…,λl 使得
∇
f
(
x
∗
)
+
[
μ
1
∇
g
1
(
x
∗
)
+
μ
2
∇
g
2
(
x
∗
)
+
…
+
μ
m
∇
g
m
(
x
∗
)
]
+
[
λ
1
∇
h
1
(
x
∗
)
+
λ
2
∇
h
2
(
x
∗
)
+
…
+
λ
k
∇
h
l
(
x
∗
)
]
=
0
\begin{array}{l} \nabla \mathrm{f}\left(\mathrm{x}^{*}\right)+\left[\mu_{1} \nabla \mathrm{g}_{1}\left(\mathrm{x}^{*}\right)+\mu_{2} \nabla \mathrm{g}_{2}\left(\mathrm{x}^{*}\right)+\ldots+\mu_{\mathrm{m}} \nabla \mathrm{g}_{\mathrm{m}}\left(\mathrm{x}^{*}\right)\right] \\ +\left[\lambda_{1} \nabla \mathrm{h}_{1}\left(\mathrm{x}^{*}\right)+\lambda_{2} \nabla \mathrm{h}_{2}\left(\mathrm{x}^{*}\right)+\ldots+\lambda_{k} \nabla \mathrm{h}_{l}\left(\mathrm{x}^{*}\right)\right]=0 \\ \end{array}
∇f(x∗)+[μ1∇g1(x∗)+μ2∇g2(x∗)+…+μm∇gm(x∗)]+[λ1∇h1(x∗)+λ2∇h2(x∗)+…+λk∇hl(x∗)]=0
μ
i
g
i
(
x
∗
)
=
0
,
i
=
1
:
m
,
μ
i
≥
0
,
i
=
1
:
m
\mu_{\mathrm{i}} \mathrm{g}_{\mathrm{i}}\left(\mathrm{x}^{*}\right)=0, \mathrm{i}=1: \mathrm{m},\mu_{i} \geq 0, i=1: m
μigi(x∗)=0,i=1:m,μi≥0,i=1:m
.
(
.\left(\right.
.( 即
λ
j
\lambda_{\mathrm{j}}
λj 可正可负,
\quad
但
μ
i
\mu_{\mathrm{i}}
μi 必非负
)
)
)
定 理 \large\color{magenta}{\boxed{\color{brown}{定理} }} 定理 从而, 若y ⩾ 0 \geqslant 0 ⩾0 的话, 则上式就是K-T条件, 从而知 X ( k ) \mathrm{X}^{(\mathrm{k})} X(k) 为KKT,点。
但若 y ⩽ 0 \mathrm{y} \leqslant 0 y⩽0, 即y中有负分量比如 y j 0 y _{\mathrm{j0}} yj0 (可能会有多个, 任取一个) , y j 0 = m i n { y j } < 0 y _{\mathrm{j0}}=min\{y_j\}<0 yj0=min{yj}<0,这时将 A 1 A_1 A1 中与 y j 0 y _{\mathrm{j0}} yj0 相对应的那个行整个删去,仍记删去该行后的矩阵为N,用这个新的矩阵N构造Q,从而构造d,这时这个必为下降容许方向 。
直线搜索确定步长 α m a x \alpha_{max} αmax
$$
\begin{aligned} (P\alpha_{max}) ~ ~ ~ \min &~ ~ ~ f(x^{k}+\alpha d^k)\\text{s.t.} &~ ~ ~
0\leq \alpha \leq \alpha_{max} \end{aligned}
其
中
其中
其中
\alpha_{max} = \left{\begin{matrix}
min\left{ \frac{\hat{b_i}}{\hat{d_i}}, \hat{d_i}<0\right} ,&当 \hat{d}\ngeqslant0\ +\infty ,&当\hat{d}≥0
\end{matrix}\right.
$$
设得
α
ˉ
\bar\alpha
αˉ, 则得新的迭代点
x
(
k
+
1
)
=
x
(
k
)
+
α
ˉ
d
\mathrm{x}^{(\mathrm{k}+1)}=\mathrm{x}^{(\mathrm{k})}+\mathrm{\bar\alpha} \mathrm{~d}
x(k+1)=x(k)+αˉ d
投影梯度算法步骤:
-
取初始可行,点 x 0 , \mathrm{x}^{0}, \quad x0, 精度 ϵ > 0 \epsilon>0 ϵ>0, 置 k = 0 \mathrm{k}=0 k=0
-
在x k ^{\mathrm{k}} k 处将A, b分解
-
构造 N N N,从而构造 Q Q Q(若 N N N空的话,就取 Q = I Q=I Q=I)
-
计算 d k = − Q ∇ f ( x k ) d^{k}=-Q \nabla f\left(x^{k}\right) dk=−Q∇f(xk)
-
若 d k = 0 , \mathrm{d}^{\mathrm{k}}=0, dk=0, 或者 d k ≤ ϵ \mathrm{d}^{\mathrm{k}}\leq \epsilon dk≤ϵ , 则 当 N N N空时停止, 当 N N N非空时计算 q = ( N N T ) − 1 N ∇ f ( x k ) \mathrm{q}=\left(\mathrm{NN}^{\mathrm{T}}\right)^{-1} \mathrm{~N} \nabla \mathrm{f}\left(\mathrm{x}^{\mathrm{k}}\right) q=(NNT)−1 N∇f(xk) 并相应分解为两块y , z , \mathrm{z} ,z 。若 y ⩾ 0 \mathrm{y} \geqslant 0 y⩾0,停止迭代。否则, 修正 A 1 A_1 A1 , 将 A 1 A_1 A1 中与 y j 0 y _{\mathrm{j0}} yj0 相对应的那个行整个删去,转 3.
-
若 d ( k ) ≠ 0 , \mathrm{d}^{(\mathrm{k})} \neq 0, d(k)=0, 则确定步长,求解
$$
\begin{aligned} (P\alpha_{max}) ~ ~ ~ \min &~ ~ ~ f(x^{k}+\alpha d^k)\\text{s.t.} &~ ~ ~0\leq \alpha \leq \alpha_{max} \end{aligned}
$$
设得 α k \alpha _k αk, 则得新的迭代点 x ( k + 1 ) = x ( k ) + α k d \mathrm{x}^{(\mathrm{k}+1)}=\mathrm{x}^{(\mathrm{k})}+\mathrm{\alpha}_k \mathrm{~d} x(k+1)=x(k)+αk d,置 k = k + 1 , \mathrm{k}=\mathrm{k}+1, k=k+1, 转2
例子
min f ( x ) = x 1 2 + 4 x 2 2 s. t . x 1 + x 2 ≥ 1 15 x 1 + 10 x 2 ≥ 12 x 1 ⩾ 0 x 2 ≥ 0 \begin{array}{l} \min \quad &f(x)=x_{1}^{2}+4 x_{2}^{2} \\ \text { s. } t . \quad &x_{1}+x_{2} \geq 1 \\ &15 x_{1}+10 x_{2} \geq 12 \\ &x_{1} \quad \geqslant 0 \\ &x_{2} \geq 0 \end{array} min s. t.f(x)=x12+4x22x1+x2≥115x1+10x2≥12x1⩾0x2≥0
取初始容许点 x 0 = ( 0 , 2 ) T \mathrm{x}^{0}=(0,2)^{\mathrm{T}} x0=(0,2)T
【解】
∇
f
(
x
)
=
(
2
x
1
8
x
2
)
,
A
=
(
1
1
15
10
1
0
0
1
)
,
b
=
(
1
12
0
0
)
\nabla f(x)=\left(\begin{array}{l} 2 x_{1} \\ 8 x_{2} \end{array}\right), A=\left(\begin{array}{cc} 1 & 1 \\ 15 & 10 \\ 1 & 0 \\ 0 & 1 \end{array}\right), b=\left(\begin{array}{l} 1 \\ 12 \\ 0 \\ 0 \end{array}\right)
∇f(x)=(2x18x2),A=⎝⎜⎜⎛1151011001⎠⎟⎟⎞,b=⎝⎜⎜⎛11200⎠⎟⎟⎞
第一次迭代:
x
0
→
x
1
x^{0} \rightarrow x^{1}
x0→x1
(1) 下降容许方向的确定
要先生成寻优方向
d
0
,
d ^{0},
d0, 先求解投影矩阵
Q
Q
Q ,在
x
0
\mathrm{x}^{0}
x0 处 :
A
1
=
(
1
0
)
,
b
1
=
(
0
)
A
2
=
(
1
1
15
10
0
1
)
,
b
2
=
(
1
12
0
)
\mathrm{A}_1=(1 \quad 0), \mathrm{b}_1=(0) \quad \boldsymbol{A}_2=\left(\begin{array}{ll} 1 & 1 \\ 15 & 10 \\ 0 & 1 \end{array}\right), \boldsymbol{b}_2=\left(\begin{array}{l} 1 \\ 12 \\ 0 \end{array}\right)
A1=(10),b1=(0)A2=⎝⎛11501101⎠⎞,b2=⎝⎛1120⎠⎞
从而
N
=
(
1
,
0
)
N=(1,0)
N=(1,0)
Q
=
I
−
N
T
(
N
N
T
)
−
1
N
=
(
0
0
0
1
)
d
0
=
−
Q
∇
f
(
x
0
)
=
(
0
−
16
)
≠
0
\begin{array}{l} Q=I-N^{T}\left(N N^{T}\right)^{-1} N=\left(\begin{array}{ll} 0 & 0 \\ 0 & 1 \end{array}\right) \\ d^{0}=-Q \nabla f\left(x^{0}\right)=\left(\begin{array}{l} 0 \\ -16 \end{array}\right) \neq 0 \end{array}
Q=I−NT(NNT)−1N=(0001)d0=−Q∇f(x0)=(0−16)=0
(2)精确线搜索:
计算 b ^ = b 2 − A 2 x 0 = [ 1 12 0 ] − [ 1 1 15 10 0 1 ] [ 0 2 ] = [ − 1 − 8 − 2 ] , d ^ = A 2 d 0 = [ 1 1 15 10 0 1 ] [ 0 − 16 ] = [ − 16 − 160 − 16 ] \hat{b} =b_2-A_2x^{0} = \begin{bmatrix} 1 \\ 12 \\ 0 \end{bmatrix}- \begin{bmatrix} 1 & 1 \\ 15 & 10 \\ 0 & 1 \end{bmatrix}\begin{bmatrix} 0\\2 \end{bmatrix} =\begin{bmatrix} -1\\-8\\-2 \end{bmatrix} ,~~ \hat{d}= A_2 d^0=\begin{bmatrix} 1 & 1 \\ 15 & 10 \\ 0 & 1 \end{bmatrix}\begin{bmatrix} 0\\-16 \end{bmatrix} =\begin{bmatrix} -16\\-160\\-16 \end{bmatrix} b^=b2−A2x0=⎣⎡1120⎦⎤−⎣⎡11501101⎦⎤[02]=⎣⎡−1−8−2⎦⎤, d^=A2d0=⎣⎡11501101⎦⎤[0−16]=⎣⎡−16−160−16⎦⎤
由于
d
^
≱
0
\hat{d}\ngeqslant0
d^0,计算
α
m
a
x
=
m
i
n
{
−
1
−
16
,
−
8
−
160
,
−
2
−
16
}
=
1
20
\alpha_{max} = min\left\{ \frac{{-1}}{{-16}},\frac{{-8}}{{-160}},\frac{{-2}}{{-16}}\right\} =\frac{{1}}{{20}}
αmax=min{−16−1,−160−8,−16−2}=201
而
x
0
+
α
d
0
=
(
0
2
)
+
α
(
0
−
16
)
=
(
0
2
−
16
α
)
x^{0}+\alpha d^0=\begin{pmatrix}0 \\ 2 \end{pmatrix} +\alpha \begin{pmatrix}0 \\ -16 \end{pmatrix} =\begin{pmatrix}0 \\2-16\alpha \end{pmatrix}
x0+αd0=(02)+α(0−16)=(02−16α)
$$
\begin{aligned} (P\alpha_{max}) ~ ~ ~ \min &~ ~ ~ f(x^{0}+\alpha d^0)=4(2-16\alpha )^2\\text{s.t.} &~ ~ ~
0\leq \alpha \leq \frac{{1}}{{20}} \end{aligned}
$$
令
f
(
x
0
+
α
d
0
)
′
=
0
,
0
≤
α
≤
1
20
f(x^{0}+\alpha d^0)'=0, 0\leq \alpha \leq \frac{{1}}{{20}}
f(x0+αd0)′=0,0≤α≤201解得
α
0
=
1
20
\alpha_0= \frac{{1}}{{20}}
α0=201.
则 $x{1}=x{0}+\alpha_1d^0=\begin{pmatrix}0
\ 1.2 \end{pmatrix} $
第二次迭代: x 1 → x 2 x^{1} \rightarrow x^{2} x1→x2
(1) 下降容许方向的确定
在
x
1
\mathrm{x}^{1}
x1 处2, 3 约束有效 ,则
A
1
=
(
15
10
1
0
)
,
b
1
=
(
12
0
)
A
2
=
(
1
1
0
1
)
,
b
2
=
(
1
0
)
\quad A_1=\left(\begin{array}{cc}15 & 10 \\ 1 & 0\end{array}\right), \boldsymbol{b}_1=\left(\begin{array}{l}12 \\ 0\end{array}\right) {A}_2=\left(\begin{array}{ll} 1 & 1 \\ 0 & 1 \end{array}\right), \boldsymbol{b}_2=\left(\begin{array}{l} 1 \\ 0 \end{array}\right)
A1=(151100),b1=(120)A2=(1011),b2=(10)
从而
$$
\begin{array}{}
N=\left(\begin{array}{ll}15 & 10 \ 1 & 0\end{array}\right) \
Q=I-N^{T}\left(N N{T}\right){-1} N \ =
\left(\begin{array}{ll}1 & 0 \
0 & 1\end{array}\right)
-\left(\begin{array}{ll}15 & 1 \ 10 & 0\end{array}\right) \left [ \left(\begin{array}{ll}15 & 10 \ 1 & 0\end{array}\right) \left(\begin{array}{ll}15 & 1 \ 10 & 0\end{array}\right) \right ]^{-1} \left(\begin{array}{ll}15 & 10 \ 1 & 0\end{array}\right)=0
\
\text { 从而 } d=-Q \nabla f\left(x^{(1)}\right)=0
\end{array}
计
算
计算
计算
\boldsymbol{q}=\left(\boldsymbol{N} \boldsymbol{N}{\boldsymbol{T}}\right){-1} \boldsymbol{N} \nabla \boldsymbol{f}\left(\boldsymbol{x}^{(1)}\right)=\left(\begin{array}{l}
0.96 \
-14.4
\end{array}\right)
$$
其中第二个分量小于0,故删去
A
1
A_1
A1中对应的行,从而有新的
A
1
=
(
15
10
)
,
\mathrm{A}_1=(15 \quad 10),
A1=(1510), 故有新的
N
=
(
15
10
)
\mathrm{N}=(15 \quad 10)
N=(1510)
故有新的
Q
=
I
−
N
T
(
N
N
T
)
−
1
N
Q=\boldsymbol{I}-\boldsymbol{N}^{T}\left(\boldsymbol{N N}^{T}\right)^{-1} \boldsymbol{N}
Q=I−NT(NNT)−1N
=
(
1
0
0
1
)
−
(
15
10
)
[
(
15
10
)
(
15
10
)
]
−
1
(
15
10
)
=\left(\begin{array}{cc}1 & 0 \\ 0 & 1\end{array}\right)-\left(\begin{array}{c}15 \\ 10\end{array}\right)\left[(15 \quad 10)\left(\begin{array}{c}15 \\ 10\end{array}\right)\right]^{-1}(15 \quad 10)
=(1001)−(1510)[(1510)(1510)]−1(1510)
=
(
0.3046
−
0.4635
−
0.4635
0.6923
)
=\left(\begin{array}{cc}0.3046 & -0.4635 \\ -0.4635 & 0.6923\end{array}\right)
=(0.3046−0.4635−0.46350.6923)
从而有寻优方向 d 1 = − Q ∇ f ( x 1 ) = ( 4.4307 − 6.6462 ) d^{1}=-Q \nabla f\left(x^{1}\right)=\left(\begin{array}{l}4.4307 \\ -6.6462\end{array}\right) d1=−Q∇f(x1)=(4.4307−6.6462)
x 1 \mathrm{x}^{1} x1 沿着该方向作线性搜索可得新的点 x 2 = ( 0.4 , 0.6 ) T \mathrm{x}^{2}=(0.4,0.6)^{\mathrm{T}} x2=(0.4,0.6)T
既约梯度法
既约梯度法首先由Wolfe提出,解决具有线性约束的非线性规划问题。1969年由Abadie与Carpentier推广至非线性约束;
基本思想:借鉴求解线性规划的单纯形算法,选择某些变量为基变量,其它的作为非基变量,将基变量用非基变量表示,并从目标函数中消去基变量,得到以非基变量为自变量的简化的目标函数,进而利用此函数的负梯度构造下降可行方向。
既约梯度:简化目标函数关于非基变量的梯度。
考虑以下问题
$$
\begin{aligned}
&\min f(x)\
&\begin{array}{l}
\text { s.t. } \quad A x=b, \quad x \geq 0 \
\end{array}
\end{aligned}
$$
其中
A
m
×
n
,
r
(
A
)
=
m
,
b
m
×
1
,
m
≤
n
A_{m \times n}, r(A)=m, b_{m \times 1} , m \leq n
Am×n,r(A)=m,bm×1,m≤n
思想:降维
$$
\begin{array}{ll}
\min &f\left(x_{B}, x_{N}\right) \
\text { s.t. } &B x_{B}+N x_{N}=b\
\quad& x_{B}, x_{N} \geq 0
\end{array} ~~~~~~~~~~
\Leftrightarrow ~~~~~~~~~~
\begin{array}{}
\min &F\left(x_{N}\right)\
\text { s.t. }& x_{B}, x_{N} \geq 0
\end{array}
B
可
逆
方
阵
B可逆方阵
B可逆方阵
A=(B ~~~N), \quad x=\left(\begin{array}{l}
x_{B} \
x_{N}
\end{array}\right) \quad \begin{array}{l}
x_{B}=B^{-1} b-B^{-1} N x_{N} \
\Rightarrow F\left(x_{N}\right)=f\left(x_{B}\left(x_{N}\right), x_{N}\right)
\end{array}
KaTeX parse error: Can't use function '$' in math mode at position 2: $̲f(x)$ 的既约梯度为
\begin{array}{l}
r\left(x_{N}\right)=\nabla F\left(x_{N}\right) \
=\nabla_{x_{N}} f(x)-\left(B^{-1} N\right)^{T} \nabla_{x_{B}} f(x)
\end{array}
$$
确定下降可行方向 d ( k ) d^{(k)} d(k)
- 下降方向 d ( k ) d^{(k)} d(k) : ∇ f ( x ) T d ( k ) < 0 ∇f(x)^Td^{(k)}< 0 ∇f(x)Td(k)<0
- 可行方向 d ( k ) d^{(k)} d(k): , A d = 0 , d j ≤ 0 ,Ad=0,d_j\leq 0 ,Ad=0,dj≤0 ,如果 x j = 0 x_j = 0 xj=0
d ( k ) = ( d B ( k ) d N ( k ) ) , d B ( k ) d^{(k)}=\left(\begin{array}{l}d_{B}^{(k)} \\ d_{N}^{(k)}\end{array}\right), \quad d_{B}^{(k)} d(k)=(dB(k)dN(k)),dB(k) 和 d N ( k ) d_{N}^{(k)} dN(k) 分别对应基变量和非基变量。
定义
d
N
(
k
)
,
d_{N}^{(k)},
dN(k), 使得
d
N
j
(
k
)
=
{
−
x
N
j
(
k
)
r
j
(
x
N
(
k
)
)
当
r
j
(
x
N
(
k
)
)
>
0
−
r
j
(
x
N
(
k
)
)
当
r
j
(
x
N
(
k
)
)
≤
0
\begin{aligned} d_{N_{j}}^{(k)} &=\left\{\begin{array}{ll} -x_{N_{j}}^{(k)} r_{j}\left(x_{N}^{(k)}\right) & \text { 当 } r_{j}\left(x_{N}^{(k)}\right)>0 \\ -r_{j}\left(x_{N}^{(k)}\right) & \text { 当 } r_{j}\left(x_{N}^{(k)}\right) \leq 0 \end{array}\right. \end{aligned}
dNj(k)=⎩⎨⎧−xNj(k)rj(xN(k))−rj(xN(k)) 当 rj(xN(k))>0 当 rj(xN(k))≤0
为得到可行方向,
应
有
A
d
(
k
)
=
0
,
应 有 A d^{(k)}=0,
应有Ad(k)=0, 即
B
d
B
(
k
)
+
N
d
N
(
k
)
=
0
\quad B d_{B}^{(k)}+N d_{N}^{(k)}=0
BdB(k)+NdN(k)=0
取
d
B
(
k
)
=
−
B
−
1
N
d
N
(
k
)
⇒
d
(
k
)
=
(
−
B
−
1
N
d
N
(
k
)
d
N
(
k
)
)
\text { 取 } \quad d_{B}^{(k)}=-B^{-1} N d_{N}^{(k)} \Rightarrow \quad d^{(k)}=\left(\begin{array}{c} -B^{-1} N d_{N}^{(k)} \\ d_{N}^{(k)} \end{array}\right)
取 dB(k)=−B−1NdN(k)⇒d(k)=(−B−1NdN(k)dN(k))
定
理
\large\color{magenta}{\boxed{\color{brown}{定理} }}
定理 设
x
x
x 是可行解,
A
=
(
B
,
N
)
A=(B, N)
A=(B,N) 是
m
×
n
m \times n
m×n 矩阵,
B
B
B 为
m
m
m 阶可逆矩阵,
x
=
(
x
B
T
,
x
N
T
)
T
,
x
B
>
0
,
x=\left(x_{B}^{T}, x_{N}^{T}\right)^{T}, x_{B}>0,
x=(xBT,xNT)T,xB>0, 函数
f
f
f
在点x处可微,又设
d
=
(
−
B
−
1
N
d
N
d
N
)
d=\left(\begin{array}{c} -B^{-1} N d_{N} \\ d_{N} \end{array}\right)
d=(−B−1NdNdN)
其中
d
N
j
=
{
−
x
N
j
r
j
(
x
N
)
r
j
(
x
N
)
>
0
−
r
j
(
x
N
)
r
j
(
x
N
)
≤
0
d_{N_{j}}=\left\{\begin{array}{ll} -x_{N_{j}} r_{j}\left(x_{N}\right) & r_{j}\left(x_{N}\right)>0 \\ -r_{j}\left(x_{N}\right) & r_{j}\left(x_{N}\right) \leq 0 \end{array}\right.
dNj={−xNjrj(xN)−rj(xN)rj(xN)>0rj(xN)≤0
如果
d
≠
0
,
d \neq 0,
d=0, 则
d
d
d 是下降可行方向, 而且
d
=
0
d=0
d=0 的充要条件是x为KKT点。
直线搜索确定步长 α m a x \alpha_{max} αmax
x k ∈ S , A x k = b , x B k > 0 , x N k ≥ 0 , A d k = 0 x k + α d k ∈ S x k + 1 ≥ 0 , x j k + 1 = x j k + α d j k ≥ 0 , j = 1 , ⋯ , n \begin{array}{l} x^{k} \in S, A x^{k}=b, x_{B}^{k}>0, x_{N}^{k} \geq 0, A d^{k}=0 \\ x^{k}+\alpha d^{k} \in S \\ x^{k+1} \geq 0, x_{j}^{k+1}=x_{j}^{k}+\alpha d_{j}^{k} \geq 0, j=1, \cdots, n \end{array} xk∈S,Axk=b,xBk>0,xNk≥0,Adk=0xk+αdk∈Sxk+1≥0,xjk+1=xjk+αdjk≥0,j=1,⋯,n
(1) 如果 d j k ≥ 0 , d_{j}^{k} \geq 0, djk≥0, 则 x j k + 1 ≥ 0 , ∀ α > 0 x_{j}^{k+1} \geq 0, \forall \alpha>0 xjk+1≥0,∀α>0.
(2) 如果 d j k < 0 , d_{j}^{k}<0, djk<0, 则 α ≤ x j k − d j k \alpha \leq \frac{x_{j}^{k}}{-d_{j}^{k}} α≤−djkxjk.
让 α max = { ∞ , d k ≥ 0 , min { x j k − d j k ∣ d j k < 0 } , else. \alpha_{\max }=\left\{\begin{array}{ll}\infty, & d^{k} \geq 0, \\ \min \left\{\frac{x_{j}^{k}}{-d_{j}^{k}} \mid d_{j}^{k}<0\right\}, & \text { else. }\end{array}\right. αmax=⎩⎨⎧∞,min{−djkxjk∣djk<0},dk≥0, else.
于是精确线搜索为:
$$
\begin{aligned} (P\alpha_{max}) ~ ~ ~ \min &~ ~ ~ f(x^{k}+\alpha d^k)\\text{s.t.} &~ ~ ~
0\leq \alpha \leq \alpha_{max} \end{aligned}\tag 7
$$
例子
min f ( x ) = 2 x 1 2 + x 2 2 s.t. x 1 − x 2 + x 3 = 2 − 2 x 1 + x 2 + x 4 = 1 x 1 , x 2 , x 3 , x 4 ≥ 0 \begin{array}{l} \min \quad f(x)=2 x_{1}^{2}+x_{2}^{2} \\ \text { s.t. } \quad x_{1}-x_{2}+x_{3} \quad=2 \\ -2 x_{1}+x_{2}+x_{4}=1 \\ x_{1}, x_{2}, x_{3}, x_{4} \geq 0 \end{array} minf(x)=2x12+x22 s.t. x1−x2+x3=2−2x1+x2+x4=1x1,x2,x3,x4≥0
解:
∇
f
(
x
)
=
(
4
x
1
2
x
2
0
0
)
,
A
=
(
1
−
1
1
0
−
2
1
0
1
)
,
b
=
(
2
1
)
\nabla f(x)=\left(\begin{array}{c} 4 x_{1} \\ 2 x_{2} \\ 0 \\ 0 \end{array}\right), A=\left(\begin{array}{rrrr} 1 & -1 & 1 & 0 \\ -2 & 1 & 0 & 1 \end{array}\right), b=\left(\begin{array}{l} 2 \\ 1 \end{array}\right)
∇f(x)=⎝⎜⎜⎛4x12x200⎠⎟⎟⎞,A=(1−2−111001),b=(21)
第一次迭代:
(1)
x
1
=
(
1
3
4
0
)
x^{1}=\left(\begin{array}{c}1 \\ 3 \\ 4 \\ 0\end{array}\right)
x1=⎝⎜⎜⎛1340⎠⎟⎟⎞
∈
\in
∈
S
S
S,
∇
f
(
x
1
)
=
(
4
6
0
0
)
\nabla f\left(x^{1}\right)=\left(\begin{array}{c}4 \\ 6 \\ 0 \\ 0\end{array}\right)
∇f(x1)=⎝⎜⎜⎛4600⎠⎟⎟⎞
x
B
=
(
x
2
x
3
)
=
(
3
4
)
,
x
N
=
(
x
1
x
4
)
=
(
1
0
)
x_{B}=\left(\begin{array}{c}x_{2} \\ x_{3}\end{array}\right)=\left(\begin{array}{l}3 \\ 4\end{array}\right), x_{N}=\left(\begin{array}{c}x_{1} \\ x_{4}\end{array}\right)=\left(\begin{array}{l}1 \\ 0\end{array}\right)
xB=(x2x3)=(34),xN=(x1x4)=(10)
B
=
(
−
1
1
1
0
)
,
B
−
1
=
(
0
1
1
1
)
,
N
=
(
1
0
−
2
1
)
B=\left(\begin{array}{rr}-1 & 1 \\ 1 & 0\end{array}\right), B^{-1}=\left(\begin{array}{ll}0 & 1 \\ 1 & 1\end{array}\right), N=\left(\begin{array}{rr}1 & 0 \\ -2 & 1\end{array}\right)
B=(−1110),B−1=(0111),N=(1−201)
(
r
N
1
)
T
=
∇
N
f
(
x
1
)
T
−
∇
B
f
(
x
1
)
T
B
−
1
N
=
(
4
0
)
T
−
(
6
0
)
T
(
0
1
1
1
)
(
1
0
−
2
1
)
=
(
16
−
6
)
T
\begin{aligned} \left(r_{N}^{1}\right)^{T} &=\nabla_{N} f\left(x^{1}\right)^{T}-\nabla_{B} f\left(x^{1}\right)^{T} B^{-1} N \\ &=\left(\begin{array}{c} 4 \\ 0 \end{array}\right)^{T}-\left(\begin{array}{c} 6 \\ 0 \end{array}\right)^{T}\left(\begin{array}{ll} 0 & 1 \\ 1 & 1 \end{array}\right)\left(\begin{array}{rr} 1 & 0 \\ -2 & 1 \end{array}\right) \\ &=\left(\begin{array}{c} 16 \\ -6 \end{array}\right)^{T} \end{aligned}
(rN1)T=∇Nf(x1)T−∇Bf(x1)TB−1N=(40)T−(60)T(0111)(1−201)=(16−6)T
d N 1 = ( d 1 1 d 4 1 ) = ( − 16 6 ) d B 1 = ( d 2 1 d 3 1 ) = − ( 0 1 1 1 ) ( 1 0 − 2 1 ) ( − 16 6 ) = ( − 38 − 22 ) d 1 = ( − 16 − 38 − 22 6 ) \begin{aligned} d_{N}^{1} &=\left(\begin{array}{c} d_{1}^{1} \\ d_{4}^{1} \end{array}\right)=\left(\begin{array}{r} -16 \\ 6 \end{array}\right) \\ d_{B}^{1} &=\left(\begin{array}{c} d_{2}^{1} \\ d_{3}^{1} \end{array}\right)=-\left(\begin{array}{cc} 0 & 1 \\ 1 & 1 \end{array}\right)\left(\begin{array}{rr} 1 & 0 \\ -2 & 1 \end{array}\right)\left(\begin{array}{r} -16 \\ 6 \end{array}\right) \\ &=\left(\begin{array}{r} -38 \\ -22 \end{array}\right) \\ d^{1} &=\left(\begin{array}{r} -16 \\ -38 \\ -22 \\ 6 \end{array}\right) \end{aligned} dN1dB1d1=(d11d41)=(−166)=(d21d31)=−(0111)(1−201)(−166)=(−38−22)=⎝⎜⎜⎛−16−38−226⎠⎟⎟⎞
α max = min { − 1 − 16 , − 3 − 38 , − 4 − 22 } = 1 16 x 1 + α d 1 = ( 1 3 4 0 ) + α ( − 16 − 38 − 22 6 ) = ( 1 − 16 α 3 − 38 α 4 − 22 α 6 α ) f ( x 1 + α d 1 ) = 2 ( 1 − 16 α ) 2 + ( 3 − 38 α ) 2 \begin{array}{l} \alpha_{\max }=\min \left\{\frac{-1}{-16}, \frac{-3}{-38}, \frac{-4}{-22}\right\}=\frac{1}{16} \\ x^{1}+\alpha d^{1}=\left(\begin{array}{c} 1 \\ 3 \\ 4 \\ 0 \end{array}\right)+\alpha\left(\begin{array}{r} -16 \\ -38 \\ -22 \\ 6 \end{array}\right)=\left(\begin{array}{c} 1-16 \alpha \\ 3-38 \alpha \\ 4-22 \alpha \\ 6 \alpha \end{array}\right) \\ f\left(x^{1}+\alpha d^{1}\right)=2(1-16 \alpha)^{2}+(3-38 \alpha)^{2} \end{array} αmax=min{−16−1,−38−3,−22−4}=161x1+αd1=⎝⎜⎜⎛1340⎠⎟⎟⎞+α⎝⎜⎜⎛−16−38−226⎠⎟⎟⎞=⎝⎜⎜⎛1−16α3−38α4−22α6α⎠⎟⎟⎞f(x1+αd1)=2(1−16α)2+(3−38α)2
直线搜索确定步长
α
m
a
x
\alpha_{max}
αmax
min
f
(
x
1
+
α
d
1
)
=
2
(
1
−
16
α
)
2
+
(
3
−
38
α
)
2
s.t.
0
≤
α
≤
1
16
\begin{array}{ll} \min & f\left(x^{1}+\alpha d^{1}\right)=2(1-16 \alpha)^{2}+(3-38 \alpha)^{2} \\ \text { s.t. } & 0 \leq \alpha \leq \frac{1}{16} \end{array}
min s.t. f(x1+αd1)=2(1−16α)2+(3−38α)20≤α≤161
得到
α
1
=
1
16
\alpha_{1}=\frac{1}{16}
α1=161
第二次迭代:
x
2
=
(
0
5
8
21
8
3
8
)
,
∇
f
(
x
2
)
=
(
0
5
4
0
0
)
x^{2}=\left(\begin{array}{c}0 \\ \frac{5}{8} \\ \frac{21}{8} \\ \frac{3}{8}\end{array}\right), \nabla f\left(x^{2}\right)=\left(\begin{array}{c}0 \\ \frac{5}{4} \\ 0 \\ 0\end{array}\right)
x2=⎝⎜⎜⎛08582183⎠⎟⎟⎞,∇f(x2)=⎝⎜⎜⎛04500⎠⎟⎟⎞
x
B
=
(
x
2
x
3
)
=
(
5
8
21
8
)
,
x
N
=
(
x
1
x
4
)
=
(
0
3
8
)
x_{B}=\left(\begin{array}{c}x_{2} \\ x_{3}\end{array}\right)=\left(\begin{array}{r}\frac{5}{8} \\ \frac{21}{8}\end{array}\right), x_{N}=\left(\begin{array}{c}x_{1} \\ x_{4}\end{array}\right)=\left(\begin{array}{c}0 \\ \frac{3}{8}\end{array}\right)
xB=(x2x3)=(85821),xN=(x1x4)=(083)
B
=
(
−
1
1
1
0
)
,
B
−
1
=
(
0
1
1
1
)
,
N
=
(
1
0
−
2
1
)
B=\left(\begin{array}{rr}-1 & 1 \\ 1 & 0\end{array}\right), B^{-1}=\left(\begin{array}{ll}0 & 1 \\ 1 & 1\end{array}\right), N=\left(\begin{array}{rr}1 & 0 \\ -2 & 1\end{array}\right)
B=(−1110),B−1=(0111),N=(1−201)
( r N 2 ) T = ∇ N f ( x 2 ) T − ∇ B f ( x 2 ) T B − 1 N = ( 0 0 ) T − ( 5 4 0 ) T ( 0 1 1 1 ) ( 1 0 − 2 1 ) = ( 5 2 − 5 4 ) T \begin{aligned}\left(r_{N}^{2}\right)^{T} &=\nabla_{N} f\left(x^{2}\right)^{T}-\nabla_{B} f\left(x^{2}\right)^{T} B^{-1} N \\ &=\left(\begin{array}{l}0 \\ 0\end{array}\right)^{T}-\left(\begin{array}{c}\frac{5}{4} \\ 0\end{array}\right)^{T}\left(\begin{array}{ll}0 & 1 \\ 1 & 1\end{array}\right)\left(\begin{array}{rr}1 & 0 \\ -2 & 1\end{array}\right) \\ &=\left(\begin{array}{r}\frac{5}{2} \\ -\frac{5}{4}\end{array}\right)^{T} \end{aligned} (rN2)T=∇Nf(x2)T−∇Bf(x2)TB−1N=(00)T−(450)T(0111)(1−201)=(25−45)T
d N 2 = ( d 1 2 d 4 2 ) = ( 0 5 4 ) d B 2 = ( d 2 2 d 3 2 ) = − ( 0 1 1 1 ) ( 1 0 − 2 1 ) ( 0 5 4 ) = ( − 5 4 5 4 ) d 2 = ( 0 − 5 4 − 5 4 − 5 4 ) \begin{aligned} d_{N}^{2} &=\left(\begin{array}{c}d_{1}^{2} \\ d_{4}^{2}\end{array}\right)=\left(\begin{array}{c}0 \\ \frac{5}{4}\end{array}\right) \\ d_{B}^{2} &=\left(\begin{array}{c}d_{2}^{2} \\ d_{3}^{2}\end{array}\right)=-\left(\begin{array}{cc}0 & 1 \\ 1 & 1\end{array}\right)\left(\begin{array}{rc}1 & 0 \\ -2 & 1\end{array}\right)\left(\begin{array}{c}0 \\ \frac{5}{4}\end{array}\right) \\ &=\left(\begin{array}{r}-\frac{5}{4} \\ \frac{5}{4}\end{array}\right) \\ d^{2} &=\left(\begin{array}{r}0 \\ -\frac{5}{4} \\ -\frac{5}{4} \\ -\frac{5}{4}\end{array}\right) \end{aligned} dN2dB2d2=(d12d42)=(045)=(d22d32)=−(0111)(1−201)(045)=(−4545)=⎝⎜⎜⎛0−45−45−45⎠⎟⎟⎞
α
max
=
min
{
5
/
8
5
/
4
,
21
/
8
5
/
4
}
=
1
2
\alpha_{\max }=\min \left\{\frac{5 / 8}{5 / 4}, \frac{21 / 8}{5 / 4}\right\}=\frac{1}{2}
αmax=min{5/45/8,5/421/8}=21
x
2
+
α
d
2
=
(
0
5
8
21
8
3
8
)
+
α
(
0
−
5
4
−
5
4
−
5
4
)
=
(
5
8
−
5
4
α
21
8
−
5
4
α
3
8
+
5
4
α
)
x^{2}+\alpha d^{2}=\left(\begin{array}{c}0 \\ \frac{5}{8} \\ \frac{21}{8} \\ \frac{3}{8}\end{array}\right)+\alpha\left(\begin{array}{r}0 \\ -\frac{5}{4} \\ -\frac{5}{4} \\ -\frac{5}{4}\end{array}\right)=\left(\begin{array}{c}\frac{5}{8}-\frac{5}{4} \alpha \\ \frac{21}{8}-\frac{5}{4} \alpha \\ \frac{3}{8}+\frac{5}{4} \alpha\end{array}\right)
x2+αd2=⎝⎜⎜⎛08582183⎠⎟⎟⎞+α⎝⎜⎜⎛0−45−45−45⎠⎟⎟⎞=⎝⎛85−45α821−45α83+45α⎠⎞
f
(
x
2
+
α
d
2
)
=
(
5
8
−
5
4
α
)
2
f\left(x^{2}+\alpha d^{2}\right)=\left(\frac{5}{8}-\frac{5}{4} \alpha\right)^{2}
f(x2+αd2)=(85−45α)2
直线搜索确定步长 α m a x \alpha_{max} αmax
min
f
(
x
2
+
α
d
2
)
=
(
5
8
−
5
4
α
)
2
s.t.
0
≤
α
≤
1
2
\begin{array}{ll} \min & f\left(x^{2}+\alpha d^{2}\right)=\left(\frac{5}{8}-\frac{5}{4} \alpha\right)^{2} \\ \text { s.t. } & 0 \leq \alpha \leq \frac{1}{2} \end{array}
min s.t. f(x2+αd2)=(85−45α)20≤α≤21
α
2
=
1
2
\alpha_{2}=\frac{1}{2}
α2=21.
第三次迭代:
x
3
=
(
0
0
2
1
)
,
∇
f
(
x
3
)
=
(
0
0
0
0
)
,
I
B
=
{
3
,
4
}
,
I
N
=
{
1
,
2
}
x^{3}=\left(\begin{array}{l}0 \\ 0 \\ 2 \\ 1\end{array}\right), \nabla f\left(x^{3}\right)=\left(\begin{array}{l}0 \\ 0 \\ 0 \\ 0\end{array}\right), I_{B}=\{3,4\}, I_{N}=\{1,2\}
x3=⎝⎜⎜⎛0021⎠⎟⎟⎞,∇f(x3)=⎝⎜⎜⎛0000⎠⎟⎟⎞,IB={3,4},IN={1,2}
x
B
=
(
x
3
x
4
)
=
(
2
1
)
,
x
N
=
(
x
1
x
2
)
=
(
0
0
)
x_{B}=\left(\begin{array}{c}x_{3} \\ x_{4}\end{array}\right)=\left(\begin{array}{c}2 \\ 1\end{array}\right), x_{N}=\left(\begin{array}{c}x_{1} \\ x_{2}\end{array}\right)=\left(\begin{array}{l}0 \\ 0\end{array}\right)
xB=(x3x4)=(21),xN=(x1x2)=(00)
B
=
(
1
0
0
1
)
,
B
−
1
=
(
1
0
0
1
)
,
N
=
(
1
−
1
−
2
1
)
B=\left(\begin{array}{ll}1 & 0 \\ 0 & 1\end{array}\right), B^{-1}=\left(\begin{array}{ll}1 & 0 \\ 0 & 1\end{array}\right), N=\left(\begin{array}{rr}1 & -1 \\ -2 & 1\end{array}\right)
B=(1001),B−1=(1001),N=(1−2−11)
(
r
N
3
)
T
=
∇
N
f
(
x
3
)
T
−
∇
B
f
(
x
3
)
T
B
−
1
N
=
(
0
0
)
T
−
(
0
0
)
T
(
1
0
0
1
)
(
1
−
1
−
2
1
)
=
(
0
0
)
T
\begin{aligned} \left(r_{N}^{3}\right)^{T} &=\nabla_{N} f\left(x^{3}\right)^{T}-\nabla_{B} f\left(x^{3}\right)^{T} B^{-1} N \\ &=\left(\begin{array}{l} 0 \\ 0 \end{array}\right)^{T}-\left(\begin{array}{l} 0 \\ 0 \end{array}\right)^{T}\left(\begin{array}{ll} 1 & 0 \\ 0 & 1 \end{array}\right)\left(\begin{array}{rr} 1 & -1 \\ -2 & 1 \end{array}\right) \\ &=\left(\begin{array}{l} 0 \\ 0 \end{array}\right)^{T} \end{aligned}
(rN3)T=∇Nf(x3)T−∇Bf(x3)TB−1N=(00)T−(00)T(1001)(1−2−11)=(00)T
停止迭代,
x
ˉ
=
x
3
=
(
0
0
2
1
)
\bar{x}=x^{3}=\left(\begin{array}{l} 0 \\ 0 \\ 2 \\ 1 \end{array}\right)
xˉ=x3=⎝⎜⎜⎛0021⎠⎟⎟⎞
参考资料
MIT凸优化课程
Stephen Boyd的Convex Optimization
《凸优化》,Stephen Boyd等著,王书宁等译
“最优化理论与算法”北京邮电大学数学系
ight)$
(
r
N
3
)
T
=
∇
N
f
(
x
3
)
T
−
∇
B
f
(
x
3
)
T
B
−
1
N
=
(
0
0
)
T
−
(
0
0
)
T
(
1
0
0
1
)
(
1
−
1
−
2
1
)
=
(
0
0
)
T
\begin{aligned} \left(r_{N}^{3}\right)^{T} &=\nabla_{N} f\left(x^{3}\right)^{T}-\nabla_{B} f\left(x^{3}\right)^{T} B^{-1} N \\ &=\left(\begin{array}{l} 0 \\ 0 \end{array}\right)^{T}-\left(\begin{array}{l} 0 \\ 0 \end{array}\right)^{T}\left(\begin{array}{ll} 1 & 0 \\ 0 & 1 \end{array}\right)\left(\begin{array}{rr} 1 & -1 \\ -2 & 1 \end{array}\right) \\ &=\left(\begin{array}{l} 0 \\ 0 \end{array}\right)^{T} \end{aligned}
(rN3)T=∇Nf(x3)T−∇Bf(x3)TB−1N=(00)T−(00)T(1001)(1−2−11)=(00)T
停止迭代,
x
ˉ
=
x
3
=
(
0
0
2
1
)
\bar{x}=x^{3}=\left(\begin{array}{l} 0 \\ 0 \\ 2 \\ 1 \end{array}\right)
xˉ=x3=⎝⎜⎜⎛0021⎠⎟⎟⎞
参考资料
MIT凸优化课程
Stephen Boyd的Convex Optimization
《凸优化》,Stephen Boyd等著,王书宁等译
“最优化理论与算法”北京邮电大学数学系