UA SIE545 优化理论基础3 Fritz-John与Kuhn-Tucker理论总结 带等式约束与不等式约束的极值问题
对于函数 f : X → Y f:X \to Y f:X→Y,我们希望 X X X是一个凸的度量空间(或者拓扑空间), Y Y Y是一个定义了良序的内积空间。 N ϵ ( x ) N_{\epsilon}(x) Nϵ(x)表示中心为 x x x,半径为 ϵ \epsilon ϵ的领域,下面先列出几个会用到的概念。
Global minimum If x ˉ ∈ X \bar x \in X xˉ∈X, ∀ x ∈ X \forall x \in X ∀x∈X, f ( x ˉ ) ≤ f ( x ) f(\bar x) \le f(x) f(xˉ)≤f(x), then x ˉ \bar x xˉ is a global minimum. If equality in f ( x ˉ ) ≤ f ( x ) f(\bar x) \le f(x) f(xˉ)≤f(x) will never hold, then x ˉ \bar x xˉ is strict global minimum.
Local minimum If x ˉ ∈ X \bar x \in X xˉ∈X, ∃ ϵ > 0 \exists \epsilon>0 ∃ϵ>0, ∀ x ∈ N ϵ ( x ˉ ) \forall x \in N_{\epsilon}(\bar x) ∀x∈Nϵ(xˉ), f ( x ˉ ) ≤ f ( x ) f(\bar x) \le f(x) f(xˉ)≤f(x), then x ˉ \bar x xˉ is a local minimum. If equality in f ( x ˉ ) ≤ f ( x ) f(\bar x) \le f(x) f(xˉ)≤f(x) will never hold, then x ˉ \bar x xˉ is strict local minimum.
Pseudoconvex ∀ x 1 , x 2 ∈ X \forall x_1,x_2 \in X ∀x1,x2∈X such that ∇ f ( x 1 ) T ( x 2 − x 1 ) ≥ 0 \nabla f(x_1)^T(x_2-x_1) \ge 0 ∇f(x1)T(x2−x1)≥0, we have f ( x 2 ) ≥ f ( x 1 ) f(x_2) \ge f(x_1) f(x2)≥f(x1).
Quasiconvex ∀ x 1 , x 2 ∈ X , λ ∈ [ 0 , 1 ] \forall x_1,x_2 \in X,\lambda \in [0,1] ∀x1,x2∈X,λ∈[0,1] such that f ( λ x 1 + ( 1 − λ ) x 2 ) ≤ max ( f ( x ) , f ( y ) ) f(\lambda x_1+ (1-\lambda)x_2) \le \max(f(x),f(y)) f(λx1+(1−λ)x2)≤max(f(x),f(y))
min x ∈ S f ( x ) \min_{x \in S} f(x) minx∈Sf(x),其中 S S S是 X X X的子集。
Cone of feasible directions at
x
ˉ
∈
c
l
S
\bar x \in clS
xˉ∈clS,
D
=
{
d
≠
0
:
∃
δ
>
0
,
∀
λ
∈
(
0
,
δ
)
,
x
ˉ
+
λ
d
∈
S
}
D=\{d \ne 0:\exists \delta>0, \forall \lambda \in (0,\delta),\bar x+\lambda d \in S\}
D={d=0:∃δ>0,∀λ∈(0,δ),xˉ+λd∈S}
Each d ∈ D d \in D d∈D is called feasible direction.
Cone of improving direction at
x
ˉ
∈
c
l
S
\bar x \in clS
xˉ∈clS,
F
=
{
d
:
∃
δ
>
0
,
∀
λ
∈
(
0
,
δ
)
,
f
(
x
ˉ
+
λ
d
)
<
f
(
x
ˉ
)
}
F=\{d:\exists \delta>0,\forall \lambda \in (0,\delta),f(\bar x+\lambda d)<f(\bar x)\}
F={d:∃δ>0,∀λ∈(0,δ),f(xˉ+λd)<f(xˉ)}Each
d
∈
F
d \in F
d∈F is called improving direction.
无约束极值问题
min
x
∈
X
f
(
x
)
\min_{x \in X}f(x)
x∈Xminf(x)
Let x ˉ ∈ X \bar x \in X xˉ∈X be a candidate point. PSD = positive semi-definite, PD = positive definite.
First order necessary condition:
∇
f
(
x
ˉ
)
=
0
\nabla f(\bar x) = 0
∇f(xˉ)=0
Second order necessary condition:
H
(
x
ˉ
)
P
S
D
H(\bar x)\ PSD
H(xˉ) PSD
Second order sufficient condition:
H
(
x
ˉ
)
P
D
H(\bar x)\ PD
H(xˉ) PD
For convex optimization, sufficient and necessary condition:
∇
f
(
x
ˉ
)
=
0
\nabla f(\bar x) = 0
∇f(xˉ)=0
含等式约束与不等式约束的极值问题
min
x
∈
X
f
(
x
)
s
.
t
.
g
(
x
)
≤
0
h
(
x
)
=
0
\min_{x \in X} f(x) \\ s.t. g(x) \le 0 \\ h(x)=0
x∈Xminf(x)s.t.g(x)≤0h(x)=0
Let
L
L
L be Lagrange function:
L
(
x
)
=
f
(
x
)
+
u
T
g
(
x
)
+
v
T
h
(
x
)
L(x) = f(x)+u^Tg(x)+v^Th(x)
L(x)=f(x)+uTg(x)+vTh(x)
Fritz-John (FJ First order necessary condition):
u
0
T
∇
f
(
x
ˉ
)
+
u
T
∇
g
(
x
ˉ
)
+
v
T
∇
h
(
x
ˉ
)
=
0
u
T
g
(
x
ˉ
)
=
0
(
u
0
,
u
)
≥
0
,
(
u
0
,
u
,
v
)
≠
0
u_0^T\nabla f(\bar x)+u^T\nabla g(\bar x)+v^T\nabla h(\bar x)=0 \\ u^T g(\bar x) = 0 \\ (u_0,u) \ge 0, (u_0,u,v) \ne 0
u0T∇f(xˉ)+uT∇g(xˉ)+vT∇h(xˉ)=0uTg(xˉ)=0(u0,u)≥0,(u0,u,v)=0
Karush-Kuhn-Tucker (KKT first order necessary condition):
g
(
x
ˉ
)
≤
0
∇
f
(
x
ˉ
)
+
u
T
∇
g
(
x
ˉ
)
+
v
T
∇
h
(
x
ˉ
)
=
0
,
u
≥
0
u
T
g
(
x
ˉ
)
=
0
g(\bar x) \le 0\\ \nabla f(\bar x)+u^T\nabla g(\bar x)+v^T\nabla h(\bar x)=0,u \ge 0 \\ u^T g(\bar x) = 0
g(xˉ)≤0∇f(xˉ)+uT∇g(xˉ)+vT∇h(xˉ)=0,u≥0uTg(xˉ)=0
and ∇ g ( x ˉ ) \nabla g(\bar x) ∇g(xˉ) and ∇ h ( x ˉ ) \nabla h(\bar x) ∇h(xˉ) are linearly independent. 第一个条件叫primal feasibility(PF)、第二个条件dual feasibility(DF),第三个条件约束的是complementary slacks (CS),最后一个条件叫constraint qualification,保证KKT与FJ等价。
KKT second-order necessary condition:
H
L
(
x
ˉ
)
P
S
D
HL(\bar x)\ PSD
HL(xˉ) PSD
KKT second-order sufficient condition:
H
L
(
x
ˉ
)
P
D
HL(\bar x)\ PD
HL(xˉ) PD
例 考虑下面的优化问题
min
f
(
x
1
,
x
2
)
=
(
x
1
−
1
)
2
+
x
2
2
s
.
t
.
g
(
x
1
,
x
2
)
=
2
k
x
1
−
x
2
2
≤
0
,
k
>
0
\min f(x_1,x_2)=(x_1-1)^2+x_2^2 \\ s.t.g(x_1,x_2)=2kx_1-x_2^2 \le 0, k > 0
minf(x1,x2)=(x1−1)2+x22s.t.g(x1,x2)=2kx1−x22≤0,k>0
因为
H
f
(
x
1
,
x
2
)
=
[
2
0
0
2
]
,
H
g
(
x
1
,
x
2
)
=
[
0
0
0
−
2
]
Hf(x_1,x_2) = \left[ \begin{matrix} 2 & 0 \\ 0 & 2 \end{matrix}\right], \ Hg(x_1,x_2) = \left[ \begin{matrix} 0 & 0 \\ 0 & -2 \end{matrix}\right]
Hf(x1,x2)=[2002], Hg(x1,x2)=[000−2]
显然 f f f是凸函数但 g g g不是,因此这个优化不是凸优化,那么如何计算这个优化的最优解呢?
解
注意到目标函数是圆,约束的边界是抛物线,所以可以用作图法求解,非常简单读者可以自行尝试。我们来讨论一下代数解法,考虑KKT条件,但需要注意这个问题非凸,所以KKT条件并不是充分条件。因为
∇
g
(
x
1
,
x
2
)
=
2
[
k
−
x
2
]
,
k
>
0
\nabla g(x_1,x_2) = 2 \left[ \begin{matrix} k \\ -x_2 \end{matrix} \right],k>0
∇g(x1,x2)=2[k−x2],k>0
也就是说 ∇ g ( x 1 , x 2 ) \nabla g(x_1,x_2) ∇g(x1,x2)非零,所以 ∇ g ( x ˉ ) \nabla g(\bar x) ∇g(xˉ) and ∇ h ( x ˉ ) \nabla h(\bar x) ∇h(xˉ) are linearly independent这个条件成立,因此KKT是必要条件。
定义Lagrange函数:
L
(
x
1
,
x
2
,
u
)
=
f
(
x
1
,
x
2
)
+
u
g
(
x
1
,
x
2
)
=
(
x
1
−
1
)
2
+
x
2
2
+
u
(
2
k
x
1
−
x
2
2
)
L(x_1,x_2,u)=f(x_1,x_2)+ug(x_1,x_2) \\ =(x_1-1)^2+x_2^2 +u(2kx_1-x_2^2 )
L(x1,x2,u)=f(x1,x2)+ug(x1,x2)=(x1−1)2+x22+u(2kx1−x22)
写出KKT条件:
{
2
(
x
1
−
1
)
+
2
k
u
=
0
2
x
2
−
2
u
x
2
=
0
u
g
(
x
1
,
x
2
)
=
u
(
2
k
x
1
−
x
2
2
)
=
0
g
(
x
1
,
x
2
)
=
2
k
x
1
−
x
2
2
≤
0
u
≥
0
\begin{cases} 2(x_1-1)+2ku = 0 \\ 2x_2 -2ux_2 = 0 \\ug(x_1,x_2) = u(2kx_1-x_2^2) = 0 \\ g(x_1,x_2)=2kx_1-x_2^2 \le 0 \\ u \ge 0 \end{cases}
⎩⎪⎪⎪⎪⎪⎪⎨⎪⎪⎪⎪⎪⎪⎧2(x1−1)+2ku=02x2−2ux2=0ug(x1,x2)=u(2kx1−x22)=0g(x1,x2)=2kx1−x22≤0u≥0
接下来做分类讨论:
- u = 0 u=0 u=0, 则 x 2 = 0 x_2=0 x2=0, x 1 = 1 x_1=1 x1=1, not feasible ( x 1 = 1 , x 2 = 0 ⇒ g ( x 1 , x 2 ) > 0 x_1=1,x_2=0 \Rightarrow g(x_1,x_2) > 0 x1=1,x2=0⇒g(x1,x2)>0)
- u > 0 u>0 u>0, 第二个方程说明 2 x 2 ( u − 1 ) = 0 2x_2(u-1)=0 2x2(u−1)=0,如果 x 2 = 0 x_2 =0 x2=0,则 x 1 = 0 , u = 1 / k x_1=0,u=1/k x1=0,u=1/k;如果 x 2 ≠ 0 x_2 \ne 0 x2=0, 则 u = 1 u=1 u=1,那么 x 1 = 1 − k x_1=1-k x1=1−k, x 2 = ± 2 k ( 1 − k ) x_2=\pm \sqrt{2k(1-k)} x2=±2k(1−k)
现在我们有了三组可能的解,但还需要根据 k k k的值做分类讨论:
- k > 1 k>1 k>1,只有 ( 0 , 0 ) (0,0) (0,0)是KKT point,因此 ( 0 , 0 ) (0,0) (0,0)是最优解;
- k ≤ 1 k \le 1 k≤1, ( 0 , 0 ) , ( 1 − k , 2 k ( 1 − k ) ) , ( 1 − k , − 2 k ( 1 − k ) ) (0,0),(1-k,\sqrt{2k(1-k)}),(1-k,-\sqrt{2k(1-k)}) (0,0),(1−k,2k(1−k) ),(1−k,−2k(1−k) )都是KKT point,我们需要通过比较它们对应的目标函数的取值来确定谁是最优解
{ f ( 0 , 0 ) = 1 f ( 1 − k , ± 2 k ( 1 − k ) ) = 2 k − k 2 \begin{cases} f(0,0) = 1 \\ f(1-k,\pm \sqrt{2k(1-k)})=2k-k^2 \end{cases} {f(0,0)=1f(1−k,±2k(1−k) )=2k−k2
因为
2
k
−
k
2
−
1
=
−
(
k
−
1
)
2
≤
0
2k-k^2-1 = -(k-1)^2 \le 0
2k−k2−1=−(k−1)2≤0
所以 ( 1 − k , 2 k ( 1 − k ) ) , ( 1 − k , − 2 k ( 1 − k ) ) (1-k,\sqrt{2k(1-k)}),(1-k,-\sqrt{2k(1-k)}) (1−k,2k(1−k) ),(1−k,−2k(1−k) )是最优点。