UA SIE545 优化理论基础2 凸函数 概念 理论 总结
凸函数的概念与简单性质
Convex function
f
:
S
→
R
f:S \to \mathbb{R}
f:S→R where
S
S
S is a nonempty convex set. Call
f
f
f a convex function on
S
S
S if
∀
x
1
,
x
2
∈
S
\forall x_1,x_2 \in S
∀x1,x2∈S,
λ
∈
(
0
,
1
)
\lambda \in (0,1)
λ∈(0,1)
f
(
λ
x
1
+
(
1
−
λ
)
x
2
)
≤
λ
f
(
x
1
)
+
(
1
−
λ
)
f
(
x
2
)
f(\lambda x_1+(1-\lambda)x_2) \le \lambda f(x_1)+(1-\lambda)f(x_2)
f(λx1+(1−λ)x2)≤λf(x1)+(1−λ)f(x2)
Call
f
f
f a strictly convex function on
S
S
S if
∀
x
1
,
x
2
∈
S
\forall x_1,x_2 \in S
∀x1,x2∈S,
λ
∈
(
0
,
1
)
\lambda \in (0,1)
λ∈(0,1)
f
(
λ
x
1
+
(
1
−
λ
)
x
2
)
<
λ
f
(
x
1
)
+
(
1
−
λ
)
f
(
x
2
)
f(\lambda x_1+(1-\lambda)x_2) < \lambda f(x_1)+(1-\lambda)f(x_2)
f(λx1+(1−λ)x2)<λf(x1)+(1−λ)f(x2)
Call f f f a (strictly) concave function on S S S if − f -f −f is (strictly) convex.
Level set
- Lower-level set: S α = { x ∈ S : f ( x ) ≤ α } S_{\alpha} = \{x \in S:f(x) \le \alpha\} Sα={x∈S:f(x)≤α}
- Upper-level set: { x ∈ S : f ( x ) ≥ α } \{x \in S:f(x) \ge \alpha\} {x∈S:f(x)≥α}
Properties
- If f f f is a convex function, S α S_{\alpha} Sα is a convex set.
- f ∈ C ( i n t S ) f \in C(int S) f∈C(intS) ( f f f is continuous on the interior of S S S)
- For R n \mathbb{R}^n Rn convex function f f f, any non-zero directional derivative exists. ∃ lim λ → 0 + f ( x ˉ + λ d ) − f ( x ˉ ) λ , x ˉ ∈ S , x ˉ + λ d ∈ S \exists \lim_{\lambda \to 0^+}\frac{f(\bar x+\lambda d)-f(\bar x)}{\lambda},\bar x \in S,\bar x + \lambda d \in S ∃λ→0+limλf(xˉ+λd)−f(xˉ),xˉ∈S,xˉ+λd∈S
次梯度(sub-gradient)
epigraph
f
:
S
→
R
f:S \to \mathbb{R}
f:S→R where
S
S
S is a nonempty convex set.
e
p
i
f
=
{
(
x
,
y
)
:
x
∈
S
,
y
∈
R
,
y
≥
f
(
x
)
}
epi \ f = \{(x,y):x \in S, y \in \mathbb{R}, y \ge f(x)\}
epi f={(x,y):x∈S,y∈R,y≥f(x)}
hypograph
f
:
S
→
R
f:S \to \mathbb{R}
f:S→R where
S
S
S is a nonempty convex set.
h
y
p
f
=
{
(
x
,
y
)
:
x
∈
S
,
y
∈
R
,
y
≤
f
(
x
)
}
hyp \ f = \{(x,y):x \in S, y \in \mathbb{R}, y \le f(x)\}
hyp f={(x,y):x∈S,y∈R,y≤f(x)}
Property f : S → R f:S \to \mathbb{R} f:S→R where S S S is a nonempty convex set. f f f is convex iff e p i f epi \ f epi f is convex.
subgradient
f
:
S
→
R
f:S \to \mathbb{R}
f:S→R is convex where
S
S
S is a nonempty convex set. Then
ξ
\xi
ξ is called subgradient of
f
f
f at
x
ˉ
\bar x
xˉ if
f
(
x
)
≥
f
(
x
ˉ
)
+
ξ
T
(
x
−
x
ˉ
)
,
∀
x
∈
S
f(x) \ge f(\bar x)+\xi^T(x-\bar x),\forall x \in S
f(x)≥f(xˉ)+ξT(x−xˉ),∀x∈S
f
:
S
→
R
f:S \to \mathbb{R}
f:S→R is concave where
S
S
S is a nonempty convex set. Then
ξ
\xi
ξ is called subgradient of
f
f
f at
x
ˉ
\bar x
xˉ if
f
(
x
)
≤
f
(
x
ˉ
)
+
ξ
T
(
x
−
x
ˉ
)
,
∀
x
∈
S
f(x) \le f(\bar x)+\xi^T(x-\bar x),\forall x \in S
f(x)≤f(xˉ)+ξT(x−xˉ),∀x∈S
Property
-
f
:
S
→
R
f:S \to \mathbb{R}
f:S→R is convex where
S
S
S is a nonempty convex set.
∀
x
ˉ
∈
i
n
t
S
\forall \bar x \in intS
∀xˉ∈intS,
∃
ξ
\exists \xi
∃ξ such that
f ( x ) ≥ f ( x ˉ ) + ξ T ( x − x ˉ ) , ∀ x ∈ S f(x) \ge f(\bar x)+\xi^T(x-\bar x),\forall x \in S f(x)≥f(xˉ)+ξT(x−xˉ),∀x∈Sand the hyperplane
H = { ( x , y ) : y = f ( x ˉ ) + ξ T ( x − x ˉ ) } H = \{(x,y):y = f(\bar x)+\xi^T(x-\bar x)\} H={(x,y):y=f(xˉ)+ξT(x−xˉ)}supports e p i f epi\ f epi f at ( x ˉ , f ( x ˉ ) ) (\bar x,f(\bar x)) (xˉ,f(xˉ)). -
f
:
S
→
R
f:S \to \mathbb{R}
f:S→R is strictly convex where
S
S
S is a nonempty convex set.
∀
x
ˉ
∈
i
n
t
S
\forall \bar x \in intS
∀xˉ∈intS,
∃
ξ
\exists \xi
∃ξ such that
f ( x ) > f ( x ˉ ) + ξ T ( x − x ˉ ) , ∀ x ∈ S f(x)> f(\bar x)+\xi^T(x-\bar x),\forall x \in S f(x)>f(xˉ)+ξT(x−xˉ),∀x∈S -
f
:
S
→
R
f:S \to \mathbb{R}
f:S→R where
S
S
S is a nonempty convex set. If
∀
x
ˉ
∈
i
n
t
S
\forall \bar x \in intS
∀xˉ∈intS,
∃
ξ
\exists \xi
∃ξ such that
f ( x ) ≥ f ( x ˉ ) + ξ T ( x − x ˉ ) , ∀ x ∈ S f(x) \ge f(\bar x)+\xi^T(x-\bar x),\forall x \in S f(x)≥f(xˉ)+ξT(x−xˉ),∀x∈Sthen f f f is convex.
可微的凸函数
differentiable
f
:
S
→
R
f:S \to \mathbb{R}
f:S→R where
S
S
S is a nonempty set. Say
f
f
f is differentiable at
x
ˉ
∈
i
n
t
S
\bar x \in int S
xˉ∈intS if
∃
β
\exists \beta
∃β such that
f
(
x
)
=
f
(
x
ˉ
)
+
β
T
(
x
−
x
ˉ
)
+
o
(
∥
x
−
x
ˉ
∥
)
β
=
∇
f
(
x
ˉ
)
f(x)=f(\bar x)+\beta^T(x-\bar x)+o(\left\| x-\bar x \right\|) \\ \beta = \nabla f(\bar x)
f(x)=f(xˉ)+βT(x−xˉ)+o(∥x−xˉ∥)β=∇f(xˉ)
Property
- f : S → R f:S \to \mathbb{R} f:S→R is convex where S S S is a nonempty convex set. If f f f is differentiable at x ˉ \bar x xˉ, then ∇ f ( x ˉ ) \nabla f(\bar x) ∇f(xˉ) is subgradient.
- f : S → R f:S \to \mathbb{R} f:S→R where S S S is a nonempty convex set. f f f is differentiable at x ˉ \bar x xˉ, then f f f is convex iff ∀ x ˉ ∈ S \forall \bar x \in S ∀xˉ∈S, f ( x ) ≥ f ( x ˉ ) + ∇ f ( x ˉ ) T ( x − x ˉ ) , ∀ x ∈ S f(x) \ge f(\bar x)+\nabla f(\bar x)^T(x-\bar x),\forall x \in S f(x)≥f(xˉ)+∇f(xˉ)T(x−xˉ),∀x∈S
- f : S → R f:S \to \mathbb{R} f:S→R where S S S is a nonempty convex set. f f f is differentiable on S S S, then f f f is convex iff ∀ x 1 , x 2 ∈ S \forall x_1, x_2 \in S ∀x1,x2∈S, [ ∇ f ( x 2 ) − ∇ f ( x 1 ) ] T ( x 2 − x 1 ) ≥ 0 [\nabla f(x_2)-\nabla f(x_1)]^T(x_2-x_1) \ge 0 [∇f(x2)−∇f(x1)]T(x2−x1)≥0
PSD positive semi-definite,
∀
x
∈
R
n
\forall x \in \mathbb{R}^n
∀x∈Rn,
x
T
H
x
≥
0
x^THx \ge 0
xTHx≥0
PD positive definite,
∀
x
∈
R
n
\forall x \in \mathbb{R}^n
∀x∈Rn,
x
T
H
x
>
0
x^THx > 0
xTHx>0
Tips for checking PSD/PD
- ∀ i , H i i < 0 \forall i,H_{ii}<0 ∀i,Hii<0, H H H is not PSD; ∀ i , H i i ≤ 0 \forall i,H_{ii} \le 0 ∀i,Hii≤0, H H H is not PD;
- main sub-matrix is PSD/PD then H H H is PSD/PD
- if H T = H H^T=H HT=H, PD = PSD+nonsingular
- if H H H is 2 × 2 2 \times 2 2×2, H 11 > 0 , H 22 > 0 , ∣ H ∣ > 0 H_{11}>0,H_{22}>0,|H|>0 H11>0,H22>0,∣H∣>0 means H H H is PD
Property
- f : S → R f:S \to \mathbb{R} f:S→R where S S S is a nonempty convex set. f f f is twice differentiable on S S S, then f f f is convex iff H f ( x ˉ ) Hf(\bar x) Hf(xˉ) is PSD.
- f : S → R f:S \to \mathbb{R} f:S→R where S S S is a nonempty convex set. f f f is twice differentiable on S S S, then if H f ( x ˉ ) Hf(\bar x) Hf(xˉ) is PD, f f f is strictly convex; if f f f is strictly convex, H f ( x ˉ ) Hf(\bar x) Hf(xˉ) is PSD; if f f f is strictly convex and quadratic, H f ( x ˉ ) Hf(\bar x) Hf(xˉ) is PD
- f : S → R f:S \to \mathbb{R} f:S→R where S S S is a nonempty convex set. f f f is infinitely differentiable on S S S, then f f f is strictly convex at x ˉ \bar x xˉ iff f ( n ) ( x ˉ ) > 0 f^{(n)}(\bar x)>0 f(n)(xˉ)>0 and f ( j ) ( x ) = 0 , ∀ 1 < j < n f^{(j)}(x) = 0,\forall 1< j<n f(j)(x)=0,∀1<j<n
凸函数的推广
quasiconvex f : S → R f:S \to \mathbb{R} f:S→R where S S S is a nonempty convex set. ∀ x 1 , x 2 ∈ S \forall x_1,x_2 \in S ∀x1,x2∈S, f ( λ x 1 + ( 1 − λ ) x 2 ) ≤ max ( f ( x 1 ) , f ( x 2 ) ) f(\lambda x_1+(1-\lambda)x_2) \le \max(f(x_1),f(x_2)) f(λx1+(1−λ)x2)≤max(f(x1),f(x2))
Property
- f f f is quasiconvex iff ∀ α \forall \alpha ∀α, S α S_{\alpha} Sα is convex
- S S S is nonempty compact polyhedral then ∃ x ˉ \exists \bar x ∃xˉ that is both an extreme point of S S S and also the optimal solution to min S f ( x ) \min_S f(x) minSf(x)
- S S S is open convex, f f f is differentiable. f f f is quasiconvex iff one of the following is correct: if x 1 , x 2 ∈ S x_1,x_2 \in S x1,x2∈S, f ( x 1 ) ≤ f ( x 2 ) f(x_1) \le f(x_2) f(x1)≤f(x2), then ∇ f ( x 2 ) T ( x 1 − x 2 ) ≤ 0 \nabla f(x_2)^T(x_1-x_2) \le 0 ∇f(x2)T(x1−x2)≤0 or if x 1 , x 2 ∈ S x_1,x_2 \in S x1,x2∈S, ∇ f ( x 2 ) T ( x 1 − x 2 ) ≤ 0 \nabla f(x_2)^T(x_1-x_2) \le 0 ∇f(x2)T(x1−x2)≤0, then f ( x 1 ) ≤ f ( x 2 ) f(x_1) \le f(x_2) f(x1)≤f(x2)
strict quasiconvex f : S → R f:S \to \mathbb{R} f:S→R where S S S is a nonempty convex set. ∀ x 1 , x 2 ∈ S \forall x_1,x_2 \in S ∀x1,x2∈S, f ( x 1 ) ≠ f ( x 2 ) f(x_1) \ne f(x_2) f(x1)=f(x2), f ( λ x 1 + ( 1 − λ ) x 2 ) < max ( f ( x 1 ) , f ( x 2 ) ) f(\lambda x_1+(1-\lambda)x_2) < \max(f(x_1),f(x_2)) f(λx1+(1−λ)x2)<max(f(x1),f(x2))
Property
- If x ˉ \bar x xˉ is local minimum to min S f ( x ) \min_S f(x) minSf(x) where S S S is convex, then x ˉ \bar x xˉ is global minimum.
strong quasiconvex
f
:
S
→
R
f:S \to \mathbb{R}
f:S→R where
S
S
S is a nonempty convex set.
∀
x
1
,
x
2
∈
S
\forall x_1,x_2 \in S
∀x1,x2∈S,
x
1
≠
x
2
x_1 \ne x_2
x1=x2,
f
(
λ
x
1
+
(
1
−
λ
)
x
2
)
<
max
(
f
(
x
1
)
,
f
(
x
2
)
)
f(\lambda x_1+(1-\lambda)x_2) < \max(f(x_1),f(x_2))
f(λx1+(1−λ)x2)<max(f(x1),f(x2))
Property
- strong quasiconvex leads to strict quasiconvex
- If x ˉ \bar x xˉ is local minimum to min S f ( x ) \min_S f(x) minSf(x) where S S S is convex, then x ˉ \bar x xˉ is unique global minimum.
Pseudoconvex ∀ x 1 , x 2 ∈ X \forall x_1,x_2 \in X ∀x1,x2∈X such that ∇ f ( x 1 ) T ( x 2 − x 1 ) ≥ 0 \nabla f(x_1)^T(x_2-x_1) \ge 0 ∇f(x1)T(x2−x1)≥0, we have f ( x 2 ) ≥ f ( x 1 ) f(x_2) \ge f(x_1) f(x2)≥f(x1).
Property
- Pseudoconvex + differentiable = strict quasiconvex
- Strict Pseudoconvex + differentiable = strong quasiconvex