凸集
如果过集合
C
C
C的任意两点的线段都在
C
C
C内,则称
C
C
C为凸集,即
x
1
,
x
2
∈
C
⇒
∀
θ
∈
[
0
,
1
]
,
θ
x
1
+
(
1
−
θ
)
x
2
∈
C
\boldsymbol{x}_1,\boldsymbol{x}_2\in C \Rightarrow \forall \theta\in \left[0,1\right],\theta \boldsymbol{x}_1+(1-\theta)\boldsymbol{x}_2\in C
x1,x2∈C⇒∀θ∈[0,1],θx1+(1−θ)x2∈C
凸函数
凸函数定义
如果
d
o
m
f
\bold{dom} f
domf是凸集,且
f
(
θ
x
+
(
1
−
θ
)
y
)
≤
θ
f
(
x
)
+
(
1
−
θ
)
f
(
y
)
f(\theta \boldsymbol{x}+(1-\theta)\boldsymbol{y})\le \theta f(\boldsymbol{x})+(1-\theta)f(\boldsymbol{y})
f(θx+(1−θ)y)≤θf(x)+(1−θ)f(y)
对于所有
x
,
y
∈
d
o
m
f
,
0
≤
θ
≤
1
\boldsymbol{x},\boldsymbol{y}\in \bold{dom}f,0\le\theta \le 1
x,y∈domf,0≤θ≤1都成立,则称
f
f
f为凸函数
严格凸函数定义
如果
d
o
m
f
\bold{dom} f
domf是凸集,且
f
(
θ
x
+
(
1
−
θ
)
y
)
<
θ
f
(
x
)
+
(
1
−
θ
)
f
(
y
)
f(\theta \boldsymbol{x}+(1-\theta)\boldsymbol{y})< \theta f(\boldsymbol{x})+(1-\theta)f(\boldsymbol{y})
f(θx+(1−θ)y)<θf(x)+(1−θ)f(y)
对于所有
x
,
y
∈
d
o
m
f
,
x
≠
y
,
0
<
θ
<
1
\boldsymbol{x},\boldsymbol{y}\in \bold{dom}f,\boldsymbol{x}\neq \boldsymbol{y},0<\theta < 1
x,y∈domf,x=y,0<θ<1都成立,则称
f
f
f为严格凸函数
强凸函数定义
若
∃
m
>
0
\exists m>0
∃m>0,使得
g
(
x
)
=
f
(
x
)
−
m
2
∥
x
∥
2
g(\boldsymbol{x})=f(\boldsymbol{x})-\frac{m}{2}\Vert x\Vert^2
g(x)=f(x)−2m∥x∥2
为凸函数,则称
f
(
x
)
f(\boldsymbol{x})
f(x)为强凸函数,其中
m
m
m为强凸参数
f
(
θ
x
+
(
1
−
θ
)
y
)
=
g
(
θ
x
+
(
1
−
θ
)
y
)
+
m
2
∥
θ
x
+
(
1
−
θ
)
y
∥
2
≤
θ
g
(
x
)
+
(
1
−
θ
)
g
(
y
)
+
m
2
∥
θ
x
+
(
1
−
θ
)
y
∥
2
=
θ
f
(
x
)
−
m
2
θ
∥
x
∥
2
+
(
1
−
θ
)
f
(
y
)
−
m
2
(
1
−
θ
)
∥
y
∥
2
+
m
2
∥
θ
x
+
(
1
−
θ
)
y
∥
2
=
θ
f
(
x
)
+
(
1
−
θ
)
f
(
y
)
+
m
2
(
∥
θ
x
+
(
1
−
θ
)
y
∥
2
−
θ
∥
x
∥
2
−
(
1
−
θ
)
∥
y
∥
2
)
=
θ
f
(
x
)
+
(
1
−
θ
)
f
(
y
)
+
m
2
(
∑
(
θ
x
i
+
(
1
−
θ
)
y
i
)
2
−
θ
∑
x
i
2
−
(
1
−
θ
)
∑
y
i
2
)
=
θ
f
(
x
)
+
(
1
−
θ
)
f
(
y
)
+
m
2
(
θ
(
θ
−
1
)
∑
x
i
2
−
(
1
−
θ
)
θ
∑
y
i
2
+
2
θ
(
1
−
θ
)
∑
x
i
y
i
)
=
θ
f
(
x
)
+
(
1
−
θ
)
f
(
y
)
−
m
2
θ
(
1
−
θ
)
(
∑
x
i
2
+
∑
y
i
2
−
2
∑
x
i
y
i
)
=
θ
f
(
x
)
+
(
1
−
θ
)
f
(
y
)
−
m
2
θ
(
1
−
θ
)
∥
x
−
y
∥
2
\begin{aligned} &\quad f(\theta \boldsymbol{x}+(1-\theta)\boldsymbol{y})\\ &=g(\theta \boldsymbol{x}+(1-\theta)\boldsymbol{y})+\frac{m}{2}\Vert \theta \boldsymbol{x}+(1-\theta)\boldsymbol{y}\Vert^2\\ &\le \theta g(\boldsymbol{x})+(1-\theta)g(\boldsymbol{y})+\frac{m}{2}\Vert \theta \boldsymbol{x}+(1-\theta)\boldsymbol{y}\Vert^2\\ &=\theta f(\boldsymbol{x})-\frac{m}{2}\theta\Vert \boldsymbol{x}\Vert^2+(1-\theta)f(\boldsymbol{y})-\frac{m}{2}(1-\theta)\Vert \boldsymbol{y}\Vert^2+\frac{m}{2}\Vert \theta \boldsymbol{x}+(1-\theta)\boldsymbol{y}\Vert^2\\ &=\theta f(\boldsymbol{x})+(1-\theta)f(\boldsymbol{y})+\frac{m}{2}(\Vert \theta \boldsymbol{x}+(1-\theta)\boldsymbol{y}\Vert^2-\theta\Vert \boldsymbol{x}\Vert^2-(1-\theta)\Vert \boldsymbol{y}\Vert^2)\\ &=\theta f(\boldsymbol{x})+(1-\theta)f(\boldsymbol{y})+\frac{m}{2}(\sum (\theta x_i+(1-\theta)y_i)^2-\theta\sum x_i^2-(1-\theta)\sum y_i^2)\\ &=\theta f(\boldsymbol{x})+(1-\theta)f(\boldsymbol{y})+\frac{m}{2}(\theta(\theta-1)\sum x_i^2-(1-\theta)\theta\sum y_i^2+2\theta(1-\theta)\sum x_iy_i)\\ &=\theta f(\boldsymbol{x})+(1-\theta)f(\boldsymbol{y})-\frac{m}{2}\theta(1-\theta)(\sum x_i^2+\sum y_i^2-2\sum x_iy_i)\\ &=\theta f(\boldsymbol{x})+(1-\theta)f(\boldsymbol{y})-\frac{m}{2}\theta(1-\theta)\Vert \boldsymbol{x}-\boldsymbol{y}\Vert^2 \end{aligned}
f(θx+(1−θ)y)=g(θx+(1−θ)y)+2m∥θx+(1−θ)y∥2≤θg(x)+(1−θ)g(y)+2m∥θx+(1−θ)y∥2=θf(x)−2mθ∥x∥2+(1−θ)f(y)−2m(1−θ)∥y∥2+2m∥θx+(1−θ)y∥2=θf(x)+(1−θ)f(y)+2m(∥θx+(1−θ)y∥2−θ∥x∥2−(1−θ)∥y∥2)=θf(x)+(1−θ)f(y)+2m(∑(θxi+(1−θ)yi)2−θ∑xi2−(1−θ)∑yi2)=θf(x)+(1−θ)f(y)+2m(θ(θ−1)∑xi2−(1−θ)θ∑yi2+2θ(1−θ)∑xiyi)=θf(x)+(1−θ)f(y)−2mθ(1−θ)(∑xi2+∑yi2−2∑xiyi)=θf(x)+(1−θ)f(y)−2mθ(1−θ)∥x−y∥2
所以等价定义
若
∃
m
>
0
\exists m>0
∃m>0,使得
∀
x
,
y
∈
d
o
m
f
,
θ
∈
(
0
,
1
)
\forall x,y\in \bold{dom}f,\theta\in(0,1)
∀x,y∈domf,θ∈(0,1)
有
f
(
θ
x
+
(
1
−
θ
)
y
)
≤
θ
f
(
x
)
+
(
1
−
θ
)
f
(
y
)
−
m
2
θ
(
1
−
θ
)
∥
x
−
y
∥
2
f(\theta \boldsymbol{x}+(1-\theta)\boldsymbol{y})\le \theta f(\boldsymbol{x})+(1-\theta)f(\boldsymbol{y})-\frac{m}{2}\theta(1-\theta)\Vert \boldsymbol{x}-\boldsymbol{y}\Vert^2
f(θx+(1−θ)y)≤θf(x)+(1−θ)f(y)−2mθ(1−θ)∥x−y∥2
则称
f
(
x
)
f(\boldsymbol{x})
f(x)为强凸函数,其中
m
m
m为强凸参数
凸函数判定
定理1
f
(
x
)
f(\boldsymbol{x})
f(x)是凸函数当且仅当
∀
x
∈
d
o
m
f
,
v
∈
R
n
,
g
:
R
→
R
,
\forall \boldsymbol{x} \in \bold{dom}\ f,\boldsymbol{v}\in\mathbb{R}^n,g:\mathbb{R}\to \mathbb{R},
∀x∈dom f,v∈Rn,g:R→R,
g
(
t
)
=
f
(
x
+
t
v
)
,
d
o
m
g
=
{
t
∣
x
+
t
v
∈
d
o
m
f
}
g(t)=f(\boldsymbol{x}+t\boldsymbol{v}),\bold{dom}\ g=\{t\mid\boldsymbol{x}+t\boldsymbol{v}\in\bold{dom}\ f\}
g(t)=f(x+tv),dom g={t∣x+tv∈dom f}
是凸函数
证明:
必要性:设
f
(
x
)
f(\boldsymbol{x})
f(x)是凸函数
∀
t
1
,
t
2
∈
d
o
m
g
,
θ
∈
(
0
,
1
)
\forall t_1,t_2\in \bold{dom}\ g,\theta\in(0,1)
∀t1,t2∈dom g,θ∈(0,1)
x
+
t
1
v
∈
d
o
m
f
x
+
t
2
v
∈
d
o
m
f
\boldsymbol{x}+t_1\boldsymbol{v}\in\bold{dom}\ f\\ \boldsymbol{x}+t_2\boldsymbol{v}\in\bold{dom}\ f\\
x+t1v∈dom fx+t2v∈dom f
由
d
o
m
f
\bold{dom}\ f
dom f是凸集,立即推
x
+
(
θ
t
1
+
(
1
−
θ
)
t
2
)
v
∈
d
o
m
f
\boldsymbol{x}+(\theta t_1+(1-\theta)t_2)\boldsymbol{v}\in\bold{dom}\ f
x+(θt1+(1−θ)t2)v∈dom f
所以
θ
t
1
+
(
1
−
θ
)
t
2
∈
d
o
m
g
\theta t_1+(1-\theta)t_2\in \bold{dom}\ g
θt1+(1−θ)t2∈dom g,即
d
o
m
g
\bold{dom}\ g
dom g为凸集
g
(
θ
t
1
+
(
1
−
θ
)
t
2
)
=
f
(
x
+
(
θ
t
1
+
(
1
−
θ
)
t
2
)
v
)
=
f
(
θ
(
x
+
t
1
v
)
+
(
1
−
θ
)
(
x
+
t
2
v
)
)
⩽
θ
f
(
x
+
t
1
v
)
+
(
1
−
θ
)
f
(
x
+
t
2
v
)
=
θ
g
(
t
1
)
+
(
1
−
θ
)
g
(
t
2
)
.
\begin{aligned} g\left(\theta t_{1}+(1-\theta) t_{2}\right) &=f\left(\boldsymbol{x}+\left(\theta t_{1}+(1-\theta) t_{2}\right) \boldsymbol{v}\right) \\ &=f\left(\theta\left(\boldsymbol{x}+t_{1} \boldsymbol{v}\right)+(1-\theta)\left(\boldsymbol{x}+t_{2} \boldsymbol{v}\right)\right) \\ & \leqslant \theta f\left(\boldsymbol{x}+t_{1} \boldsymbol{v}\right)+(1-\theta) f\left(\boldsymbol{x}+t_{2} \boldsymbol{v}\right) \\ &=\theta g\left(t_{1}\right)+(1-\theta) g\left(t_{2}\right) . \end{aligned}
g(θt1+(1−θ)t2)=f(x+(θt1+(1−θ)t2)v)=f(θ(x+t1v)+(1−θ)(x+t2v))⩽θf(x+t1v)+(1−θ)f(x+t2v)=θg(t1)+(1−θ)g(t2).
所以
g
(
t
)
g(t)
g(t)是凸函数
充分性:
取
v
=
y
−
x
,
t
1
=
0
,
t
2
=
1
\boldsymbol{v}=\boldsymbol{y}-\boldsymbol{x},t_1=0,t_2=1
v=y−x,t1=0,t2=1
由
d
o
m
g
\bold{dom}\ g
dom g是凸集可知,
θ
⋅
0
+
(
1
−
θ
)
⋅
1
∈
d
o
m
g
\theta\cdot 0+(1-\theta)\cdot 1\in \bold{dom}\ g
θ⋅0+(1−θ)⋅1∈dom g
即
θ
x
+
(
1
−
θ
)
y
∈
d
o
m
f
\theta \boldsymbol{x}+(1-\theta)\boldsymbol{y}\in\bold{dom}\ f
θx+(1−θ)y∈dom f是凸集
g
(
1
−
θ
)
=
g
(
θ
t
1
+
(
1
−
θ
)
t
2
)
⩽
θ
g
(
t
1
)
+
(
1
−
θ
)
g
(
t
2
)
=
θ
g
(
0
)
+
(
1
−
θ
)
g
(
1
)
=
θ
f
(
x
)
+
(
1
−
θ
)
f
(
y
)
\begin{aligned} g(1-\theta) &=g\left(\theta t_{1}+(1-\theta) t_{2}\right) \\ & \leqslant \theta g\left(t_{1}\right)+(1-\theta) g\left(t_{2}\right) \\ &=\theta g(0)+(1-\theta) g(1) \\ &=\theta f(x)+(1-\theta) f(y) \end{aligned}
g(1−θ)=g(θt1+(1−θ)t2)⩽θg(t1)+(1−θ)g(t2)=θg(0)+(1−θ)g(1)=θf(x)+(1−θ)f(y)
g
(
1
−
θ
)
=
f
(
x
+
(
1
−
θ
)
(
y
−
x
)
)
=
f
(
θ
x
+
(
1
−
θ
)
y
)
g(1-\theta)=f( \boldsymbol{x}+(1-\theta)(\boldsymbol{y}- \boldsymbol{x}))=f(\theta \boldsymbol{x}+(1-\theta)\boldsymbol{y})
g(1−θ)=f(x+(1−θ)(y−x))=f(θx+(1−θ)y)
所以
f
(
x
)
f( \boldsymbol{x})
f(x)是凸函数
定理2
定义在凸集上的可微函数
f
f
f,
f
f
f是凸函数,当且仅当
f
(
y
)
⩾
f
(
x
)
+
∇
f
(
x
)
T
(
y
−
x
)
,
∀
x
,
y
∈
dom
f
f(\boldsymbol{y}) \geqslant f(\boldsymbol{x})+\nabla f(\boldsymbol{x})^{\mathrm{T}}(\boldsymbol{y}-\boldsymbol{x}), \quad \forall \boldsymbol{x}, \boldsymbol{y} \in \operatorname{dom} f
f(y)⩾f(x)+∇f(x)T(y−x),∀x,y∈domf
证明:
必要性:
设
f
f
f是凸函数,则
∀
x
,
y
∈
dom
f
\forall \boldsymbol{x},\boldsymbol{y}\in \operatorname{dom} f
∀x,y∈domf,以及
t
∈
(
0
,
1
)
t\in(0,1)
t∈(0,1),有
t
f
(
y
)
+
(
1
−
t
)
f
(
x
)
⩾
f
(
x
+
t
(
y
−
x
)
)
f
(
y
)
−
f
(
x
)
⩾
f
(
x
+
t
(
y
−
x
)
)
−
f
(
x
)
t
\begin{aligned} t f(\boldsymbol{y})+(1-t) f(\boldsymbol{x}) &\geqslant f(\boldsymbol{x}+t(\boldsymbol{y}-\boldsymbol{x}))\\ f(\boldsymbol{y})-f(\boldsymbol{x}) &\geqslant \frac{f(\boldsymbol{x}+t(\boldsymbol{y}-\boldsymbol{x}))-f(\boldsymbol{x})}{t} \end{aligned}
tf(y)+(1−t)f(x)f(y)−f(x)⩾f(x+t(y−x))⩾tf(x+t(y−x))−f(x)
令
t
→
0
t\to 0
t→0,利用保号性
f
(
y
)
−
f
(
x
)
⩾
lim
t
→
0
f
(
x
+
t
(
y
−
x
)
)
−
f
(
x
)
t
=
∇
f
(
x
)
T
(
y
−
x
)
f(\boldsymbol{y})-f(\boldsymbol{x}) \geqslant \lim _{t \rightarrow 0} \frac{f(\boldsymbol{x}+t(\boldsymbol{y}-\boldsymbol{x}))-f(\boldsymbol{x})}{t}=\nabla f(\boldsymbol{x})^{\mathrm{T}}(\boldsymbol{y}-\boldsymbol{x})
f(y)−f(x)⩾t→0limtf(x+t(y−x))−f(x)=∇f(x)T(y−x)
充分性:
设
f
f
f是凸函数,则
∀
x
,
y
∈
dom
f
\forall \boldsymbol{x},\boldsymbol{y}\in \operatorname{dom} f
∀x,y∈domf,以及
t
∈
(
0
,
1
)
t\in(0,1)
t∈(0,1)
设
z
=
t
x
+
(
1
−
t
)
y
\boldsymbol{z}=t\boldsymbol{x}+(1-t)\boldsymbol{y}
z=tx+(1−t)y
f
(
x
)
⩾
f
(
z
)
+
∇
f
(
z
)
T
(
x
−
z
)
f
(
y
)
⩾
f
(
z
)
+
∇
f
(
z
)
T
(
y
−
z
)
\begin{aligned} &f(\boldsymbol{x}) \geqslant f(\boldsymbol{z})+\nabla f(\boldsymbol{z})^{\mathrm{T}}(\boldsymbol{x}-\boldsymbol{z}) \\ &f(\boldsymbol{y}) \geqslant f(z)+\nabla f(\boldsymbol{z})^{\mathrm{T}}(\boldsymbol{y}-\boldsymbol{z}) \end{aligned}
f(x)⩾f(z)+∇f(z)T(x−z)f(y)⩾f(z)+∇f(z)T(y−z)
于是
t
f
(
x
)
+
(
1
−
t
)
f
(
y
)
⩾
f
(
z
)
+
0
=
f
(
t
x
+
(
1
−
t
)
y
)
t f(\boldsymbol{x})+(1-t) f(\boldsymbol{y}) \geqslant f(\boldsymbol{z})+0=f(t\boldsymbol{x}+(1-t)\boldsymbol{y})
tf(x)+(1−t)f(y)⩾f(z)+0=f(tx+(1−t)y)
推论2.1
定义在凸集上的可微函数
f
f
f,
f
f
f是严格凸函数,当且仅当
f
(
y
)
>
f
(
x
)
+
∇
f
(
x
)
T
(
y
−
x
)
,
∀
x
,
y
∈
dom
f
f(\boldsymbol{y}) > f(\boldsymbol{x})+\nabla f(\boldsymbol{x})^{\mathrm{T}}(\boldsymbol{y}-\boldsymbol{x}), \quad \forall \boldsymbol{x}, \boldsymbol{y} \in \operatorname{dom} f
f(y)>f(x)+∇f(x)T(y−x),∀x,y∈domf
推论2.2
定义在凸集上的可微函数
f
f
f,
f
f
f是强凸函数,当且仅当
f
(
y
)
≥
f
(
x
)
+
∇
f
(
x
)
T
(
y
−
x
)
+
m
2
∥
y
−
x
∥
2
,
∀
x
,
y
∈
dom
f
f(\boldsymbol{y}) \ge f(\boldsymbol{x})+\nabla f(\boldsymbol{x})^{\mathrm{T}}(\boldsymbol{y}-\boldsymbol{x})+\frac{m}{2}\Vert \boldsymbol{y}-\boldsymbol{x}\Vert^2, \quad \forall \boldsymbol{x}, \boldsymbol{y} \in \operatorname{dom} f
f(y)≥f(x)+∇f(x)T(y−x)+2m∥y−x∥2,∀x,y∈domf
定理3
设
f
f
f是可微函数,则
f
f
f是凸函数当且仅当
d
o
m
f
\bold{dom}\ f
dom f为凸集且
∇
f
\nabla f
∇f为单调映射,即
(
∇
f
(
x
)
−
∇
f
(
y
)
)
T
(
x
−
y
)
⩾
0
,
∀
x
,
y
∈
dom
f
(\nabla f(\boldsymbol{x})-\nabla f(\boldsymbol{y}))^{\mathrm{T}}(\boldsymbol{x}-\boldsymbol{y}) \geqslant 0, \quad \forall \boldsymbol{x}, \boldsymbol{y} \in \operatorname{dom} f
(∇f(x)−∇f(y))T(x−y)⩾0,∀x,y∈domf
证明:
必要性:
f
(
y
)
⩾
f
(
x
)
+
∇
f
(
x
)
T
(
y
−
x
)
f
(
x
)
⩾
f
(
y
)
+
∇
f
(
y
)
T
(
x
−
y
)
\begin{aligned} &f(\boldsymbol{y}) \geqslant f(\boldsymbol{x})+\nabla f(\boldsymbol{x})^{\mathrm{T}}(\boldsymbol{y}-\boldsymbol{x}) \\ &f(\boldsymbol{x}) \geqslant f(\boldsymbol{y})+\nabla f(\boldsymbol{y})^{\mathrm{T}}(\boldsymbol{x}-\boldsymbol{y}) \end{aligned}
f(y)⩾f(x)+∇f(x)T(y−x)f(x)⩾f(y)+∇f(y)T(x−y)
相加得
(
∇
f
(
x
)
−
∇
f
(
y
)
)
T
(
x
−
y
)
⩾
0
,
∀
x
,
y
∈
dom
f
(\nabla f(\boldsymbol{x})-\nabla f(\boldsymbol{y}))^{\mathrm{T}}(\boldsymbol{x}-\boldsymbol{y}) \geqslant 0, \quad \forall \boldsymbol{x}, \boldsymbol{y} \in \operatorname{dom} f
(∇f(x)−∇f(y))T(x−y)⩾0,∀x,y∈domf
充分性:
设
g
(
t
)
=
f
(
x
+
t
(
y
−
x
)
)
,
g
′
(
t
)
=
∇
f
(
x
+
t
(
y
−
x
)
)
T
(
y
−
x
)
g(t)=f(\boldsymbol{x}+t(\boldsymbol{y}-\boldsymbol{x})), \quad g^{\prime}(t)=\nabla f(\boldsymbol{x}+t(\boldsymbol{y}-\boldsymbol{x}))^{\mathrm{T}}(\boldsymbol{y}-\boldsymbol{x})
g(t)=f(x+t(y−x)),g′(t)=∇f(x+t(y−x))T(y−x)
由
∇
f
\nabla f
∇f单调映射,
∀
t
>
0
,
g
′
(
t
)
≥
g
′
(
0
)
\forall t>0,g'(t)\ge g'(0)
∀t>0,g′(t)≥g′(0)
f
(
y
)
=
g
(
1
)
=
g
(
0
)
+
∫
0
1
g
′
(
t
)
d
t
⩾
g
(
0
)
+
g
′
(
0
)
=
f
(
x
)
+
∇
f
(
x
)
T
(
y
−
x
)
\begin{aligned} f(y) &=g(1)=g(0)+\int_{0}^{1} g^{\prime}(t) \mathrm{d} t \\ & \geqslant g(0)+g^{\prime}(0)\\ &=f(\boldsymbol{x})+\nabla f(\boldsymbol{x})^{\mathrm{T}}(\boldsymbol{y}-\boldsymbol{x}) \end{aligned}
f(y)=g(1)=g(0)+∫01g′(t)dt⩾g(0)+g′(0)=f(x)+∇f(x)T(y−x)
推论3.1
f
f
f是严格凸函数当且仅当
(
∇
f
(
x
)
−
∇
f
(
y
)
)
T
(
x
−
y
)
>
0
,
∀
x
,
y
∈
dom
f
(\nabla f(\boldsymbol{x})-\nabla f(\boldsymbol{y}))^{\mathrm{T}}(\boldsymbol{x}-\boldsymbol{y}) > 0, \quad \forall \boldsymbol{x}, \boldsymbol{y} \in \operatorname{dom} f
(∇f(x)−∇f(y))T(x−y)>0,∀x,y∈domf
推论3.2
( ∇ f ( x ) − ∇ f ( y ) ) T ( x − y ) ⩾ m ∥ y − x ∥ 2 , ∀ x , y ∈ dom f (\nabla f(\boldsymbol{x})-\nabla f(\boldsymbol{y}))^{\mathrm{T}}(\boldsymbol{x}-\boldsymbol{y}) \geqslant m\Vert \boldsymbol{y}-\boldsymbol{x}\Vert^2, \quad \forall \boldsymbol{x}, \boldsymbol{y} \in \operatorname{dom} f (∇f(x)−∇f(y))T(x−y)⩾m∥y−x∥2,∀x,y∈domf
定理4
设
f
f
f为定义在凸集上的二阶连续可微函数,则
f
f
f是凸函数当且仅当
∇
2
f
(
x
)
⪰
0
,
∀
x
∈
dom
f
\nabla^2 f(\boldsymbol{x})\succeq0,\quad \forall\boldsymbol{x}\in \operatorname{dom} f
∇2f(x)⪰0,∀x∈domf
如果
∇
2
f
(
x
)
≻
0
,
∀
x
∈
dom
f
\nabla^2 f(\boldsymbol{x})\succ 0,\quad \forall\boldsymbol{x}\in \operatorname{dom} f
∇2f(x)≻0,∀x∈domf,则
f
f
f是严格凸函数
证明:
必要性:
假设存在非零向量
v
∈
R
n
\boldsymbol{v}\in\mathbb{R}^{n}
v∈Rn,使得
v
T
∇
2
f
(
x
)
v
<
0
\boldsymbol{v}^T\nabla^2f(\boldsymbol{x})\boldsymbol{v}<0
vT∇2f(x)v<0
f
(
x
+
t
v
)
=
f
(
x
)
+
t
∇
f
(
x
)
T
v
+
t
2
2
v
T
∇
2
f
(
x
)
v
+
o
(
t
2
)
f
(
x
+
t
v
)
−
f
(
x
)
+
t
∇
f
(
x
)
T
v
t
2
=
1
2
v
T
∇
2
f
(
x
)
v
+
o
(
1
)
\begin{aligned} f(\boldsymbol{x}+t\boldsymbol{v})&=f(\boldsymbol{x})+t\nabla f(\boldsymbol{x})^T\boldsymbol{v}+\frac{t^2}{2}\boldsymbol{v}^T\nabla^2f(\boldsymbol{x})\boldsymbol{v}+o(t^2)\\ \frac{f(\boldsymbol{x}+t\boldsymbol{v})-f(\boldsymbol{x})+t\nabla f(\boldsymbol{x})^T\boldsymbol{v}}{t^2}&=\frac{1}{2}\boldsymbol{v}^T\nabla^2f(\boldsymbol{x})\boldsymbol{v}+o(1) \end{aligned}
f(x+tv)t2f(x+tv)−f(x)+t∇f(x)Tv=f(x)+t∇f(x)Tv+2t2vT∇2f(x)v+o(t2)=21vT∇2f(x)v+o(1)
当
t
→
0
+
t\to 0^{+}
t→0+
f
(
x
+
t
v
)
−
f
(
x
)
+
t
∇
f
(
x
)
T
v
t
2
<
0
\frac{f(\boldsymbol{x}+t\boldsymbol{v})-f(\boldsymbol{x})+t\nabla f(\boldsymbol{x})^T\boldsymbol{v}}{t^2}<0
t2f(x+tv)−f(x)+t∇f(x)Tv<0
与定理2矛盾,所以
∇
2
f
(
x
)
⪰
0
\nabla^2 f(\boldsymbol{x})\succeq0
∇2f(x)⪰0
充分性:
∇
2
f
(
x
)
⪰
0
\nabla^2 f(\boldsymbol{x})\succeq0
∇2f(x)⪰0
由
f
(
y
)
=
f
(
x
)
+
∇
f
(
x
)
T
(
y
−
x
)
+
1
2
(
y
−
x
+
t
(
y
−
x
)
)
T
∇
2
f
(
x
)
(
y
−
x
)
f(\boldsymbol{y})=f(\boldsymbol{x})+\nabla f(\boldsymbol{x})^T(\boldsymbol{y}-\boldsymbol{x})+\frac{1}{2}(\boldsymbol{y}-\boldsymbol{x}+t(\boldsymbol{y}-\boldsymbol{x}))^T\nabla^2f(\boldsymbol{x})(\boldsymbol{y}-\boldsymbol{x})
f(y)=f(x)+∇f(x)T(y−x)+21(y−x+t(y−x))T∇2f(x)(y−x)
其中
t
∈
(
0
,
1
)
t\in(0,1)
t∈(0,1)
于是
f
(
y
)
≥
f
(
x
)
+
∇
f
(
x
)
T
(
y
−
x
)
f(\boldsymbol{y})\ge f(\boldsymbol{x})+\nabla f(\boldsymbol{x})^T(\boldsymbol{y}-\boldsymbol{x})
f(y)≥f(x)+∇f(x)T(y−x)
所以成立
严格凸函数版本的证明也差不多