凸函数学习

凸集

如果过集合 C C C的任意两点的线段都在 C C C内,则称 C C C为凸集,即
x 1 , x 2 ∈ C ⇒ ∀ θ ∈ [ 0 , 1 ] , θ x 1 + ( 1 − θ ) x 2 ∈ C \boldsymbol{x}_1,\boldsymbol{x}_2\in C \Rightarrow \forall \theta\in \left[0,1\right],\theta \boldsymbol{x}_1+(1-\theta)\boldsymbol{x}_2\in C x1​,x2​∈C⇒∀θ∈[0,1],θx1​+(1−θ)x2​∈C

凸函数

凸函数定义

如果 d o m f \bold{dom} f domf是凸集,且
f ( θ x + ( 1 − θ ) y ) ≤ θ f ( x ) + ( 1 − θ ) f ( y ) f(\theta \boldsymbol{x}+(1-\theta)\boldsymbol{y})\le \theta f(\boldsymbol{x})+(1-\theta)f(\boldsymbol{y}) f(θx+(1−θ)y)≤θf(x)+(1−θ)f(y)
对于所有 x , y ∈ d o m f , 0 ≤ θ ≤ 1 \boldsymbol{x},\boldsymbol{y}\in \bold{dom}f,0\le\theta \le 1 x,y∈domf,0≤θ≤1都成立,则称 f f f为凸函数

严格凸函数定义

如果 d o m f \bold{dom} f domf是凸集,且
f ( θ x + ( 1 − θ ) y ) < θ f ( x ) + ( 1 − θ ) f ( y ) f(\theta \boldsymbol{x}+(1-\theta)\boldsymbol{y})< \theta f(\boldsymbol{x})+(1-\theta)f(\boldsymbol{y}) f(θx+(1−θ)y)<θf(x)+(1−θ)f(y)
对于所有 x , y ∈ d o m f , x ≠ y , 0 < θ < 1 \boldsymbol{x},\boldsymbol{y}\in \bold{dom}f,\boldsymbol{x}\neq \boldsymbol{y},0<\theta < 1 x,y∈domf,x​=y,0<θ<1都成立,则称 f f f为严格凸函数

强凸函数定义

若 ∃ m > 0 \exists m>0 ∃m>0,使得
g ( x ) = f ( x ) − m 2 ∥ x ∥ 2 g(\boldsymbol{x})=f(\boldsymbol{x})-\frac{m}{2}\Vert x\Vert^2 g(x)=f(x)−2m​∥x∥2
为凸函数,则称 f ( x ) f(\boldsymbol{x}) f(x)为强凸函数,其中 m m m为强凸参数
f ( θ x + ( 1 − θ ) y ) = g ( θ x + ( 1 − θ ) y ) + m 2 ∥ θ x + ( 1 − θ ) y ∥ 2 ≤ θ g ( x ) + ( 1 − θ ) g ( y ) + m 2 ∥ θ x + ( 1 − θ ) y ∥ 2 = θ f ( x ) − m 2 θ ∥ x ∥ 2 + ( 1 − θ ) f ( y ) − m 2 ( 1 − θ ) ∥ y ∥ 2 + m 2 ∥ θ x + ( 1 − θ ) y ∥ 2 = θ f ( x ) + ( 1 − θ ) f ( y ) + m 2 ( ∥ θ x + ( 1 − θ ) y ∥ 2 − θ ∥ x ∥ 2 − ( 1 − θ ) ∥ y ∥ 2 ) = θ f ( x ) + ( 1 − θ ) f ( y ) + m 2 ( ∑ ( θ x i + ( 1 − θ ) y i ) 2 − θ ∑ x i 2 − ( 1 − θ ) ∑ y i 2 ) = θ f ( x ) + ( 1 − θ ) f ( y ) + m 2 ( θ ( θ − 1 ) ∑ x i 2 − ( 1 − θ ) θ ∑ y i 2 + 2 θ ( 1 − θ ) ∑ x i y i ) = θ f ( x ) + ( 1 − θ ) f ( y ) − m 2 θ ( 1 − θ ) ( ∑ x i 2 + ∑ y i 2 − 2 ∑ x i y i ) = θ f ( x ) + ( 1 − θ ) f ( y ) − m 2 θ ( 1 − θ ) ∥ x − y ∥ 2 \begin{aligned} &\quad f(\theta \boldsymbol{x}+(1-\theta)\boldsymbol{y})\\ &=g(\theta \boldsymbol{x}+(1-\theta)\boldsymbol{y})+\frac{m}{2}\Vert \theta \boldsymbol{x}+(1-\theta)\boldsymbol{y}\Vert^2\\ &\le \theta g(\boldsymbol{x})+(1-\theta)g(\boldsymbol{y})+\frac{m}{2}\Vert \theta \boldsymbol{x}+(1-\theta)\boldsymbol{y}\Vert^2\\ &=\theta f(\boldsymbol{x})-\frac{m}{2}\theta\Vert \boldsymbol{x}\Vert^2+(1-\theta)f(\boldsymbol{y})-\frac{m}{2}(1-\theta)\Vert \boldsymbol{y}\Vert^2+\frac{m}{2}\Vert \theta \boldsymbol{x}+(1-\theta)\boldsymbol{y}\Vert^2\\ &=\theta f(\boldsymbol{x})+(1-\theta)f(\boldsymbol{y})+\frac{m}{2}(\Vert \theta \boldsymbol{x}+(1-\theta)\boldsymbol{y}\Vert^2-\theta\Vert \boldsymbol{x}\Vert^2-(1-\theta)\Vert \boldsymbol{y}\Vert^2)\\ &=\theta f(\boldsymbol{x})+(1-\theta)f(\boldsymbol{y})+\frac{m}{2}(\sum (\theta x_i+(1-\theta)y_i)^2-\theta\sum x_i^2-(1-\theta)\sum y_i^2)\\ &=\theta f(\boldsymbol{x})+(1-\theta)f(\boldsymbol{y})+\frac{m}{2}(\theta(\theta-1)\sum x_i^2-(1-\theta)\theta\sum y_i^2+2\theta(1-\theta)\sum x_iy_i)\\ &=\theta f(\boldsymbol{x})+(1-\theta)f(\boldsymbol{y})-\frac{m}{2}\theta(1-\theta)(\sum x_i^2+\sum y_i^2-2\sum x_iy_i)\\ &=\theta f(\boldsymbol{x})+(1-\theta)f(\boldsymbol{y})-\frac{m}{2}\theta(1-\theta)\Vert \boldsymbol{x}-\boldsymbol{y}\Vert^2 \end{aligned} ​f(θx+(1−θ)y)=g(θx+(1−θ)y)+2m​∥θx+(1−θ)y∥2≤θg(x)+(1−θ)g(y)+2m​∥θx+(1−θ)y∥2=θf(x)−2m​θ∥x∥2+(1−θ)f(y)−2m​(1−θ)∥y∥2+2m​∥θx+(1−θ)y∥2=θf(x)+(1−θ)f(y)+2m​(∥θx+(1−θ)y∥2−θ∥x∥2−(1−θ)∥y∥2)=θf(x)+(1−θ)f(y)+2m​(∑(θxi​+(1−θ)yi​)2−θ∑xi2​−(1−θ)∑yi2​)=θf(x)+(1−θ)f(y)+2m​(θ(θ−1)∑xi2​−(1−θ)θ∑yi2​+2θ(1−θ)∑xi​yi​)=θf(x)+(1−θ)f(y)−2m​θ(1−θ)(∑xi2​+∑yi2​−2∑xi​yi​)=θf(x)+(1−θ)f(y)−2m​θ(1−θ)∥x−y∥2​
所以等价定义
若 ∃ m > 0 \exists m>0 ∃m>0,使得 ∀ x , y ∈ d o m f , θ ∈ ( 0 , 1 ) \forall x,y\in \bold{dom}f,\theta\in(0,1) ∀x,y∈domf,θ∈(0,1)

f ( θ x + ( 1 − θ ) y ) ≤ θ f ( x ) + ( 1 − θ ) f ( y ) − m 2 θ ( 1 − θ ) ∥ x − y ∥ 2 f(\theta \boldsymbol{x}+(1-\theta)\boldsymbol{y})\le \theta f(\boldsymbol{x})+(1-\theta)f(\boldsymbol{y})-\frac{m}{2}\theta(1-\theta)\Vert \boldsymbol{x}-\boldsymbol{y}\Vert^2 f(θx+(1−θ)y)≤θf(x)+(1−θ)f(y)−2m​θ(1−θ)∥x−y∥2
则称 f ( x ) f(\boldsymbol{x}) f(x)为强凸函数,其中 m m m为强凸参数

凸函数判定

定理1

f ( x ) f(\boldsymbol{x}) f(x)是凸函数当且仅当 ∀ x ∈ d o m   f , v ∈ R n , g : R → R , \forall \boldsymbol{x} \in \bold{dom}\ f,\boldsymbol{v}\in\mathbb{R}^n,g:\mathbb{R}\to \mathbb{R}, ∀x∈dom f,v∈Rn,g:R→R,
g ( t ) = f ( x + t v ) , d o m   g = { t ∣ x + t v ∈ d o m   f } g(t)=f(\boldsymbol{x}+t\boldsymbol{v}),\bold{dom}\ g=\{t\mid\boldsymbol{x}+t\boldsymbol{v}\in\bold{dom}\ f\} g(t)=f(x+tv),dom g={t∣x+tv∈dom f}
是凸函数
证明:
必要性:设 f ( x ) f(\boldsymbol{x}) f(x)是凸函数
∀ t 1 , t 2 ∈ d o m   g , θ ∈ ( 0 , 1 ) \forall t_1,t_2\in \bold{dom}\ g,\theta\in(0,1) ∀t1​,t2​∈dom g,θ∈(0,1)
x + t 1 v ∈ d o m   f x + t 2 v ∈ d o m   f \boldsymbol{x}+t_1\boldsymbol{v}\in\bold{dom}\ f\\ \boldsymbol{x}+t_2\boldsymbol{v}\in\bold{dom}\ f\\ x+t1​v∈dom fx+t2​v∈dom f
由 d o m   f \bold{dom}\ f dom f是凸集,立即推
x + ( θ t 1 + ( 1 − θ ) t 2 ) v ∈ d o m   f \boldsymbol{x}+(\theta t_1+(1-\theta)t_2)\boldsymbol{v}\in\bold{dom}\ f x+(θt1​+(1−θ)t2​)v∈dom f
所以 θ t 1 + ( 1 − θ ) t 2 ∈ d o m   g \theta t_1+(1-\theta)t_2\in \bold{dom}\ g θt1​+(1−θ)t2​∈dom g,即 d o m   g \bold{dom}\ g dom g为凸集
g ( θ t 1 + ( 1 − θ ) t 2 ) = f ( x + ( θ t 1 + ( 1 − θ ) t 2 ) v ) = f ( θ ( x + t 1 v ) + ( 1 − θ ) ( x + t 2 v ) ) ⩽ θ f ( x + t 1 v ) + ( 1 − θ ) f ( x + t 2 v ) = θ g ( t 1 ) + ( 1 − θ ) g ( t 2 ) . \begin{aligned} g\left(\theta t_{1}+(1-\theta) t_{2}\right) &=f\left(\boldsymbol{x}+\left(\theta t_{1}+(1-\theta) t_{2}\right) \boldsymbol{v}\right) \\ &=f\left(\theta\left(\boldsymbol{x}+t_{1} \boldsymbol{v}\right)+(1-\theta)\left(\boldsymbol{x}+t_{2} \boldsymbol{v}\right)\right) \\ & \leqslant \theta f\left(\boldsymbol{x}+t_{1} \boldsymbol{v}\right)+(1-\theta) f\left(\boldsymbol{x}+t_{2} \boldsymbol{v}\right) \\ &=\theta g\left(t_{1}\right)+(1-\theta) g\left(t_{2}\right) . \end{aligned} g(θt1​+(1−θ)t2​)​=f(x+(θt1​+(1−θ)t2​)v)=f(θ(x+t1​v)+(1−θ)(x+t2​v))⩽θf(x+t1​v)+(1−θ)f(x+t2​v)=θg(t1​)+(1−θ)g(t2​).​
所以 g ( t ) g(t) g(t)是凸函数

充分性:
取 v = y − x , t 1 = 0 , t 2 = 1 \boldsymbol{v}=\boldsymbol{y}-\boldsymbol{x},t_1=0,t_2=1 v=y−x,t1​=0,t2​=1
由 d o m   g \bold{dom}\ g dom g是凸集可知, θ ⋅ 0 + ( 1 − θ ) ⋅ 1 ∈ d o m   g \theta\cdot 0+(1-\theta)\cdot 1\in \bold{dom}\ g θ⋅0+(1−θ)⋅1∈dom g
即 θ x + ( 1 − θ ) y ∈ d o m   f \theta \boldsymbol{x}+(1-\theta)\boldsymbol{y}\in\bold{dom}\ f θx+(1−θ)y∈dom f是凸集
g ( 1 − θ ) = g ( θ t 1 + ( 1 − θ ) t 2 ) ⩽ θ g ( t 1 ) + ( 1 − θ ) g ( t 2 ) = θ g ( 0 ) + ( 1 − θ ) g ( 1 ) = θ f ( x ) + ( 1 − θ ) f ( y ) \begin{aligned} g(1-\theta) &=g\left(\theta t_{1}+(1-\theta) t_{2}\right) \\ & \leqslant \theta g\left(t_{1}\right)+(1-\theta) g\left(t_{2}\right) \\ &=\theta g(0)+(1-\theta) g(1) \\ &=\theta f(x)+(1-\theta) f(y) \end{aligned} g(1−θ)​=g(θt1​+(1−θ)t2​)⩽θg(t1​)+(1−θ)g(t2​)=θg(0)+(1−θ)g(1)=θf(x)+(1−θ)f(y)​
g ( 1 − θ ) = f ( x + ( 1 − θ ) ( y − x ) ) = f ( θ x + ( 1 − θ ) y ) g(1-\theta)=f( \boldsymbol{x}+(1-\theta)(\boldsymbol{y}- \boldsymbol{x}))=f(\theta \boldsymbol{x}+(1-\theta)\boldsymbol{y}) g(1−θ)=f(x+(1−θ)(y−x))=f(θx+(1−θ)y)
所以 f ( x ) f( \boldsymbol{x}) f(x)是凸函数

定理2

定义在凸集上的可微函数 f f f, f f f是凸函数,当且仅当
f ( y ) ⩾ f ( x ) + ∇ f ( x ) T ( y − x ) , ∀ x , y ∈ dom ⁡ f f(\boldsymbol{y}) \geqslant f(\boldsymbol{x})+\nabla f(\boldsymbol{x})^{\mathrm{T}}(\boldsymbol{y}-\boldsymbol{x}), \quad \forall \boldsymbol{x}, \boldsymbol{y} \in \operatorname{dom} f f(y)⩾f(x)+∇f(x)T(y−x),∀x,y∈domf
证明:
必要性:
设 f f f是凸函数,则 ∀ x , y ∈ dom ⁡ f \forall \boldsymbol{x},\boldsymbol{y}\in \operatorname{dom} f ∀x,y∈domf,以及 t ∈ ( 0 , 1 ) t\in(0,1) t∈(0,1),有
t f ( y ) + ( 1 − t ) f ( x ) ⩾ f ( x + t ( y − x ) ) f ( y ) − f ( x ) ⩾ f ( x + t ( y − x ) ) − f ( x ) t \begin{aligned} t f(\boldsymbol{y})+(1-t) f(\boldsymbol{x}) &\geqslant f(\boldsymbol{x}+t(\boldsymbol{y}-\boldsymbol{x}))\\ f(\boldsymbol{y})-f(\boldsymbol{x}) &\geqslant \frac{f(\boldsymbol{x}+t(\boldsymbol{y}-\boldsymbol{x}))-f(\boldsymbol{x})}{t} \end{aligned} tf(y)+(1−t)f(x)f(y)−f(x)​⩾f(x+t(y−x))⩾tf(x+t(y−x))−f(x)​​
令 t → 0 t\to 0 t→0,利用保号性
f ( y ) − f ( x ) ⩾ lim ⁡ t → 0 f ( x + t ( y − x ) ) − f ( x ) t = ∇ f ( x ) T ( y − x ) f(\boldsymbol{y})-f(\boldsymbol{x}) \geqslant \lim _{t \rightarrow 0} \frac{f(\boldsymbol{x}+t(\boldsymbol{y}-\boldsymbol{x}))-f(\boldsymbol{x})}{t}=\nabla f(\boldsymbol{x})^{\mathrm{T}}(\boldsymbol{y}-\boldsymbol{x}) f(y)−f(x)⩾t→0lim​tf(x+t(y−x))−f(x)​=∇f(x)T(y−x)

充分性:
设 f f f是凸函数,则 ∀ x , y ∈ dom ⁡ f \forall \boldsymbol{x},\boldsymbol{y}\in \operatorname{dom} f ∀x,y∈domf,以及 t ∈ ( 0 , 1 ) t\in(0,1) t∈(0,1)
设 z = t x + ( 1 − t ) y \boldsymbol{z}=t\boldsymbol{x}+(1-t)\boldsymbol{y} z=tx+(1−t)y
f ( x ) ⩾ f ( z ) + ∇ f ( z ) T ( x − z ) f ( y ) ⩾ f ( z ) + ∇ f ( z ) T ( y − z ) \begin{aligned} &f(\boldsymbol{x}) \geqslant f(\boldsymbol{z})+\nabla f(\boldsymbol{z})^{\mathrm{T}}(\boldsymbol{x}-\boldsymbol{z}) \\ &f(\boldsymbol{y}) \geqslant f(z)+\nabla f(\boldsymbol{z})^{\mathrm{T}}(\boldsymbol{y}-\boldsymbol{z}) \end{aligned} ​f(x)⩾f(z)+∇f(z)T(x−z)f(y)⩾f(z)+∇f(z)T(y−z)​
于是
t f ( x ) + ( 1 − t ) f ( y ) ⩾ f ( z ) + 0 = f ( t x + ( 1 − t ) y ) t f(\boldsymbol{x})+(1-t) f(\boldsymbol{y}) \geqslant f(\boldsymbol{z})+0=f(t\boldsymbol{x}+(1-t)\boldsymbol{y}) tf(x)+(1−t)f(y)⩾f(z)+0=f(tx+(1−t)y)

推论2.1

定义在凸集上的可微函数 f f f, f f f是严格凸函数,当且仅当
f ( y ) > f ( x ) + ∇ f ( x ) T ( y − x ) , ∀ x , y ∈ dom ⁡ f f(\boldsymbol{y}) > f(\boldsymbol{x})+\nabla f(\boldsymbol{x})^{\mathrm{T}}(\boldsymbol{y}-\boldsymbol{x}), \quad \forall \boldsymbol{x}, \boldsymbol{y} \in \operatorname{dom} f f(y)>f(x)+∇f(x)T(y−x),∀x,y∈domf

推论2.2

定义在凸集上的可微函数 f f f, f f f是强凸函数,当且仅当
f ( y ) ≥ f ( x ) + ∇ f ( x ) T ( y − x ) + m 2 ∥ y − x ∥ 2 , ∀ x , y ∈ dom ⁡ f f(\boldsymbol{y}) \ge f(\boldsymbol{x})+\nabla f(\boldsymbol{x})^{\mathrm{T}}(\boldsymbol{y}-\boldsymbol{x})+\frac{m}{2}\Vert \boldsymbol{y}-\boldsymbol{x}\Vert^2, \quad \forall \boldsymbol{x}, \boldsymbol{y} \in \operatorname{dom} f f(y)≥f(x)+∇f(x)T(y−x)+2m​∥y−x∥2,∀x,y∈domf

定理3

设 f f f是可微函数,则 f f f是凸函数当且仅当 d o m   f \bold{dom}\ f dom f为凸集且 ∇ f \nabla f ∇f为单调映射,即
( ∇ f ( x ) − ∇ f ( y ) ) T ( x − y ) ⩾ 0 , ∀ x , y ∈ dom ⁡ f (\nabla f(\boldsymbol{x})-\nabla f(\boldsymbol{y}))^{\mathrm{T}}(\boldsymbol{x}-\boldsymbol{y}) \geqslant 0, \quad \forall \boldsymbol{x}, \boldsymbol{y} \in \operatorname{dom} f (∇f(x)−∇f(y))T(x−y)⩾0,∀x,y∈domf
证明:
必要性:
f ( y ) ⩾ f ( x ) + ∇ f ( x ) T ( y − x ) f ( x ) ⩾ f ( y ) + ∇ f ( y ) T ( x − y ) \begin{aligned} &f(\boldsymbol{y}) \geqslant f(\boldsymbol{x})+\nabla f(\boldsymbol{x})^{\mathrm{T}}(\boldsymbol{y}-\boldsymbol{x}) \\ &f(\boldsymbol{x}) \geqslant f(\boldsymbol{y})+\nabla f(\boldsymbol{y})^{\mathrm{T}}(\boldsymbol{x}-\boldsymbol{y}) \end{aligned} ​f(y)⩾f(x)+∇f(x)T(y−x)f(x)⩾f(y)+∇f(y)T(x−y)​
相加得
( ∇ f ( x ) − ∇ f ( y ) ) T ( x − y ) ⩾ 0 , ∀ x , y ∈ dom ⁡ f (\nabla f(\boldsymbol{x})-\nabla f(\boldsymbol{y}))^{\mathrm{T}}(\boldsymbol{x}-\boldsymbol{y}) \geqslant 0, \quad \forall \boldsymbol{x}, \boldsymbol{y} \in \operatorname{dom} f (∇f(x)−∇f(y))T(x−y)⩾0,∀x,y∈domf

充分性:
设 g ( t ) = f ( x + t ( y − x ) ) , g ′ ( t ) = ∇ f ( x + t ( y − x ) ) T ( y − x ) g(t)=f(\boldsymbol{x}+t(\boldsymbol{y}-\boldsymbol{x})), \quad g^{\prime}(t)=\nabla f(\boldsymbol{x}+t(\boldsymbol{y}-\boldsymbol{x}))^{\mathrm{T}}(\boldsymbol{y}-\boldsymbol{x}) g(t)=f(x+t(y−x)),g′(t)=∇f(x+t(y−x))T(y−x)
由 ∇ f \nabla f ∇f单调映射, ∀ t > 0 , g ′ ( t ) ≥ g ′ ( 0 ) \forall t>0,g'(t)\ge g'(0) ∀t>0,g′(t)≥g′(0)
f ( y ) = g ( 1 ) = g ( 0 ) + ∫ 0 1 g ′ ( t ) d t ⩾ g ( 0 ) + g ′ ( 0 ) = f ( x ) + ∇ f ( x ) T ( y − x ) \begin{aligned} f(y) &=g(1)=g(0)+\int_{0}^{1} g^{\prime}(t) \mathrm{d} t \\ & \geqslant g(0)+g^{\prime}(0)\\ &=f(\boldsymbol{x})+\nabla f(\boldsymbol{x})^{\mathrm{T}}(\boldsymbol{y}-\boldsymbol{x}) \end{aligned} f(y)​=g(1)=g(0)+∫01​g′(t)dt⩾g(0)+g′(0)=f(x)+∇f(x)T(y−x)​

推论3.1

f f f是严格凸函数当且仅当
( ∇ f ( x ) − ∇ f ( y ) ) T ( x − y ) > 0 , ∀ x , y ∈ dom ⁡ f (\nabla f(\boldsymbol{x})-\nabla f(\boldsymbol{y}))^{\mathrm{T}}(\boldsymbol{x}-\boldsymbol{y}) > 0, \quad \forall \boldsymbol{x}, \boldsymbol{y} \in \operatorname{dom} f (∇f(x)−∇f(y))T(x−y)>0,∀x,y∈domf

推论3.2

( ∇ f ( x ) − ∇ f ( y ) ) T ( x − y ) ⩾ m ∥ y − x ∥ 2 , ∀ x , y ∈ dom ⁡ f (\nabla f(\boldsymbol{x})-\nabla f(\boldsymbol{y}))^{\mathrm{T}}(\boldsymbol{x}-\boldsymbol{y}) \geqslant m\Vert \boldsymbol{y}-\boldsymbol{x}\Vert^2, \quad \forall \boldsymbol{x}, \boldsymbol{y} \in \operatorname{dom} f (∇f(x)−∇f(y))T(x−y)⩾m∥y−x∥2,∀x,y∈domf

定理4

设 f f f为定义在凸集上的二阶连续可微函数,则 f f f是凸函数当且仅当
∇ 2 f ( x ) ⪰ 0 , ∀ x ∈ dom ⁡ f \nabla^2 f(\boldsymbol{x})\succeq0,\quad \forall\boldsymbol{x}\in \operatorname{dom} f ∇2f(x)⪰0,∀x∈domf
如果 ∇ 2 f ( x ) ≻ 0 , ∀ x ∈ dom ⁡ f \nabla^2 f(\boldsymbol{x})\succ 0,\quad \forall\boldsymbol{x}\in \operatorname{dom} f ∇2f(x)≻0,∀x∈domf,则 f f f是严格凸函数
证明:
必要性:
假设存在非零向量 v ∈ R n \boldsymbol{v}\in\mathbb{R}^{n} v∈Rn,使得 v T ∇ 2 f ( x ) v < 0 \boldsymbol{v}^T\nabla^2f(\boldsymbol{x})\boldsymbol{v}<0 vT∇2f(x)v<0
f ( x + t v ) = f ( x ) + t ∇ f ( x ) T v + t 2 2 v T ∇ 2 f ( x ) v + o ( t 2 ) f ( x + t v ) − f ( x ) + t ∇ f ( x ) T v t 2 = 1 2 v T ∇ 2 f ( x ) v + o ( 1 ) \begin{aligned} f(\boldsymbol{x}+t\boldsymbol{v})&=f(\boldsymbol{x})+t\nabla f(\boldsymbol{x})^T\boldsymbol{v}+\frac{t^2}{2}\boldsymbol{v}^T\nabla^2f(\boldsymbol{x})\boldsymbol{v}+o(t^2)\\ \frac{f(\boldsymbol{x}+t\boldsymbol{v})-f(\boldsymbol{x})+t\nabla f(\boldsymbol{x})^T\boldsymbol{v}}{t^2}&=\frac{1}{2}\boldsymbol{v}^T\nabla^2f(\boldsymbol{x})\boldsymbol{v}+o(1) \end{aligned} f(x+tv)t2f(x+tv)−f(x)+t∇f(x)Tv​​=f(x)+t∇f(x)Tv+2t2​vT∇2f(x)v+o(t2)=21​vT∇2f(x)v+o(1)​
当 t → 0 + t\to 0^{+} t→0+
f ( x + t v ) − f ( x ) + t ∇ f ( x ) T v t 2 < 0 \frac{f(\boldsymbol{x}+t\boldsymbol{v})-f(\boldsymbol{x})+t\nabla f(\boldsymbol{x})^T\boldsymbol{v}}{t^2}<0 t2f(x+tv)−f(x)+t∇f(x)Tv​<0
与定理2矛盾,所以 ∇ 2 f ( x ) ⪰ 0 \nabla^2 f(\boldsymbol{x})\succeq0 ∇2f(x)⪰0

充分性: ∇ 2 f ( x ) ⪰ 0 \nabla^2 f(\boldsymbol{x})\succeq0 ∇2f(x)⪰0

f ( y ) = f ( x ) + ∇ f ( x ) T ( y − x ) + 1 2 ( y − x + t ( y − x ) ) T ∇ 2 f ( x ) ( y − x ) f(\boldsymbol{y})=f(\boldsymbol{x})+\nabla f(\boldsymbol{x})^T(\boldsymbol{y}-\boldsymbol{x})+\frac{1}{2}(\boldsymbol{y}-\boldsymbol{x}+t(\boldsymbol{y}-\boldsymbol{x}))^T\nabla^2f(\boldsymbol{x})(\boldsymbol{y}-\boldsymbol{x}) f(y)=f(x)+∇f(x)T(y−x)+21​(y−x+t(y−x))T∇2f(x)(y−x)
其中 t ∈ ( 0 , 1 ) t\in(0,1) t∈(0,1)
于是
f ( y ) ≥ f ( x ) + ∇ f ( x ) T ( y − x ) f(\boldsymbol{y})\ge f(\boldsymbol{x})+\nabla f(\boldsymbol{x})^T(\boldsymbol{y}-\boldsymbol{x}) f(y)≥f(x)+∇f(x)T(y−x)
所以成立

严格凸函数版本的证明也差不多

上一篇:Machine Learning Week_1 Parameter Learning 1-6


下一篇:[AIZU]AIZU - 1146 POJ - 3011 Secrets in Shadows 计算几何