关于近端梯度下降的理解

近端梯度下降

目标函数

min x Q ( x ) = f ( x ) + g ( x ) (1) \textit{min}_{x}Q(x)=f(x)+g(x) \tag {1} minx​Q(x)=f(x)+g(x)(1)
f ( x ) f(x) f(x)为可微凸函数, g ( x ) g(x) g(x)非光滑, x ∈ H x\in{H} x∈H。

考虑 f ( x ) f(x) f(x)

梯度下降
x k + 1 = x k − α ∇ f ( x ) (2) x_{k+1}=x_{k}-\alpha\nabla{f(x)} \tag {2} xk+1​=xk​−α∇f(x)(2)

f ( x ) f(x) f(x)在 x k x_{k} xk​处泰勒展开
f ( x k + 1 ) ≈ f ( x k ) + ∇ f ( x k ) ( x k + 1 − x k ) (3) f(x_{k+1}) \approx f(x_{k})+\nabla{f(x_{k})}(x_{k+1}-x_{k}) \tag {3} f(xk+1​)≈f(xk​)+∇f(xk​)(xk+1​−xk​)(3)

式(2)+(3)
f ( x k + 1 ) = f ( x k ) − α ( ∇ f ( x k ) ) 2 f ( x k + 1 ) < f ( x k ) (4) \begin{aligned} f(x_{k+1})=f(x_{k})-\alpha(\nabla{f(x_{k})})^{2} \\ f(x_{k+1})<f(x_{k}) \tag {4} \end{aligned} f(xk+1​)=f(xk​)−α(∇f(xk​))2f(xk+1​)<f(xk​)​(4)

f ( x ) f(x) f(x)二阶泰勒展开
f ^ ( x ; x k ) ≈ f ( x k ) + ∇ f ( x k ) ( x − x k ) + ∇ 2 f ( x k ) 2 ( x − x k ) 2 (5) \widehat{f}(x;x_{k}) \approx f(x_{k})+\nabla{f(x_{k})} (x-x_{k})+ \frac{\nabla^2 {f(x_{k})}}{2} (x-x_{k})^2 \tag {5} f ​(x;xk​)≈f(xk​)+∇f(xk​)(x−xk​)+2∇2f(xk​)​(x−xk​)2(5)

∇ f ( x k ) \nabla f(x_{k}) ∇f(xk​)满足L-Lipschitz条件
∣ ∇ f ( x ) − ∇ f ( x k ) ∣ ≤ L ∣ x − x k ∣ ∇ 2 f ( x k ) = ∣ ∇ f ( x ) − ∇ f ( x k ) ∣ ∣ x − x k ∣ ≤ L (6) \begin{aligned} \left\vert \nabla f(x) - \nabla f(x_{k}) \right\vert \leq L\left\vert x - x_{k} \right\vert \\ \nabla^2 {f(x_{k})} = \frac{\left\vert \nabla f(x) - \nabla f(x_{k}) \right\vert}{\left\vert x - x_{k} \right\vert} \leq L \tag {6} \end{aligned} ∣∇f(x)−∇f(xk​)∣≤L∣x−xk​∣∇2f(xk​)=∣x−xk​∣∣∇f(x)−∇f(xk​)∣​≤L​(6)

式(5)+(6)
f ^ ( x ; x k ) ≈ f ( x k ) + ∇ f ( x k ) ( x − x k ) + L 2 ( x − x k ) 2 (7) \widehat{f}(x;x_{k}) \approx f(x_{k})+\nabla{f(x_{k})} (x-x_{k})+ \frac{L}{2} (x-x_{k})^2 \tag {7} f ​(x;xk​)≈f(xk​)+∇f(xk​)(x−xk​)+2L​(x−xk​)2(7)

式(7)配方
f ^ ( x ; x k ) ≈ L 2 ( ( x − x k ) 2 + 2 L ∇ f ( x k ) ( x − x k ) + ( 1 L ∇ f ( x k ) ) 2 ) − 1 2 L ( ∇ f ( x k ) ) 2 + f ( x k ) f ^ ( x ; x k ) ≈ L 2 ( x − x k + 1 L ∇ f ( x k ) ) 2 − 1 2 L ( ∇ f ( x k ) ) 2 + f ( x k ) (8) \begin{aligned} \widehat{f}(x;x_{k}) \approx \frac{L}{2} ((x-x_{k})^2 + \frac{2}{L} \nabla{f(x_{k})} (x-x_{k}) + (\frac{1}{L} \nabla{f(x_{k})})^2) - \frac{1}{2L} (\nabla{f(x_{k})} )^2 +f(x_{k}) \\ \widehat{f}(x;x_{k}) \approx \frac{L}{2} (x-x_{k}+\frac{1}{L} \nabla{f(x_{k})}) ^2- \frac{1}{2L} (\nabla{f(x_{k})} )^2 +f(x_{k}) \tag {8} \end{aligned} f ​(x;xk​)≈2L​((x−xk​)2+L2​∇f(xk​)(x−xk​)+(L1​∇f(xk​))2)−2L1​(∇f(xk​))2+f(xk​)f ​(x;xk​)≈2L​(x−xk​+L1​∇f(xk​))2−2L1​(∇f(xk​))2+f(xk​)​(8)

当 x k + 1 = a r g m i n x f ^ ( x ; x k ) x_{k+1}=arg min_{x} \widehat{f}(x;x_{k}) xk+1​=argminx​f ​(x;xk​)时
x k + 1 = x k + 1 L ∇ f ( x k ) (9) x_{k+1}=x_{k} +\frac{1}{L} \nabla{f(x_{k})} \tag {9} xk+1​=xk​+L1​∇f(xk​)(9)
考虑 g ( x ) g(x) g(x)非光滑
式(1)+(8)
Q ( x ) = L 2 ( x − x k + 1 L ∇ f ( x k ) ) 2 + h ( x k ) + g ( x ) h ( x k ) = f ( x k ) − 1 2 L ( ∇ f ( x k ) ) 2 x k + 1 = a r g m i n x L 2 ( x − x k + 1 L ∇ f ( x k ) ) 2 + g ( x ) (10) \begin{aligned} Q(x)=\frac{L}{2} (x-x_{k}+\frac{1}{L} \nabla{f(x_{k})}) ^2+h(x_{k})+g(x) \\ h(x_{k})=f(x_{k})-\frac{1}{2L} (\nabla{f(x_{k})} )^2 \\ x_{k+1}=arg min_{x} \frac{L}{2} (x-x_{k}+\frac{1}{L} \nabla{f(x_{k})}) ^2+g(x) \tag {10} \end{aligned} Q(x)=2L​(x−xk​+L1​∇f(xk​))2+h(xk​)+g(x)h(xk​)=f(xk​)−2L1​(∇f(xk​))2xk+1​=argminx​2L​(x−xk​+L1​∇f(xk​))2+g(x)​(10)

近端投影
p r o x g ( x ) ( z ) = a r g m i n x 1 2 ( x − z ) 2 + g ( x ) (11) prox_{g(x)}(z) = arg min_{x} \frac{1}{2} (x-z) ^2+g(x) \tag {11} proxg(x)​(z)=argminx​21​(x−z)2+g(x)(11)

注:上述内容仅为个人学习过程中的笔记,如有不当的地方还望指正

上一篇:层次分析法(AHP)原理以及应用


下一篇:UA SIE545 优化理论基础3 Fritz-John与Kuhn-Tucker理论总结 带等式约束与不等式约束的极值问题