Some notes about "Negative eigenvalues of the Hessian in deep neural networks"

Here is part of Appendix C, "Optimal step sizes", of [1]:

$$\theta_{t+1} = \theta_t - \alpha H(\theta_t)^{-1} g(\theta_t) \qquad (1)$$

where $\alpha$ is the learning rate, also called the step size in [1].
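To get a feel for update (1), here is a minimal numpy sketch on a toy quadratic loss of my own (the matrix `A` and vector `b` are illustrative assumptions, not from [1]):

```python
import numpy as np

# Sketch of update (1): theta_{t+1} = theta_t - alpha * H^{-1} g,
# on an assumed toy quadratic loss 0.5*theta^T A theta - b^T theta.
A = np.array([[3.0, 1.0], [1.0, 2.0]])  # positive definite -> convex loss
b = np.array([1.0, -1.0])

def grad_hess(theta):
    grad = A @ theta - b   # gradient g(theta): a vector
    hess = A               # Hessian H(theta): a matrix (constant here)
    return grad, hess

theta = np.array([5.0, -5.0])
alpha = 1.0  # step size; alpha = 1 is the full Newton step
g, H = grad_hess(theta)
theta = theta - alpha * np.linalg.solve(H, g)  # H^{-1} g without forming the inverse

# For a quadratic loss, one full Newton step lands on the minimizer A^{-1} b.
print(theta)  # -> [ 0.6 -0.8]
```

Note that in (1), $g(\theta)$ is a vector and $H(\theta)$ is a matrix, so $H^{-1}g$ is again a vector.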

Quotation from [1]: "if we project the gradient in the basis of eigenvectors, we get:"

$$g(\theta) = \sum_{i=1}^{N} \left[ g(\theta)^T v_i \right] v_i \qquad (2)$$
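For reference, here is a numerical sketch of what an expansion of the form (2) looks like when $g(\theta)$ is a vector and the $v_i$ are orthonormal eigenvectors of a symmetric Hessian. The concrete `H` and `g` values are my own illustrative assumptions:

```python
import numpy as np

# Sketch of (2): expand a gradient vector in the orthonormal
# eigenbasis of a symmetric Hessian. H and g are assumed toy values.
H = np.array([[2.0, 1.0], [1.0, 3.0]])  # symmetric Hessian
g = np.array([1.0, -2.0])               # gradient vector (N = 2)

eigvals, V = np.linalg.eigh(H)          # columns of V are orthonormal eigenvectors v_i

# g = sum_i (g^T v_i) v_i : each bracket g^T v_i is a scalar coefficient
reconstruction = sum((g @ V[:, i]) * V[:, i] for i in range(len(g)))
print(np.allclose(reconstruction, g))   # -> True
```

Since the $v_i$ form an orthonormal basis, the sum reconstructs $g$ exactly.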
I cannot understand (2), so I designed a simple example:

Let
$$g(\theta) = \begin{pmatrix} 2 & 3 \\ 1 & 2 \end{pmatrix}$$

Then its eigenvalues and eigenvectors are:

$$\lambda_1 = 2 + \sqrt{3}, \quad v_1 = (\sqrt{3}, 1)$$

$$\lambda_2 = 2 - \sqrt{3}, \quad v_2 = (-\sqrt{3}, 1)$$
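The eigenpairs above check out numerically; here is a quick verification with numpy (the matrix is the 2×2 example from this post):

```python
import numpy as np

# Numerical check of the eigenpairs above for the 2x2 example matrix.
A = np.array([[2.0, 3.0], [1.0, 2.0]])

lam1, v1 = 2 + np.sqrt(3), np.array([np.sqrt(3), 1.0])
lam2, v2 = 2 - np.sqrt(3), np.array([-np.sqrt(3), 1.0])

print(np.allclose(A @ v1, lam1 * v1))  # -> True
print(np.allclose(A @ v2, lam2 * v2))  # -> True
```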

$$g^T(\theta) = \begin{pmatrix} 2 & 1 \\ 3 & 2 \end{pmatrix}_{2 \times 2}$$

The dimension of $g^T(\theta)$ is 2×2, which is NOT compatible with $v_1$ and $v_2$, so $[g(\theta)^T v_i] v_i$ in (2) can NOT be computed.

Could you tell me where I am wrong?
Thanks!

Reference:
[1] Negative eigenvalues of the Hessian in deep neural networks
