Here is part of Appendix C, "Optimal step sizes", of [1]:
$$\theta_{t+1}=\theta_t-\alpha H(\theta_t)^{-1}g(\theta_t)\tag{1}$$
where $\alpha$ is the learning rate, also called the step size in [1].
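In code, I read update (1) like this (a minimal numpy sketch; the toy quadratic, and hence the particular H and g, are my own example, not from [1]):

```python
import numpy as np

# Toy quadratic f(theta) = 0.5 * theta^T A theta, so g(theta) = A theta, H(theta) = A
A = np.array([[3.0, 1.0],
              [1.0, 2.0]])

def g(theta):
    return A @ theta          # gradient of the toy quadratic

def H(theta):
    return A                  # Hessian of the toy quadratic (constant here)

alpha = 1.0                   # learning rate / step size
theta = np.array([1.0, -2.0])

# Update (1): theta_{t+1} = theta_t - alpha * H(theta_t)^{-1} g(theta_t)
theta = theta - alpha * np.linalg.solve(H(theta), g(theta))
print(theta)                  # alpha = 1 gives the exact Newton step: [0. 0.]
```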
Quotation from [1]: "if we project the gradient in the basis of eigenvectors, we get:"
$$g(\theta)=\sum_{i=1}^{N}\left[g(\theta)^{T}v_{i}\right]v_{i}\tag{2}$$
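Numerically, I can verify an identity like (2) when $g(\theta)$ is treated as an $N$-vector and the $v_i$ are orthonormal eigenvectors of a symmetric matrix — a minimal numpy sketch under that assumption (the matrix H and vector g below are made up by me, not from [1]):

```python
import numpy as np

# Made-up symmetric matrix (playing the role of a Hessian) and made-up vector g
H = np.array([[2.0, 1.0],
              [1.0, 3.0]])
g = np.array([0.5, -1.0])

# For a symmetric matrix, the eigenvectors form an orthonormal basis
eigvals, eigvecs = np.linalg.eigh(H)   # columns of eigvecs are the v_i

# Reconstruct g as sum_i [g^T v_i] v_i, as in (2)
g_rebuilt = sum((g @ eigvecs[:, i]) * eigvecs[:, i] for i in range(len(eigvals)))

print(np.allclose(g, g_rebuilt))       # True
```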
Still, I cannot understand (2) as used in the paper, so I designed a simple example:
Let
$$g(\theta)=\begin{pmatrix}2&3\\1&2\end{pmatrix}.$$
Then its eigenvalues and eigenvectors are:
$$\lambda_1=2+\sqrt{3},\quad v_1=(\sqrt{3},\,1)^T$$
$$\lambda_2=2-\sqrt{3},\quad v_2=(-\sqrt{3},\,1)^T$$
and
$$g^T(\theta)=\begin{pmatrix}2&1\\3&2\end{pmatrix}_{2\times 2}.$$
The dimension of $g^T(\theta)$ is $2\times 2$, which is NOT compatible with $v_1$ and $v_2$, so $[g^T(\theta)\,v_i]\,v_i$ in (2) can NOT be computed.
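To make the dimension problem concrete, here is the same example in numpy (my own sketch of the computation above):

```python
import numpy as np

g_theta = np.array([[2.0, 3.0],
                    [1.0, 2.0]])

eigvals, eigvecs = np.linalg.eig(g_theta)
print(eigvals)    # 2 + sqrt(3) and 2 - sqrt(3) (order not guaranteed)
print(eigvecs)    # columns proportional to (sqrt(3), 1) and (-sqrt(3), 1)

v1 = eigvecs[:, 0]
# g_theta.T is 2x2, so g_theta.T @ v1 is a 2-vector, not the scalar
# that [g^T v_i] in (2) seems to require
print((g_theta.T @ v1).shape)   # (2,) rather than a scalar
```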
Could you tell me where I am wrong?
Thanks~~!
Reference:
[1] G. Alain, N. Le Roux, and P.-A. Manzagol, "Negative eigenvalues of the Hessian in deep neural networks," arXiv:1902.02366, 2019.