Fundamentals of the Least Squares Learning Algorithm

Least Squares
Least squares regression is a classical machine learning algorithm that minimizes the total squared error between the training targets and the model output, i.e. it minimizes

$$J_{\mathrm{LS}}(\theta) = \frac{1}{2}\sum_{i=1}^{n}\bigl(f_\theta(x_i) - y_i\bigr)^{2}$$

We would like the estimate $\hat\theta_{\mathrm{LS}}$ at which $J_{\mathrm{LS}}$ is smallest:
$$\hat\theta_{\mathrm{LS}} = \operatorname*{arg\,min}_{\theta} J_{\mathrm{LS}}(\theta)$$

Take the linear model as an illustration.
$$f_\theta(x) = \sum_{j=1}^{b}\theta_j\phi_j(x) = \theta^{T}\phi(x)$$

Then
$$J_{\mathrm{LS}}(\theta) = \frac{1}{2}\lVert \Phi\theta - y \rVert^{2}$$

where
$$\Phi = \begin{pmatrix} \phi_1(x_1) & \cdots & \phi_b(x_1) \\ \vdots & \ddots & \vdots \\ \phi_1(x_n) & \cdots & \phi_b(x_n) \end{pmatrix}$$
which is known as the "design matrix".
We do not actually need the minimum value of $J_{\mathrm{LS}}$ itself; to obtain the minimizing $\theta$, we simply take the derivative of $J_{\mathrm{LS}}(\theta)$ and set it to zero:
$$\nabla_\theta J_{\mathrm{LS}} = \left(\frac{\partial J_{\mathrm{LS}}}{\partial \theta_1}, \ldots, \frac{\partial J_{\mathrm{LS}}}{\partial \theta_b}\right)^{T} = \Phi^{T}\Phi\theta - \Phi^{T}y = 0$$
Then
$$\hat\theta_{\mathrm{LS}} = (\Phi^{T}\Phi)^{-1}\Phi^{T}y = \Phi^{\dagger}y$$

where $\Phi^{\dagger}$ is the generalized (Moore-Penrose) inverse of $\Phi$.
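As a concrete illustration (not part of the original derivation), here is a minimal NumPy sketch that builds a polynomial design matrix $\Phi$ and computes $\hat\theta_{\mathrm{LS}}$ via the pseudoinverse; the basis functions, data, and noise level are all assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

n, b = 50, 4                      # number of samples and basis functions (assumed)
x = np.linspace(-1.0, 1.0, n)
y = 1.0 - 2.0 * x + 0.5 * x**3 + 0.1 * rng.standard_normal(n)   # noisy targets

# Design matrix Phi: row i holds (phi_1(x_i), ..., phi_b(x_i)), here phi_j(x) = x^(j-1).
Phi = np.column_stack([x**j for j in range(b)])

# theta_hat = Phi^dagger y; equals (Phi^T Phi)^{-1} Phi^T y when Phi^T Phi is invertible.
theta_hat = np.linalg.pinv(Phi) @ y
print("estimated theta:", theta_hat)
```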
You can also apply weights to the training samples:
$$\min_{\theta}\ \frac{1}{2}\sum_{i=1}^{n} w_i\bigl(f_\theta(x_i) - y_i\bigr)^{2}$$
Then
$$\hat\theta_{\mathrm{LS}} = (\Phi^{T}W\Phi)^{\dagger}\Phi^{T}Wy$$
where $W = \operatorname{diag}(w_1, \ldots, w_n)$.
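A short sketch of the weighted variant, assuming example weights that emphasize samples near $x = 0$ (the weighting scheme is purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)

n, b = 50, 3
x = np.linspace(-1.0, 1.0, n)
y = 2.0 * x - 1.0 + 0.2 * rng.standard_normal(n)
Phi = np.column_stack([x**j for j in range(b)])

# Example weights (assumed): emphasize samples near x = 0.
w = np.exp(-np.abs(x))
W = np.diag(w)

# theta_hat = (Phi^T W Phi)^dagger Phi^T W y
theta_hat = np.linalg.pinv(Phi.T @ W @ Phi) @ (Phi.T @ W @ y)
print("weighted LS estimate:", theta_hat)
```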

If we instead take the kernel model, i.e.

$$f_\theta(x) = \sum_{j=1}^{n}\theta_j K(x, x_j),$$

it is still linear in the parameters $\theta$, so the same machinery applies. For simplicity, we just show the corresponding design matrix, which is the $n \times n$ kernel matrix:

$$K = \begin{pmatrix} K(x_1, x_1) & \cdots & K(x_1, x_n) \\ \vdots & \ddots & \vdots \\ K(x_n, x_1) & \cdots & K(x_n, x_n) \end{pmatrix}$$
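A minimal sketch of kernel least squares, assuming a Gaussian kernel and an arbitrary bandwidth (the original post does not fix a particular kernel):

```python
import numpy as np

rng = np.random.default_rng(2)

n, h = 40, 0.3          # sample size and Gaussian kernel bandwidth (both assumed)
x = np.linspace(0.0, 1.0, n)
y = np.sin(2.0 * np.pi * x) + 0.1 * rng.standard_normal(n)

def gauss_kernel(a, c, h):
    # K(a, c) = exp(-(a - c)^2 / (2 h^2))
    return np.exp(-((a - c) ** 2) / (2.0 * h**2))

# Kernel ("design") matrix: K[i, j] = K(x_i, x_j).
K = gauss_kernel(x[:, None], x[None, :], h)

# Same least squares formula, with Phi replaced by K.
theta_hat = np.linalg.pinv(K) @ y

# Prediction at new inputs: f(x*) = sum_j theta_j K(x*, x_j).
x_new = np.linspace(0.0, 1.0, 200)
y_pred = gauss_kernel(x_new[:, None], x[None, :], h) @ theta_hat
```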

Note that the least squares estimator has the property of asymptotic unbiasedness: when the noise added to $y$ has zero mean, its effect vanishes in expectation, so that

$$\mathbb{E}[\hat\theta_{\mathrm{LS}}] = \theta.$$
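This can be checked empirically with a small simulation (an assumed example, not from the original post): averaging the estimates over many noisy draws of $y$ recovers the true parameter.

```python
import numpy as np

rng = np.random.default_rng(3)

n, trials = 100, 2000
theta_true = np.array([1.0, -2.0, 0.5])
x = np.linspace(-1.0, 1.0, n)
Phi = np.column_stack([x**j for j in range(theta_true.size)])

estimates = np.empty((trials, theta_true.size))
for t in range(trials):
    # Zero-mean Gaussian noise added to the clean targets.
    y = Phi @ theta_true + 0.3 * rng.standard_normal(n)
    estimates[t] = np.linalg.pinv(Phi) @ y

print("true theta:        ", theta_true)
print("mean of estimates: ", estimates.mean(axis=0))  # close to theta_true
```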