Huber regression
In least squares learning, we use the ℓ2 loss to obtain an accurate fit. From the viewpoint of robustness, however, it is often better to use the least absolute deviations criterion instead, i.e.
$$\hat{\theta}_{\mathrm{LA}} = \mathop{\mathrm{argmin}}_{\theta} J_{\mathrm{LA}}(\theta), \qquad J_{\mathrm{LA}}(\theta) = \sum_{i=1}^{n} |r_i|$$
where $r_i = f_\theta(x_i) - y_i$ is the residual. This makes the learning method more robust, at the cost of some accuracy.
To balance robustness and accuracy, the Huber loss is a good alternative:
$$\rho_{\mathrm{Huber}}(r) = \begin{cases} r^2/2 & (|r| \le \eta) \\ \eta|r| - \eta^2/2 & (|r| > \eta) \end{cases}$$
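For concreteness, here is a minimal MATLAB sketch of this piecewise loss, vectorized over an array of residuals (the threshold eta=1 is an assumption for illustration):

% Huber loss over a residual array r; eta=1 is illustrative only
eta=1;
huber=@(r) (abs(r)<=eta).*r.^2/2 + (abs(r)>eta).*(eta*abs(r)-eta^2/2);
huber([-3 -0.5 0 0.5 3])   % quadratic near zero, linear in the tails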
The optimization objective then becomes:
$$\min_{\theta} J(\theta), \qquad J(\theta) = \sum_{i=1}^{n} \rho_{\mathrm{Huber}}(r_i)$$
As usual, take the linear-in-parameter model as an example:
$$f_\theta(x) = \sum_{j=1}^{b} \theta_j \phi_j(x) = \theta^{T} \phi(x)$$
For brevity, we omit the derivation and give the final result (for details, refer to ℓ1-constrained LS):
$$\hat{\theta} = \mathop{\mathrm{argmin}}_{\theta} \tilde{J}(\theta), \qquad \tilde{J}(\theta) = \frac{1}{2} \sum_{i=1}^{n} \tilde{\omega}_i r_i^{2} + C$$
where
$$\tilde{\omega}_i = \begin{cases} 1 & (|\tilde{r}_i| \le \eta) \\ \eta/|\tilde{r}_i| & (|\tilde{r}_i| > \eta) \end{cases}, \qquad C = \sum_{i:\, |\tilde{r}_i| > \eta} \left( \frac{\eta|\tilde{r}_i|}{2} - \frac{\eta^2}{2} \right)$$
are independent of $\theta$, and $\tilde{r}_i$ is the residual under the current estimate $\tilde{\theta}$.
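For reference, the omitted step is the standard quadratic majorization of the Huber loss at the current residuals: for $|\tilde{r}| > \eta$,
$$\rho_{\mathrm{Huber}}(r) \le \frac{\eta}{2|\tilde{r}|}\, r^2 + \frac{\eta|\tilde{r}|}{2} - \frac{\eta^2}{2},$$
with equality at $r = \pm\tilde{r}$, while for $|\tilde{r}| \le \eta$ the loss is already quadratic. Summing these upper bounds over all samples gives exactly $\tilde{J}(\theta)$, so repeatedly minimizing $\tilde{J}$ never increases $J$.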
Therefore, the solution can be formulated as:
$$\hat{\theta} = \left( \Phi^{T} \tilde{W} \Phi \right)^{\dagger} \Phi^{T} \tilde{W} y$$
where $\tilde{W} = \mathrm{diag}(\tilde{\omega}_1, \ldots, \tilde{\omega}_n)$. Iterating this weighted least-squares update until convergence yields $\hat{\theta}$ as an estimate of $\theta$. The corresponding MATLAB code is given below:
n=50; N=1000;                             % training points / evaluation grid size
x=linspace(-3,3,n)'; X=linspace(-4,4,N)';
y=x+0.2*randn(n,1); y(n)=-4;              % noisy line with one outlier at the end
p(:,1)=ones(n,1); p(:,2)=x;               % design matrix Phi=[1,x]
t0=p\y; e=1;                              % ordinary LS initialization; eta=1
for o=1:1000
  r=abs(p*t0-y);                          % absolute residuals
  w=ones(n,1); w(r>e)=e./r(r>e);          % Huber weights
  t=(p'*(repmat(w,1,2).*p))\(p'*(w.*y));  % weighted LS update
  if norm(t-t0)<0.001, break, end         % stop when converged
  t0=t;
end
P(:,1)=ones(N,1); P(:,2)=X; F=P*t;        % fitted line on the evaluation grid
figure(1); clf; hold on; axis([-4,4,-4.5,3.5]);
plot(X,F,'g-'); plot(x,y,'bo');
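Running this script, the fitted green line should follow the inlying points closely; the single outlier at y(n)=-4, which would pull an ordinary least-squares fit visibly downward, receives a small weight and has little effect.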
Tukey regression
The Huber loss combines the ℓ1 and ℓ2 losses to balance robustness and accuracy. However, because its ℓ1 part still grows with the residual, extreme outliers can still have a large impact on the final outcome. To address this, the Tukey loss is a considerable alternative:
$$\rho_{\mathrm{Tukey}}(r) = \begin{cases} \left(1 - \left[1 - r^2/\eta^2\right]^{3}\right)\eta^2/6 & (|r| \le \eta) \\ \eta^2/6 & (|r| > \eta) \end{cases}$$
Note that the Tukey loss is not a convex function, so there may be several local optima. In practice, the same iteratively reweighted scheme is applied with the following weights:
$$\omega = \begin{cases} \left(1 - r^2/\eta^2\right)^{2} & (|r| \le \eta) \\ 0 & (|r| > \eta) \end{cases}$$
Hence samples whose residuals exceed η receive zero weight and no longer affect the estimate at all.
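As a minimal sketch, reusing the variables from the Huber script above (p, y, e, and a current estimate t0), only the weight computation inside the loop changes:

% Tukey (biweight) reweighting; starting t0 from the Huber solution is
% advisable, since the Tukey loss is non-convex and the iteration may
% otherwise settle in a poor local optimum
for o=1:1000
  r=abs(p*t0-y);                          % absolute residuals
  w=(1-(r/e).^2).^2; w(r>e)=0;            % Tukey weights: zero beyond eta
  t=(p'*(repmat(w,1,2).*p))\(p'*(w.*y));  % weighted LS update
  if norm(t-t0)<0.001, break, end
  t0=t;
end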