Steps of supervised learning (a minimal loop sketch follows this list):
- Compute the loss function using randomly initialized parameters.
- Compute the gradient of the loss with respect to the current parameters, and use that gradient information to update the parameter values.
- Repeat the two steps above until the loss converges to a (local) optimum, yielding the optimized parameters.
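A minimal sketch of that loop on a toy 1-D problem (the objective here is made up purely for illustration and is not part of the original notes):

import numpy as np

# toy objective: loss(theta) = (theta - 3)^2, minimized at theta = 3
def loss(theta):
    return (theta - 3.0) ** 2

def grad(theta):
    return 2.0 * (theta - 3.0)

theta = np.random.randn()      # step 1: random initialization
lr = 0.1                       # step size / learning rate
for step in range(100):        # step 3: loop
    g = grad(theta)            # step 2: gradient of the loss
    theta = theta - lr * g     # step 2: parameter update
print(theta, loss(theta))      # theta ends up close to 3, loss close to 0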
Loss
- The loss measures the gap between the model's predictions and the ground-truth values, and it gives the optimization a direction to follow.
- The structural risk of a model consists of the empirical risk plus a regularization term; the loss function is the core of the empirical risk.
- Related field: convex optimization.
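In symbols (standard statistical-learning notation, added here for reference):

$$
R_{\mathrm{srm}}(f) = \underbrace{\frac{1}{N}\sum_{i=1}^{N} L\bigl(y_i, f(x_i)\bigr)}_{\text{empirical risk}} + \lambda\, J(f)
$$

where $L$ is the loss function, $J(f)$ is the regularization term, and $\lambda \ge 0$ trades the two off.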
Common loss functions
Commonly used for classification (a small numpy sketch of a few of these follows the list):
- 0-1 loss (zero-one loss)
- Absolute-value loss
- Log loss (logarithmic loss)
- Squared loss
- Exponential loss
- Hinge loss
- Perceptron loss
- Cross-entropy loss
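A hedged sketch of how a few of these losses are computed for a binary classifier with labels in {-1, +1} (the scores and labels below are made up for illustration):

import numpy as np

y = np.array([1, -1, 1, -1])          # true labels in {-1, +1}
s = np.array([0.8, 0.3, -0.5, -2.0])  # raw model scores f(x)

zero_one = np.mean(np.sign(s) != y)           # 0-1 loss: fraction misclassified
hinge    = np.mean(np.maximum(0, 1 - y * s))  # hinge loss: max(0, 1 - y*f(x))
exp_loss = np.mean(np.exp(-y * s))            # exponential loss: exp(-y*f(x))
squared  = np.mean((y - s) ** 2)              # squared loss: (y - f(x))^2

# cross-entropy on probabilities (labels mapped to {0, 1}, scores squashed by a sigmoid)
p = 1.0 / (1.0 + np.exp(-s))
t = (y + 1) / 2
cross_entropy = -np.mean(t * np.log(p) + (1 - t) * np.log(1 - p))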
Gradient Descent
- In calculus, take the partial derivative of a multivariate function with respect to each of its parameters; writing these partial derivatives together as a vector gives the gradient.
- Meaning of the gradient: geometrically, the gradient at a point is the direction in which the function increases fastest; conversely, the opposite direction of the gradient is the direction in which the function decreases fastest.
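In symbols, for a loss $L(\theta_1,\dots,\theta_n)$ the gradient is (standard notation, spelled out here for reference):

$$
\nabla L(\theta) = \left(\frac{\partial L}{\partial \theta_1}, \frac{\partial L}{\partial \theta_2}, \dots, \frac{\partial L}{\partial \theta_n}\right)
$$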
Concepts related to gradient descent:
Step size: the step size determines how far each gradient-descent iteration moves along the negative gradient direction. Using the hill-descent analogy, the step size is how long a stride you take from your current position in the steepest downhill direction.
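With step size $\alpha$, this is the usual update rule (standard notation, not written out in the original notes):

$$
\theta \leftarrow \theta - \alpha \,\nabla L(\theta)
$$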
Tuning gradient descent (a short normalization sketch follows the list):
- Choice of the step size.
- Choice of the initial parameter values: different initializations may converge to different local minima, so it can help to try several starting points and keep the one with the smallest loss.
- Normalization: normalize the feature data.
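A minimal feature-standardization sketch in numpy (zero mean, unit variance per feature; X here is a hypothetical feature matrix, not data from the notes):

import numpy as np

X = np.random.rand(100, 3) * np.array([1.0, 50.0, 1000.0])  # features on very different scales
X_norm = (X - X.mean(axis=0)) / X.std(axis=0)                # per-feature standardization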
Gradient-descent variants (a momentum-update sketch follows the list):
SGD
Momentum
NAG
Adagrad
Adadelta
RMSprop
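A hedged sketch of the Momentum variant of the update (standard formulation; v is the velocity and mu the momentum coefficient, both introduced here only for illustration):

import numpy as np

def sgd_momentum_step(theta, grad, v, lr=0.01, mu=0.9):
    """One momentum update: v <- mu*v - lr*grad, then theta <- theta + v."""
    v = mu * v - lr * grad
    theta = theta + v
    return theta, v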
Learning Rate
How much of the error is used in each parameter update is controlled by a parameter called the learning rate, also known as the step size. The output error is propagated back to the network parameters so that the model fits the training targets. This is essentially an optimization process that moves step by step toward the optimal solution.
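A tiny illustration of why the choice of learning rate matters, reusing the toy quadratic loss(theta) = (theta - 3)^2 from the sketch above (the values are chosen only to show the two behaviors):

def run_gd(lr, steps=20, theta=0.0):
    for _ in range(steps):
        theta = theta - lr * 2.0 * (theta - 3.0)   # gradient of (theta - 3)^2 is 2*(theta - 3)
    return theta

print(run_gd(0.1))   # converges toward 3
print(run_gd(1.1))   # overshoots on every step and diverges away from 3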
Numpy
Linear Regression
step1 Compute Loss
y=wx+b
def compute_error_for_line_given_points(b, w, points):
    totalError = 0
    for i in range(0, len(points)):
        x = points[i, 0]  # the first value of the i-th point
        y = points[i, 1]  # the second value of the i-th point
        # accumulate the squared error of this point
        totalError += (y - (w * x + b)) ** 2
    # average loss over all points
    return totalError / float(len(points))
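A quick sanity check of the loss on a tiny hand-made dataset, assuming numpy is imported as np (as in the full listing below); on the line y = x with w = 1, b = 0 the error should be exactly 0:

pts = np.array([[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]])
print(compute_error_for_line_given_points(0, 1, pts))  # 0.0: the line fits exactly
print(compute_error_for_line_given_points(0, 0, pts))  # (1 + 4 + 9) / 3 ≈ 4.67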
step2 Compute Gradient and update
def step_gradient(b_current, w_current, points, learningRate):
    b_gradient = 0
    w_gradient = 0
    N = float(len(points))
    for i in range(len(points)):
        x = points[i, 0]
        y = points[i, 1]
        # grad_b = (2/N) * (wx + b - y), accumulated over all points
        b_gradient += (2/N) * ((w_current * x + b_current) - y)
        # grad_w = (2/N) * x * (wx + b - y), accumulated over all points
        w_gradient += (2/N) * x * ((w_current * x + b_current) - y)
    # update b and w by taking a step against the gradient
    new_b = b_current - (learningRate * b_gradient)
    new_w = w_current - (learningRate * w_gradient)
    return [new_b, new_w]
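The two accumulations are just the partial derivatives of the mean squared error, written out (standard calculus, added here for reference):

$$
L(w,b) = \frac{1}{N}\sum_{i=1}^{N}\bigl(w x_i + b - y_i\bigr)^2,\qquad
\frac{\partial L}{\partial b} = \frac{2}{N}\sum_{i=1}^{N}\bigl(w x_i + b - y_i\bigr),\qquad
\frac{\partial L}{\partial w} = \frac{2}{N}\sum_{i=1}^{N} x_i\bigl(w x_i + b - y_i\bigr)
$$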
step3 w = w' and loop
def gradient_descent_runner(points, starting_b, starting_w, learning_rate, num_iterations):
    b = starting_b
    w = starting_w
    # update b and w for num_iterations steps
    for i in range(num_iterations):
        b, w = step_gradient(b, w, np.array(points), learning_rate)
    return [b, w]
Run
def run():
    points = np.genfromtxt("data.csv", delimiter=",")
    learning_rate = 0.00001
    initial_b = 0
    initial_w = 0
    num_iterations = 100000
    print("Starting gradient descent at b = {0}, w = {1}, error = {2}"
          .format(initial_b, initial_w,
                  compute_error_for_line_given_points(initial_b, initial_w, points))
          )
    print("Running...")
    [b, w] = gradient_descent_runner(points, initial_b, initial_w, learning_rate, num_iterations)
    print("After {0} iterations b = {1}, w = {2}, error = {3}".
          format(num_iterations, b, w,
                 compute_error_for_line_given_points(b, w, points))
          )

if __name__ == '__main__':
    run()
ALL Code
Environment
- python3
- anaconda3
import numpy as np

# y = wx + b
def compute_error_for_line_given_points(b, w, points):
    totalError = 0
    for i in range(0, len(points)):
        x = points[i, 0]
        y = points[i, 1]
        # compute mean-squared-error
        totalError += (y - (w * x + b)) ** 2
    # average loss for each point
    return totalError / float(len(points))

def step_gradient(b_current, w_current, points, learningRate):
    b_gradient = 0
    w_gradient = 0
    N = float(len(points))
    for i in range(0, len(points)):
        x = points[i, 0]
        y = points[i, 1]
        # grad_b = 2(wx+b-y)
        b_gradient += (2/N) * ((w_current * x + b_current) - y)
        # grad_w = 2(wx+b-y)*x
        w_gradient += (2/N) * x * ((w_current * x + b_current) - y)
    # update b and w by stepping against the gradient
    new_b = b_current - (learningRate * b_gradient)
    new_w = w_current - (learningRate * w_gradient)
    return [new_b, new_w]

def gradient_descent_runner(points, starting_b, starting_w, learning_rate, num_iterations):
    b = starting_b
    w = starting_w
    # update for several times
    for i in range(num_iterations):
        b, w = step_gradient(b, w, np.array(points), learning_rate)
    return [b, w]

def run():
    points = np.genfromtxt("data.csv", delimiter=",")
    learning_rate = 0.00001
    initial_b = 0  # initial y-intercept guess
    initial_w = 0  # initial slope guess
    num_iterations = 100000
    print("Starting gradient descent at b = {0}, w = {1}, error = {2}"
          .format(initial_b, initial_w,
                  compute_error_for_line_given_points(initial_b, initial_w, points))
          )
    print("Running...")
    [b, w] = gradient_descent_runner(points, initial_b, initial_w, learning_rate, num_iterations)
    print("After {0} iterations b = {1}, w = {2}, error = {3}".
          format(num_iterations, b, w,
                 compute_error_for_line_given_points(b, w, points))
          )

if __name__ == '__main__':
    run()
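The per-point Python loop in step_gradient can also be written as vectorized numpy operations. A minimal sketch, assuming points has shape (N, 2) as in the listing above (this is an alternative formulation, not the original author's code):

def step_gradient_vectorized(b_current, w_current, points, learningRate):
    x, y = points[:, 0], points[:, 1]
    error = (w_current * x + b_current) - y   # residual for every point at once
    b_gradient = 2 * error.mean()             # mean of 2*(wx+b-y)
    w_gradient = 2 * (x * error).mean()       # mean of 2*x*(wx+b-y)
    return [b_current - learningRate * b_gradient,
            w_current - learningRate * w_gradient]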