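Error Backpropagation
Computational Graphs
- Solving problems with a computational graph
- Build the computational graph
- Evaluate the graph from left to right (forward propagation)
- Local computation
- The final result is obtained by passing along the results of "local computations".
- A local computation means that, regardless of what happens in the rest of the graph, a node can produce its output using only the information directly related to itself
- Backpropagation
- Backpropagation passes "local derivatives" backward through the graph, from right to left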
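Chain Rule
- Composite function: a function composed of multiple functions
- If a function is expressed as a composite function, its derivative can be written as the product of the derivatives of the functions that make it up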
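As a small illustration of the chain rule, the sketch below takes the composite function z = t**2 with t = x + y (an example chosen here, not taken from the notes above) and compares the chain-rule derivative with a central-difference estimate. The function names are hypothetical.

```python
# Composite function: z = t**2, where t = x + y
def f(x, y):
    t = x + y
    return t ** 2

def chain_rule_dz_dx(x, y):
    t = x + y
    dz_dt = 2 * t   # derivative of t**2 with respect to t
    dt_dx = 1.0     # derivative of x + y with respect to x
    return dz_dt * dt_dx  # chain rule: product of the local derivatives

def numerical_dz_dx(x, y, h=1e-4):
    # Central difference, used only as an independent check
    return (f(x + h, y) - f(x - h, y)) / (2 * h)

x, y = 3.0, 4.0
analytic = chain_rule_dz_dx(x, y)
numeric = numerical_dz_dx(x, y)
print(analytic, numeric)
```

The two values should agree up to floating-point rounding, since the product of the local derivatives equals the derivative of the whole composite function.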
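Backpropagation
Backpropagation through an addition node
- The backward pass of an addition node simply multiplies by 1, so the upstream value flows unchanged to the downstream nodes
Backpropagation through a multiplication node
- The backward pass of a multiplication node multiplies the upstream value by the "flipped" forward-pass input and passes the result downstream. "Flipped" means the inputs are swapped: for \(z = xy\), the gradient sent toward \(x\) is the upstream value times \(y\), and the gradient sent toward \(y\) is the upstream value times \(x\)
For this reason, a multiplication node must save its forward-pass inputs in order to implement the backward pass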
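Implementing Simple Layers
# Multiplication layer
class MulLayer:
    def __init__(self):
        # Forward-pass inputs, saved for use in backward()
        self.x = None
        self.y = None

    def forward(self, x, y):
        self.x = x
        self.y = y
        out = x * y
        return out

    def backward(self, dout):
        # Multiply the upstream gradient by the swapped ("flipped") inputs
        dx = dout * self.y
        dy = dout * self.x
        return dx, dy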
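# Addition layer
class AddLayer:
    def __init__(self):
        pass  # nothing needs to be saved for the backward pass

    def forward(self, x, y):
        out = x + y
        return out

    def backward(self, dout):
        # The upstream gradient passes through unchanged
        dx = dout * 1
        dy = dout * 1
        return dx, dy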
# Example: buying 2 apples and 3 oranges
apple = 100
apple_num = 2
orange = 150
orange_num = 3
tax = 1.1
# layers
mul_apple_layer = MulLayer()
mul_orange_layer = MulLayer()
add_apple_orange_layer = AddLayer()
mul_tax_layer = MulLayer()
# forward
apple_price = mul_apple_layer.forward(apple, apple_num)
orange_price = mul_orange_layer.forward(orange, orange_num)
all_price = add_apple_orange_layer.forward(apple_price, orange_price)
price = mul_tax_layer.forward(all_price, tax)
# backward (call the layers in the reverse order of the forward pass)
dprice = 1
dall_price, dtax = mul_tax_layer.backward(dprice)
dapple_price, dorange_price = add_apple_orange_layer.backward(dall_price)
dorange, dorange_num = mul_orange_layer.backward(dorange_price)
dapple, dapple_num = mul_apple_layer.backward(dapple_price)
print(price)  # 715.0000000000001
print(dapple_num, dapple, dorange, dorange_num, dtax)  # 110.00000000000001 2.2 3.3000000000000003 165.0 650
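One way to sanity-check the gradients above is to compare them with numerical differentiation. The sketch below rewrites the same price calculation as a plain function and estimates each partial derivative with a central difference; `total_price` and `numerical_gradient` are hypothetical helper names introduced only for this check.

```python
def total_price(apple, apple_num, orange, orange_num, tax):
    # Same computation as the layer-based forward pass
    return (apple * apple_num + orange * orange_num) * tax

def numerical_gradient(f, args, h=1e-4):
    # Central difference with respect to each argument in turn
    grads = []
    for i in range(len(args)):
        plus = list(args)
        minus = list(args)
        plus[i] += h
        minus[i] -= h
        grads.append((f(*plus) - f(*minus)) / (2 * h))
    return grads

names = ["apple", "apple_num", "orange", "orange_num", "tax"]
args = [100.0, 2.0, 150.0, 3.0, 1.1]
grads = numerical_gradient(total_price, args)
for name, g in zip(names, grads):
    print(name, round(g, 4))
```

The estimates should match the backpropagated values (2.2, 110, 3.3, 165, 650) up to rounding, which suggests the layer-by-layer backward pass computes the same derivatives.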
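Implementing Activation Function Layers
ReLU layer
- If the forward-pass input \(x\) is greater than 0, the backward pass sends the upstream value downstream unchanged.
- If the forward-pass input \(x\) is less than or equal to 0, the backward pass sends 0 downstream, so the signal stops at this node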
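# ReLU layer
# forward() and backward() take NumPy arrays as arguments
class Relu:
    def __init__(self):
        self.mask = None  # boolean array, True where the forward input was <= 0

    def forward(self, x):
        self.mask = (x <= 0)
        out = x.copy()
        out[self.mask] = 0
        return out

    def backward(self, dout):
        dout[self.mask] = 0  # note: this modifies dout in place
        dx = dout
        return dx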
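To make the role of the mask concrete, here is a minimal functional sketch of the same idea. It uses `np.where` instead of in-place assignment, so the upstream gradient is not modified; the function names `relu_forward` and `relu_backward` are hypothetical and not part of the class above.

```python
import numpy as np

def relu_forward(x):
    mask = (x <= 0)               # True where the input is non-positive
    out = np.where(mask, 0.0, x)  # zero out those positions
    return out, mask

def relu_backward(dout, mask):
    # Gradient is blocked where the forward input was <= 0
    return np.where(mask, 0.0, dout)

x = np.array([[1.0, -0.5], [-2.0, 3.0]])
out, mask = relu_forward(x)
dout = np.ones_like(x)            # pretend upstream gradient
dx = relu_backward(dout, mask)
print(mask)
print(out)
print(dx)
```

Positions where `mask` is True receive a gradient of 0, and all other positions receive the upstream value unchanged.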
Sigmoid layer
\[ y = \frac{1}{1+\exp(-x)} \]
- Forward propagation
- "×" node: \(x \times (-1) = -x\)
- "exp" node: \(\exp(-x)\)
- "+" node: \(1+\exp(-x)\)
- "/" node: \(y=1/(1+\exp(-x))\)
- Backpropagation (in the per-node formulas below, \(x\) and \(y\) denote that node's own input and output)
- "/" node: for \(y = 1/x\), \(\begin{aligned}\frac{\partial y}{\partial x} &= -\frac{1}{x^2}\\ &=-y^2\end{aligned}\)
- In the backward pass, the upstream value is multiplied by \(-y^2\) and passed downstream
- "+" node: passes the upstream value downstream unchanged
- "exp" node: for \(y = \exp(x)\), \(\begin{aligned}\frac{\partial y}{\partial x}&=\exp(x)\end{aligned}\)
- "×" node: multiplies the upstream value by the flipped forward-pass input, which here is \(-1\)
- Backpropagation, simplified (here \(x\) and \(y\) are the input and output of the whole Sigmoid layer)
\[ \begin{aligned} \frac{\partial L}{\partial y}&\to \frac{\partial L}{\partial y}y^2\exp(-x)\\ &=\frac{\partial L}{\partial y}\frac{1}{(1+\exp(-x))^2}\exp(-x)\\ &=\frac{\partial L}{\partial y}\frac{1}{1+\exp(-x)}\frac{\exp(-x)}{1+\exp(-x)}\\ &=\frac{\partial L}{\partial y}y(1-y) \end{aligned} \]
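# Sigmoid layer
import numpy as np

class Sigmoid:
    def __init__(self):
        self.out = None  # forward-pass output y, saved for backward()

    def forward(self, x):
        out = 1 / (1 + np.exp(-x))
        self.out = out
        return out

    def backward(self, dout):
        # dL/dx = dL/dy * y * (1 - y)
        dx = dout * (1.0 - self.out) * self.out
        return dx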
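The simplified result, that the sigmoid's derivative can be written as y(1 - y) using only the forward output, can be checked numerically. The sketch below is a standalone check with a hypothetical `sigmoid` helper and arbitrarily chosen sample inputs.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

x = np.array([-2.0, 0.0, 1.5])
y = sigmoid(x)

# Analytic derivative from the simplified backward formula
analytic = y * (1.0 - y)

# Central-difference estimate of dy/dx
h = 1e-4
numeric = (sigmoid(x + h) - sigmoid(x - h)) / (2 * h)

print(np.max(np.abs(analytic - numeric)))
```

The printed maximum difference should be very small, which is consistent with the derivation above.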
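Implementing the Affine and Softmax Layers
Affine layer
- The matrix product computed in a neural network's forward pass is known in geometry as an "affine transformation".
- The layer that performs this affine transformation is implemented as the "Affine layer"
- The values propagated between nodes are matrices rather than scalars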
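# Affine layer
import numpy as np

class Affine:
    def __init__(self, W, b):
        self.W = W
        self.b = b
        self.x = None   # forward-pass input, saved for backward()
        self.dW = None  # gradient with respect to W
        self.db = None  # gradient with respect to b

    def forward(self, x):
        self.x = x
        out = np.dot(x, self.W) + self.b
        return out

    def backward(self, dout):
        dx = np.dot(dout, self.W.T)
        self.dW = np.dot(self.x.T, dout)  # transpose is .T (capital T)
        self.db = np.sum(dout, axis=0)    # sum over the batch axis
        return dx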
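The three backward formulas of the Affine layer can be checked for both shape and value. The sketch below is a standalone check under assumed sizes (a batch of 2 samples, 3 inputs, 4 outputs) with random data; `loss` and `numerical_grad` are hypothetical helpers, and the scalar loss is constructed so that its gradient with respect to the affine output is exactly `dout`.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(2, 3))     # batch of 2 samples, 3 features each
W = rng.normal(size=(3, 4))
b = rng.normal(size=4)
dout = rng.normal(size=(2, 4))  # pretend upstream gradient

def loss():
    # Scalar whose gradient w.r.t. the affine output equals dout
    return np.sum((np.dot(x, W) + b) * dout)

# Analytic gradients, same formulas as the backward pass
dx = np.dot(dout, W.T)
dW = np.dot(x.T, dout)
db = np.sum(dout, axis=0)

def numerical_grad(f, a, h=1e-5):
    # Central difference over every element of array a (perturbed in place)
    grad = np.zeros_like(a)
    for idx in np.ndindex(a.shape):
        orig = a[idx]
        a[idx] = orig + h
        f_plus = f()
        a[idx] = orig - h
        f_minus = f()
        a[idx] = orig  # restore the original value
        grad[idx] = (f_plus - f_minus) / (2 * h)
    return grad

print(dx.shape, dW.shape, db.shape)
print(np.allclose(dx, numerical_grad(loss, x)))
print(np.allclose(dW, numerical_grad(loss, W)))
print(np.allclose(db, numerical_grad(loss, b)))
```

Each gradient has the same shape as the quantity it belongs to, which is a quick way to remember where the transposes go.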
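Softmax-with-Loss layer
- The backward pass of the Softmax layer yields (\(y_1-t_1,y_2-t_2,y_3-t_3\)). Since (\(y_1,y_2,y_3\)) is the output of the Softmax layer and (\(t_1,t_2,t_3\)) is the supervised label, (\(y_1-t_1,y_2-t_2,y_3-t_3\)) is the difference between the Softmax output and the label. Backpropagation passes the error expressed by this difference to the preceding layers. This is an important property of neural network learning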
# softmax is used below, so it is imported too (assumed to live in the same module)
from sourcecode.common.functions import cross_entropy_error, softmax

# Softmax-with-Loss layer
class SoftmaxWithLoss:
    def __init__(self):
        self.loss = None  # loss value
        self.y = None     # output of softmax
        self.t = None     # supervised labels (one-hot vectors)

    def forward(self, x, t):
        self.t = t
        self.y = softmax(x)
        self.loss = cross_entropy_error(self.y, self.t)
        return self.loss

    def backward(self, dout=1):
        batch_size = self.t.shape[0]
        # Divide by the batch size so the error passed back is per sample
        dx = (self.y - self.t) / batch_size
        return dx
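The claim that the combined softmax and cross-entropy gradient reduces to (y - t) / batch_size can be checked numerically. The sketch below defines minimal stand-ins for `softmax` and `cross_entropy_error`; these are assumptions written for this check, not the versions in `sourcecode.common.functions`, and the input values are arbitrary.

```python
import numpy as np

def softmax(x):
    # Subtract the row-wise max for numerical stability
    x = x - np.max(x, axis=-1, keepdims=True)
    e = np.exp(x)
    return e / np.sum(e, axis=-1, keepdims=True)

def cross_entropy_error(y, t):
    # Mean cross-entropy over the batch, for one-hot labels t
    return -np.sum(t * np.log(y)) / y.shape[0]

x = np.array([[0.3, 2.9, 4.0], [1.0, 0.2, -0.5]])
t = np.array([[0.0, 0.0, 1.0], [1.0, 0.0, 0.0]])

# Analytic gradient from the Softmax-with-Loss backward pass
analytic = (softmax(x) - t) / x.shape[0]

# Central-difference gradient of the loss with respect to x
numeric = np.zeros_like(x)
h = 1e-4
for idx in np.ndindex(x.shape):
    orig = x[idx]
    x[idx] = orig + h
    loss_plus = cross_entropy_error(softmax(x), t)
    x[idx] = orig - h
    loss_minus = cross_entropy_error(softmax(x), t)
    x[idx] = orig  # restore the original value
    numeric[idx] = (loss_plus - loss_minus) / (2 * h)

print(np.allclose(analytic, numeric, atol=1e-6))
```

Each row of the analytic gradient sums to zero, because both the softmax output and the one-hot label sum to 1 along that row.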