Complexity of a neural network (NN)
Space complexity:
When counting the layers of a neural network, only layers that actually perform computation are counted. The input layer merely passes the data in and does no computation, so it is not included in the layer count.
All layers between the input layer and the output layer are called hidden layers.
Number of layers = number of hidden layers + 1 output layer
Total number of parameters = total number of w + total number of b
Time complexity:
Number of multiply-add (multiply-accumulate) operations
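As a quick worked example of both counts (the 3-4-2 layer sizes below are assumed for illustration, not taken from these notes):
# A minimal sketch: fully connected net with 3 inputs, one hidden layer of 4 neurons, 2 outputs
# Layer count = 1 hidden layer + 1 output layer = 2 (the input layer is not counted)
input_dim, hidden_dim, output_dim = 3, 4, 2
# Space complexity: total w = 3*4 + 4*2 = 20, total b = 4 + 2 = 6
total_w = input_dim * hidden_dim + hidden_dim * output_dim
total_b = hidden_dim + output_dim
total_params = total_w + total_b                              # 26 parameters
# Time complexity: multiply-add operations = 3*4 + 4*2 = 20
total_madds = input_dim * hidden_dim + hidden_dim * output_dim
print(total_params, total_madds)                              # 26 20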
Learning rate and parameter updates:
\[w_{t + 1} = w_t - lr \cdot \frac{\partial loss}{\partial w_t}\]
Choosing and setting an exponentially decaying learning rate
Start with a relatively large learning rate to reach a good solution quickly, then gradually reduce it so that the model stays stable in the later stages of training.
Exponentially decayed learning rate = initial learning rate × decay rate ^ (current epoch / decay steps)
# Learning rate decay
import tensorflow as tf

w = tf.Variable(tf.constant(5, dtype=tf.float32))
EPOCHS = 40      # total number of training epochs
LR_BASE = 0.2    # initial learning rate
LR_DECAY = 0.99  # learning rate decay rate
LR_STEP = 1      # decay the learning rate every LR_STEP epochs

for epoch in range(EPOCHS):
    # exponentially decayed learning rate for this epoch
    lr = LR_BASE * LR_DECAY ** (epoch / LR_STEP)
    with tf.GradientTape() as tape:
        loss = tf.square(w + 1)
    grads = tape.gradient(loss, w)   # d(loss)/dw = 2*(w + 1)
    w.assign_sub(lr * grads)         # w <- w - lr * gradient
    print("After %2s epoch,\tw is %f,\tloss is %f\tlr is %f" % (epoch, w.numpy(), loss, lr))
After 0 epoch, w is 2.600000, loss is 36.000000 lr is 0.200000
After 1 epoch, w is 1.174400, loss is 12.959999 lr is 0.198000
After 2 epoch, w is 0.321948, loss is 4.728015 lr is 0.196020
After 3 epoch, w is -0.191126, loss is 1.747547 lr is 0.194060
After 4 epoch, w is -0.501926, loss is 0.654277 lr is 0.192119
After 5 epoch, w is -0.691392, loss is 0.248077 lr is 0.190198
After 6 epoch, w is -0.807611, loss is 0.095239 lr is 0.188296
After 7 epoch, w is -0.879339, loss is 0.037014 lr is 0.186413
After 8 epoch, w is -0.923874, loss is 0.014559 lr is 0.184549
After 9 epoch, w is -0.951691, loss is 0.005795 lr is 0.182703
After 10 epoch, w is -0.969167, loss is 0.002334 lr is 0.180876
After 11 epoch, w is -0.980209, loss is 0.000951 lr is 0.179068
After 12 epoch, w is -0.987226, loss is 0.000392 lr is 0.177277
After 13 epoch, w is -0.991710, loss is 0.000163 lr is 0.175504
After 14 epoch, w is -0.994591, loss is 0.000069 lr is 0.173749
After 15 epoch, w is -0.996452, loss is 0.000029 lr is 0.172012
After 16 epoch, w is -0.997660, loss is 0.000013 lr is 0.170292
After 17 epoch, w is -0.998449, loss is 0.000005 lr is 0.168589
After 18 epoch, w is -0.998967, loss is 0.000002 lr is 0.166903
After 19 epoch, w is -0.999308, loss is 0.000001 lr is 0.165234
After 20 epoch, w is -0.999535, loss is 0.000000 lr is 0.163581
After 21 epoch, w is -0.999685, loss is 0.000000 lr is 0.161946
After 22 epoch, w is -0.999786, loss is 0.000000 lr is 0.160326
After 23 epoch, w is -0.999854, loss is 0.000000 lr is 0.158723
After 24 epoch, w is -0.999900, loss is 0.000000 lr is 0.157136
After 25 epoch, w is -0.999931, loss is 0.000000 lr is 0.155564
After 26 epoch, w is -0.999952, loss is 0.000000 lr is 0.154009
After 27 epoch, w is -0.999967, loss is 0.000000 lr is 0.152469
After 28 epoch, w is -0.999977, loss is 0.000000 lr is 0.150944
After 29 epoch, w is -0.999984, loss is 0.000000 lr is 0.149434
After 30 epoch, w is -0.999989, loss is 0.000000 lr is 0.147940
After 31 epoch, w is -0.999992, loss is 0.000000 lr is 0.146461
After 32 epoch, w is -0.999994, loss is 0.000000 lr is 0.144996
After 33 epoch, w is -0.999996, loss is 0.000000 lr is 0.143546
After 34 epoch, w is -0.999997, loss is 0.000000 lr is 0.142111
After 35 epoch, w is -0.999998, loss is 0.000000 lr is 0.140690
After 36 epoch, w is -0.999999, loss is 0.000000 lr is 0.139283
After 37 epoch, w is -0.999999, loss is 0.000000 lr is 0.137890
After 38 epoch, w is -0.999999, loss is 0.000000 lr is 0.136511
After 39 epoch, w is -0.999999, loss is 0.000000 lr is 0.135146
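For reference, TensorFlow 2 also ships a built-in exponential-decay schedule. The sketch below (an added illustration, not the original notes' code) mirrors the hand-written loop above with the same 0.2 / 0.99 / step-1 values:
# A minimal sketch using TensorFlow's built-in exponential-decay schedule
import tensorflow as tf

lr_schedule = tf.keras.optimizers.schedules.ExponentialDecay(
    initial_learning_rate=0.2,   # LR_BASE
    decay_steps=1,               # LR_STEP
    decay_rate=0.99)             # LR_DECAY
print(float(lr_schedule(0)), float(lr_schedule(1)))   # 0.2, 0.198
# The schedule can be handed directly to an optimizer:
optimizer = tf.keras.optimizers.SGD(learning_rate=lr_schedule)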
Activation functions
Sigmoid function
tf.nn.sigmoid(x)
\[f(x) = \frac{1}{1 + e^{-x}}\]
Plot: an S-shaped curve that maps any real input into (0, 1)
- It effectively normalizes the input, squashing it into the range (0, 1).
- When a multi-layer network updates its parameters, the chain rule is applied layer by layer from the output layer back toward the input layer. The derivative of the sigmoid function lies between 0 and 0.25, so chaining many layers multiplies several values in (0, 0.25) together; the product tends toward 0, gradients vanish, and the parameters can no longer be updated (see the short check below).
- The sigmoid function also involves exponentiation, which makes it relatively expensive to compute.
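A quick numerical check of the 0-to-0.25 derivative claim (a small added sketch, not part of the original notes):
# Verify that the sigmoid derivative never exceeds 0.25
import tensorflow as tf

x = tf.Variable(tf.linspace(-10.0, 10.0, 1001))
with tf.GradientTape() as tape:
    y = tf.nn.sigmoid(x)
grad = tape.gradient(y, x)            # elementwise d(sigmoid)/dx = sigmoid(x) * (1 - sigmoid(x))
print(float(tf.reduce_max(grad)))     # ~0.25, reached at x = 0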
Tanh function
tf.math.tanh(x)
\[f(x) = \frac{1 - e^{-2x}}{1 + e^{-2x}}\]
Plot: an S-shaped curve that maps any real input into (-1, 1), centered at 0
Like the sigmoid function above, tanh also suffers from vanishing gradients and from the computational cost of exponentiation.
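A similar added sketch shows why: the tanh derivative, 1 - tanh²(x), equals 1 only at x = 0 and is strictly smaller everywhere else, so long chains of layers still shrink the gradient.
# Tanh derivative: equals 1 only at x = 0, < 1 everywhere else
import tensorflow as tf

x = tf.Variable(tf.linspace(-10.0, 10.0, 1001))
with tf.GradientTape() as tape:
    y = tf.math.tanh(x)
grad = tape.gradient(y, x)            # elementwise 1 - tanh(x)**2
print(float(tf.reduce_max(grad)))     # ~1.0 at x = 0; decays quickly away from 0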
ReLU function
tf.nn.relu(x)
\[f(x) = \max(x, 0) = \begin{cases} 0, & x < 0 \\ x, & x \ge 0 \end{cases}\]
Plot: 0 for negative inputs, the identity line y = x for non-negative inputs
- In the positive region, ReLU avoids the vanishing-gradient problem; evaluating it only requires checking whether the input is greater than 0, so it is fast to compute.
- Training converges faster than with the sigmoid and tanh functions.
- Its output is not zero-centered, which slows convergence.
- Dead ReLU problem: when the feature fed into the activation is negative, the activation outputs 0 and the gradient from backpropagation is 0, so the parameters cannot be updated and some neurons may never be activated again (see the sketch below). This can be mitigated by improving the random initialization so that fewer negative features are fed into ReLU, and by using a smaller learning rate to avoid large shifts in the parameter distribution that would produce too many negative features during training.
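A minimal added sketch of the dead-ReLU effect (the toy weight and input below are assumptions for illustration): when the pre-activation is negative, the gradient flowing back through tf.nn.relu is exactly 0, so the weight receives no update.
# Dead ReLU: a negative pre-activation yields a zero gradient, so w never moves
import tensorflow as tf

w = tf.Variable(-2.0)                 # assumed toy weight that drives the pre-activation negative
x = tf.constant(3.0)                  # assumed toy input
with tf.GradientTape() as tape:
    out = tf.nn.relu(w * x)           # w * x = -6 < 0, so out = 0
    loss = tf.square(out - 1.0)
grad = tape.gradient(loss, w)
print(grad)                           # zero gradient -> no update is possible for w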
Leaky ReLU function
tf.nn.leaky_relu(x)
\[