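Conventions
- \(y_{ij}\) is the output of neuron \(j\) in layer \(i\). Layer 1 is the input layer, so the \(y_{1j}\) are the network inputs.
- \(t_i\) is the target (expected) value for the \(i\)-th output of the output layer.
- \(n_i\) is the number of neurons in layer \(i\).
- The activation function is the sigmoid \(\sigma(x)=\frac{1}{1+e^{-x}}\), so \(\frac{\partial \sigma(x)}{\partial x}=\sigma(x)[1-\sigma(x)]\).
- \(E\) is the error: \(E=\sum_{i=1}^{2}(y_{3i}-t_i)^{2}\).
- \(\omega_{ijk}\) is the weight connecting neuron \(j\) in layer \(i\) to neuron \(k\) in layer \(i-1\), and \(\nabla_{ijk}=\frac{\partial E}{\partial \omega_{ijk}}\) is the gradient of \(E\) with respect to that weight.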
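The derivative identity for the sigmoid can be checked numerically. The following is a minimal Python sketch (the original contains no code). The function names are hypothetical, and the central finite difference is used only as a sanity check.

```python
import math

def sigmoid(x: float) -> float:
    """Logistic activation sigma(x) = 1 / (1 + e^{-x})."""
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_prime(x: float) -> float:
    """Derivative expressed through the output: sigma(x) * (1 - sigma(x))."""
    s = sigmoid(x)
    return s * (1.0 - s)

def numeric_derivative(f, x: float, h: float = 1e-6) -> float:
    """Central finite difference, used only to check sigmoid_prime."""
    return (f(x + h) - f(x - h)) / (2.0 * h)

for x in (-2.0, 0.0, 1.5):
    print(x, sigmoid_prime(x), numeric_derivative(sigmoid, x))
```

At \(x=0\) the identity gives \(0.5\times(1-0.5)=0.25\). This value is the largest slope the sigmoid ever has.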
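Derivation
We use the path \(11\rightarrow 21 \rightarrow 31\) (neuron 1 of layer 1, then neuron 1 of layer 2, then neuron 1 of layer 3) as a worked example, and we derive the general formula at each step. The summation limits below show that the network has \(n_1=2\) inputs, \(n_2=3\) hidden neurons and \(n_3=2\) outputs.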
1.\(net_{ij}\)
\(net_{21}=\sum_{i=1}^{2}(\omega_{21i}y_{1i})\).
\(net_{31}=\sum_{i=1}^{3}(\omega_{31i}y_{2i})\).
The general formula is therefore \(net_{ij}=\sum_{k=1}^{n_{i-1}}(\omega_{ijk}y_{i-1,k})\), with \(y_{ij}=\sigma(net_{ij})\).
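As a small illustration of \(net_{ij}\) as a weighted sum, here is a Python sketch. The input and weight values are made up, and Python lists are 0-based while the notation above is 1-based.

```python
import math

def net(weights_row, prev_outputs):
    """net_ij = sum_k w_ijk * y_{i-1,k}.

    weights_row[k] plays the role of w_{i,j,k+1} and prev_outputs[k] of
    y_{i-1,k+1} (0-based lists vs. 1-based notation).
    """
    assert len(weights_row) == len(prev_outputs)
    return sum(w * y for w, y in zip(weights_row, prev_outputs))

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# Hypothetical inputs y_11, y_12 and weights w_211, w_212.
y1 = [1.0, 2.0]
w21 = [0.5, -0.25]
net21 = net(w21, y1)   # 0.5*1.0 + (-0.25)*2.0 = 0.0
y21 = sigmoid(net21)   # sigma(0) = 0.5
print(net21, y21)
```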
2.\(y_{ij}\)
\(y_{21}=\sigma(net_{21})\).
\(y_{31}=\sigma(net_{31})\).
The general formula is therefore \(y_{ij}=\sigma(net_{ij})\).
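Steps 1 and 2 can be applied layer by layer to obtain every \(y_{ij}\). The following is a sketch of that forward pass for the 2-3-2 network. It assumes weights are stored as nested lists with `weights[l][j][k]` standing for \(\omega_{l+2,j+1,k+1}\). Both this layout and the numeric weights are hypothetical.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def forward(x, weights):
    """Layer-by-layer forward pass.

    x       -- outputs of layer 1 (the inputs), [y_11, y_12, ...]
    weights -- weights[0] is the matrix for layer 2, weights[1] for layer 3, ...
    Returns the outputs of every layer, starting with layer 1.
    """
    ys = [list(x)]
    for W in weights:
        prev = ys[-1]
        # y_ij = sigma(net_ij), net_ij = sum_k w_ijk * y_{i-1,k}
        ys.append([sigmoid(sum(w * y for w, y in zip(row, prev))) for row in W])
    return ys

# Hypothetical 2-3-2 network (n_1 = 2, n_2 = 3, n_3 = 2).
W2 = [[0.1, 0.2], [-0.3, 0.4], [0.5, -0.6]]
W3 = [[0.7, -0.8, 0.9], [-1.0, 0.2, 0.3]]
ys = forward([1.0, 0.5], [W2, W3])
print(ys[1])  # hidden outputs y_21, y_22, y_23
print(ys[2])  # network outputs y_31, y_32
```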
3. Error \(E\)
\(E=\sum_{i=1}^{2}(y_{3i}-t_{i})^2\).
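The error is a plain sum of squared differences. The following Python sketch uses made-up output and target values.

```python
def squared_error(outputs, targets):
    """E = sum_i (y_3i - t_i)^2, as defined above (no 1/2 factor)."""
    assert len(outputs) == len(targets)
    return sum((y - t) ** 2 for y, t in zip(outputs, targets))

# Hypothetical outputs y_31, y_32 and targets t_1, t_2.
E = squared_error([0.75, 0.25], [1.0, 0.0])
print(E)  # (0.75-1)^2 + (0.25-0)^2 = 0.0625 + 0.0625 = 0.125
```

Because \(E\) carries no \(\frac{1}{2}\) factor, each derivative \(\frac{\partial E}{\partial y_{3i}}\) keeps a factor of 2.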
4. \(\nabla_{3ij}\) (gradients of the output-layer weights \(\omega\))
\(\frac{\partial E}{\partial net_{31}}=\frac{\partial E}{\partial y_{31}}\cdot \frac{\partial y_{31}}{\partial {net_{31}}}=2(y_{31}-t_1)y_{31}(1-y_{31})\).
\(\therefore \frac{\partial E}{\partial net_{3i}}=2(y_{3i}-t_i)y_{3i}(1-y_{3i})\).
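This per-neuron quantity \(\frac{\partial E}{\partial net_{3i}}\) is reused in both gradient formulas below. The following Python sketch computes it. The function name and the sample values are hypothetical.

```python
def output_deltas(outputs, targets):
    """dE/dnet_3i = 2 (y_3i - t_i) y_3i (1 - y_3i) for each output neuron i."""
    return [2.0 * (y - t) * y * (1.0 - y) for y, t in zip(outputs, targets)]

d = output_deltas([0.75, 0.25], [1.0, 0.0])
print(d)  # [-0.09375, 0.09375]
```

The sign tells the direction: an output below its target gives a negative value, so increasing \(net_{3i}\) would reduce \(E\).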
\(\nabla_{311}=\frac{\partial E}{\partial net_{31}} \cdot \frac{\partial net_{31}}{\partial \omega_{311}}=\frac{\partial E}{\partial net_{31}}\cdot y_{21}\).
The general formula is therefore \(\nabla_{3ij}=\frac{\partial E}{\partial net_{3i}}\cdot \frac{\partial net_{3i}}{\partial \omega_{3ij}}=2(y_{3i}-t_{i})y_{3i}(1-y_{3i})y_{2j}\).
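The output-layer formula can be checked against central finite differences of \(E\). The following is a sketch under hypothetical weights, inputs and targets, with `W3[i][j]` standing for \(\omega_{3,i+1,j+1}\). The helper names are made up for illustration.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def layer(W, prev):
    """Outputs of one layer: y_j = sigma(sum_k W[j][k] * prev[k])."""
    return [sigmoid(sum(w * y for w, y in zip(row, prev))) for row in W]

def error(W2, W3, x, t):
    """E = sum_i (y_3i - t_i)^2 for the 2-3-2 network."""
    y3 = layer(W3, layer(W2, x))
    return sum((y - ti) ** 2 for y, ti in zip(y3, t))

def grad_output_layer(W2, W3, x, t):
    """Analytic nabla_3ij = 2 (y_3i - t_i) y_3i (1 - y_3i) * y_2j."""
    y2 = layer(W2, x)
    y3 = layer(W3, y2)
    return [[2.0 * (y3[i] - t[i]) * y3[i] * (1.0 - y3[i]) * y2[j]
             for j in range(len(y2))]
            for i in range(len(y3))]

def numeric_grad_output_layer(W2, W3, x, t, h=1e-6):
    """Central differences of E with respect to each w_3ij (check only)."""
    grads = []
    for i in range(len(W3)):
        row = []
        for j in range(len(W3[i])):
            plus = [r[:] for r in W3]
            minus = [r[:] for r in W3]
            plus[i][j] += h
            minus[i][j] -= h
            row.append((error(W2, plus, x, t) - error(W2, minus, x, t)) / (2 * h))
        grads.append(row)
    return grads

W2 = [[0.1, 0.2], [-0.3, 0.4], [0.5, -0.6]]
W3 = [[0.7, -0.8, 0.9], [-1.0, 0.2, 0.3]]
x, t = [1.0, 0.5], [1.0, 0.0]
analytic = grad_output_layer(W2, W3, x, t)
numeric = numeric_grad_output_layer(W2, W3, x, t)
print(analytic)
print(numeric)
```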
5. \(\nabla_{2ij}\) (gradients of the hidden-layer weights \(\omega\))
\(\omega_{211}\) affects \(E\) only through \(y_{21}\). Because \(y_{21}\) feeds both output neurons, the chain rule sums over the two paths. The derivation uses \(\frac{\partial net_{3i}}{\partial y_{21}}=\omega_{3i1}\) and \(\frac{\partial net_{21}}{\partial \omega_{211}}=y_{11}\):
\[
\begin{aligned}
\nabla_{211}=\frac{\partial E}{\partial \omega_{211}}
&=\frac{\partial E}{\partial net_{31}}\cdot \frac{\partial net_{31}}{\partial y_{21}}\cdot \frac{\partial y_{21}}{\partial net_{21}}\cdot \frac{\partial net_{21}}{\partial \omega_{211}}+\frac{\partial E}{\partial net_{32}}\cdot \frac{\partial net_{32}}{\partial y_{21}}\cdot \frac{\partial y_{21}}{\partial net_{21}}\cdot \frac{\partial net_{21}}{\partial \omega_{211}}\\
&=\sum_{i=1}^{2}\left(\frac{\partial E}{\partial net_{3i}}\cdot \frac{\partial net_{3i}}{\partial y_{21}}\right)\cdot \frac{\partial y_{21}}{\partial net_{21}}\cdot \frac{\partial net_{21}}{\partial \omega_{211}}\\
&=\sum_{i=1}^{2}\left(\frac{\partial E}{\partial net_{3i}}\cdot \omega_{3i1}\right)\cdot y_{21}(1-y_{21})\,y_{11}
\end{aligned}
\]
The general formula is therefore \(\nabla_{2ij}=\frac{\partial E}{\partial \omega_{2ij}}=\sum_{k=1}^{2}\left(\frac{\partial E}{\partial net_{3k}}\cdot \omega_{3ki}\right)\cdot y_{2i}(1-y_{2i})\,y_{1j}\).
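The hidden-layer formula can be checked against central finite differences of \(E\) with respect to each \(\omega_{2ij}\). The following is a sketch under hypothetical weights, inputs and targets. `W2[i][j]` stands for \(\omega_{2,i+1,j+1}\), `W3[k][i]` stands for \(\omega_{3,k+1,i+1}\), and the helper names are made up.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def layer(W, prev):
    """Outputs of one layer: y_j = sigma(sum_k W[j][k] * prev[k])."""
    return [sigmoid(sum(w * y for w, y in zip(row, prev))) for row in W]

def error(W2, W3, x, t):
    y3 = layer(W3, layer(W2, x))
    return sum((y - ti) ** 2 for y, ti in zip(y3, t))

def grad_hidden_layer(W2, W3, x, t):
    """Analytic nabla_2ij = (sum_k dE/dnet_3k * w_3ki) * y_2i (1 - y_2i) * y_1j."""
    y2 = layer(W2, x)
    y3 = layer(W3, y2)
    # dE/dnet_3k for each output neuron k
    d3 = [2.0 * (y3[k] - t[k]) * y3[k] * (1.0 - y3[k]) for k in range(len(y3))]
    grads = []
    for i in range(len(y2)):
        # Error signal flowing back into hidden neuron i from all outputs
        back = sum(d3[k] * W3[k][i] for k in range(len(y3)))
        d2 = back * y2[i] * (1.0 - y2[i])  # dE/dnet_2i
        grads.append([d2 * x[j] for j in range(len(x))])  # times y_1j
    return grads

def numeric_grad_hidden_layer(W2, W3, x, t, h=1e-6):
    """Central differences of E with respect to each w_2ij (check only)."""
    grads = []
    for i in range(len(W2)):
        row = []
        for j in range(len(W2[i])):
            plus = [r[:] for r in W2]
            minus = [r[:] for r in W2]
            plus[i][j] += h
            minus[i][j] -= h
            row.append((error(plus, W3, x, t) - error(minus, W3, x, t)) / (2 * h))
        grads.append(row)
    return grads

W2 = [[0.1, 0.2], [-0.3, 0.4], [0.5, -0.6]]
W3 = [[0.7, -0.8, 0.9], [-1.0, 0.2, 0.3]]
x, t = [1.0, 0.5], [1.0, 0.0]
analytic = grad_hidden_layer(W2, W3, x, t)
numeric = numeric_grad_hidden_layer(W2, W3, x, t)
print(analytic)
print(numeric)
```

Computing `d3` once and reusing it for every hidden neuron is the step that makes backpropagation cheaper than differentiating each weight independently.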