PyTorch Chain Rule

Derivative Rules

\[\begin{aligned} \frac{\partial E}{\partial w^1_{jk}}=\frac{\partial E}{\partial O_k^1}\frac{\partial O_k^1}{\partial w^1_{jk}}=\frac{\partial E}{\partial O_k^2}\frac{\partial O_k^2}{\partial O_k^1}\frac{\partial O_k^1}{\partial w^1_{jk}} \end{aligned} \]

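Here E is the loss, O^l_k denotes the output of unit k in layer l, and w^1_{jk} is a weight of layer 1, so the gradient of E with respect to w^1_{jk} factors into the derivatives of each step of the computation. The snippet below reproduces this with two scalar steps, y1 = x * w1 + b1 and y2 = y1 * w2 + b2, and uses torch.autograd.grad to check that the product of the intermediate derivatives dy2/dy1 * dy1/dw1 equals the derivative dy2/dw1 that autograd computes directly.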
import torch

x = torch.tensor(1.)
w1, w2 = torch.tensor(2., requires_grad=True), torch.tensor(2., requires_grad=True)
b1, b2 = torch.tensor(1.), torch.tensor(1.)

y1 = x * w1 + b1   # y1 = 3
y2 = y1 * w2 + b2  # y2 = 7

# Intermediate derivatives along the chain
dy2_dy1 = torch.autograd.grad(y2, [y1], retain_graph=True)[0]  # dy2/dy1 = w2 = 2
dy1_dw1 = torch.autograd.grad(y1, [w1], retain_graph=True)[0]  # dy1/dw1 = x  = 1

# Derivative of y2 with respect to w1, computed directly by autograd
dy2_dw1 = torch.autograd.grad(y2, [w1], retain_graph=True)[0]

print(dy2_dy1 * dy1_dw1)  # tensor(2.)
print(dy2_dw1)            # tensor(2.)
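For this example the two results can be checked by hand:

\[\begin{aligned} \frac{\partial y_2}{\partial w_1}=\frac{\partial y_2}{\partial y_1}\frac{\partial y_1}{\partial w_1}=w_2\cdot x=2\cdot 1=2 \end{aligned} \]

so the product of the intermediate gradients and the directly computed gradient both come out as tensor(2.).

The formula at the top is written in terms of an error E rather than y2. As a minimal sketch (not part of the original example), the same check can be extended with a hypothetical squared-error loss E = (y2 - t)^2, using an arbitrary target t = 10, to verify the full chain dE/dw1 = dE/dy2 * dy2/dy1 * dy1/dw1 numerically:

import torch

x = torch.tensor(1.)
w1, w2 = torch.tensor(2., requires_grad=True), torch.tensor(2., requires_grad=True)
b1, b2 = torch.tensor(1.), torch.tensor(1.)

y1 = x * w1 + b1     # y1 = 3
y2 = y1 * w2 + b2    # y2 = 7
E = (y2 - 10.) ** 2  # hypothetical squared-error loss with arbitrary target 10; E = 9

dE_dy2  = torch.autograd.grad(E,  [y2], retain_graph=True)[0]  # 2 * (y2 - 10) = -6
dy2_dy1 = torch.autograd.grad(y2, [y1], retain_graph=True)[0]  # w2 = 2
dy1_dw1 = torch.autograd.grad(y1, [w1], retain_graph=True)[0]  # x  = 1
dE_dw1  = torch.autograd.grad(E,  [w1])[0]                     # direct gradient of E w.r.t. w1

print(dE_dy2 * dy2_dy1 * dy1_dw1)  # tensor(-12.)
print(dE_dw1)                      # tensor(-12.)

Both prints give tensor(-12.), confirming that multiplying the per-step derivatives reproduces the gradient autograd computes for the whole chain.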