1 Preliminaries

2024-01-16 17:10:22

2.1 Tensor initialization

import torch

Several ways to initialize a tensor:

torch.zeros((3,4)), torch.ones((1,2)), torch.tensor([[1,2,3], [4,54,5]]), torch.randn(5,6)

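2.1.2 Elementwise tensor operations

1. For +, -, * and / with a constant (or with another tensor of the same shape), the operation is applied at each element position.
2. Use ==, > and < to compare tensors element by element; the result is a boolean tensor.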

a = torch.tensor([[1.0,2], [3,4]])
b = 4
c = torch.randn((2,2))
d = torch.randn(2,2)
a, b, c, a + b, a + c, a - b, a - c, a * b, a * c, a / b, a / c, torch.exp(a), d, a==d, a > d, a < d
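Because c and d above are random, the printed results change from run to run. A small sketch with fixed values (the names p and q are illustrative, not from the notes) makes the elementwise behaviour easy to check by hand:

```python
import torch

# Fixed values so every result can be verified by hand
p = torch.tensor([[1.0, 2.0], [3.0, 4.0]])
q = torch.tensor([[4.0, 3.0], [2.0, 4.0]])

total = p + 4       # the constant is added to every element
product = p * q     # elementwise product, not a matrix product
equal = p == q      # boolean tensor, True where the elements match
greater = p > q     # boolean tensor, True where p is larger

print(total)
print(product)
print(equal)
print(greater)
```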

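2.1.3 Broadcasting

Broadcasting follows the same rules as in NumPy. The shapes of tensors a and b are compared from the last dimension backwards, and each pair of sizes must either be equal or contain a 1. The tensors do not need the same number of dimensions: missing leading dimensions are treated as size 1.

Case 1: where the sizes differ, one of them is 1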

a = torch.arange(12).reshape((3,1,4))
b = torch.arange(3).reshape((3,1,1))

Case 2: the trailing dimensions are equal

c = torch.arange(4).reshape((1,1,4))

a, b, a+b, c, a+c
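The rule can be checked by looking at result shapes only. The sketch below reuses a and b from above and adds two illustrative tensors: a 1-D vector v with fewer dimensions than a, and a deliberately incompatible operand.

```python
import torch

a = torch.arange(12).reshape((3, 1, 4))
b = torch.arange(3).reshape((3, 1, 1))
v = torch.arange(4)                 # shape (4,), treated as (1, 1, 4)

print((a + b).shape)                # the size-1 dimensions of b are stretched
print((a + v).shape)                # missing leading dimensions count as 1

# Sizes 4 and 3 in the last dimension are neither equal nor 1
try:
    a + torch.arange(3)
    failed = False
except RuntimeError:
    failed = True
print(failed)
```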

2.1.4 Indexing, slicing, and concatenation

Indexing and slicing work much like slicing a Python list.

a = torch.arange(12).reshape((3,4))
b = torch.randn((3,4))
a[1], a[-1], a[1:], a[0:1,:], a[0:1, 0:1], a[1,1], b,torch.cat((a, b), dim=1),torch.cat((a, b), dim=0)
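The expressions above are easier to tell apart by their shapes. A minimal sketch, using a zero tensor of the same dtype in place of the random b:

```python
import torch

a = torch.arange(12, dtype=torch.float32).reshape((3, 4))
b = torch.zeros((3, 4))

row = a[1]           # an integer index drops the dimension: shape (4,)
row_2d = a[1:2, :]   # a slice keeps the dimension: shape (1, 4)

wide = torch.cat((a, b), dim=1)   # columns side by side: shape (3, 8)
tall = torch.cat((a, b), dim=0)   # rows stacked: shape (6, 4)
print(row.shape, row_2d.shape, wide.shape, tall.shape)
```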

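2.1.5 Memory

Y = Y + X allocates new memory for the result and rebinds Y to it. To write the result into the existing storage instead, use X[:] = X + Y or X += Y, which avoids the extra allocation.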

a = torch.arange(12).reshape((3,4))
a_p = id(a)
b = torch.randn((3,4))
a = a + b
b_p = id(b)

b[:] = a+b

b += a

a_p == id(a), b_p == id(b)  # (False, True): a was rebound to a new tensor, b was updated in place
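id() compares Python objects. As a complementary check, a sketch using Tensor.data_ptr(), which returns the address of the underlying storage (x and y are illustrative names):

```python
import torch

x = torch.ones((3, 4))
y = torch.ones((3, 4))

before = x.data_ptr()        # address of the underlying storage
x[:] = x + y                 # result is copied into the existing storage
same_after_slice = x.data_ptr() == before
x += y                       # in-place add, same storage again
same_after_iadd = x.data_ptr() == before

x = x + y                    # new tensor, x is rebound to it
same_after_rebind = x.data_ptr() == before
print(same_after_slice, same_after_iadd, same_after_rebind)
```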

2.1.6 Converting to a NumPy ndarray and to Python scalars

a = torch.arange(12).reshape((3,4))

b = a.numpy()
id(a) == id(b), a[1,3:].item(), int(a[1,2])
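The ids differ because the tensor and the ndarray are different Python objects, but for a CPU tensor numpy() returns an array backed by the same memory. A short sketch of that sharing, and of clone() as one way to get an independent copy:

```python
import torch

a = torch.arange(12).reshape((3, 4))
b = a.numpy()            # ndarray backed by the same memory (CPU tensors only)
c = a.clone().numpy()    # clone first to get an independent copy

a[0, 0] = -1             # write through the tensor
print(b[0, 0], c[0, 0])  # b sees the change, c does not

# A one-element tensor converts to a Python scalar
print(a[1, 2].item(), int(a[1, 2]), float(a[1, 2]))
```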

2.1.7 Using pandas

To convert a pandas DataFrame a to a tensor, use torch.tensor(a.values).
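The conversion only works when every column is numeric. A minimal sketch on made-up data (the column names and values are hypothetical), filling a missing value with the column mean before converting:

```python
import pandas as pd
import torch

# Hypothetical data; None becomes NaN in a float column
df = pd.DataFrame({"rooms": [3.0, None, 2.0], "price": [100.0, 80.0, 60.0]})
df = df.fillna(df.mean())      # replace missing entries with the column mean

t = torch.tensor(df.values, dtype=torch.float32)
print(t.shape, t.dtype)
print(t)
```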

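2.1.8 Linear algebra

2.1.8.1 Scalars, vectors, and matrices

A scalar is a single number, a vector is an array of scalars, and a matrix is an array of vectors.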

a = torch.tensor(5)
b = torch.tensor(6)
c = torch.arange(10)
d = torch.arange(12).reshape((3,4))

a, b, a+b, a - b, a * b, a / b, a//b, a%b, c, c[3], d, d.T

Indexing a vector (for example c[3]) returns a 0-dimensional tensor.

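2.1.8.2 Tensors

A tensor generalizes the matrix to any number of dimensions; a matrix is a 2-dimensional tensor.

Elementwise +, -, * and / work as in the earlier sections. In addition, sum and mean reduce a tensor. Passing axis reduces along that dimension, which then disappears from the result; keepdims=True keeps it with size 1; several axes can be given at once; with no axis, the reduction runs over all elements.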

a = torch.arange(12, dtype=torch.float32).reshape(2, 3, 2)
b = torch.arange(12).reshape(2, 3, 2)

a, a.sum(), a.sum(axis= 0), a.sum(axis=0, keepdims=True), a.sum(axis=[0,1]), a.mean(axis= 0)
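Printing the shapes shows which dimensions survive each reduction. The last two lines are an illustrative use of keepdims=True: the kept dimension lets the sums broadcast against the original tensor. Note that mean needs a floating-point dtype, so it would fail on the integer tensor b above.

```python
import torch

a = torch.arange(12, dtype=torch.float32).reshape(2, 3, 2)

print(a.sum(axis=0).shape)                  # axis 0 removed
print(a.sum(axis=0, keepdims=True).shape)   # axis 0 kept with size 1
print(a.sum(axis=[0, 1]).shape)             # axes 0 and 1 removed
print(a.sum().shape)                        # 0-dimensional tensor

# The kept dimension broadcasts, so each slice along axis 1 sums to 1
share = a / a.sum(axis=1, keepdims=True)
print(share.sum(axis=1))
```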

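2.1.8.2.1 Dot products and matrix products

The dot product of two vectors multiplies matching elements and sums the results (with one vector as the weights w and the other as the input, it is the weighted sum of the input).
A matrix-vector product changes the dimension of the vector (with matrix W and an input vector, it passes the input through a layer W and yields an output whose length is the number of rows of W).
A matrix-matrix product transforms several vectors at once and can be read as a matrix-vector product applied to a batch of inputs.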

x = torch.arange(4, dtype=torch.float32)
y = torch.ones(4, dtype=torch.float32)
A = torch.arange(20, dtype=torch.float32).reshape(5, 4)
B = torch.ones(4, 3)
x, y, torch.dot(x, y), A.shape, x.shape, torch.mv(A, x),torch.mm(A, B)
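The three descriptions can be checked against each other. A sketch that rebuilds mv from dot and mm from mv, treating the columns of B as the batch of input vectors (by_rows and by_cols are illustrative names):

```python
import torch

A = torch.arange(20, dtype=torch.float32).reshape(5, 4)   # 5 outputs, 4 inputs
x = torch.arange(4, dtype=torch.float32)
B = torch.ones(4, 3)                                      # 3 input vectors as columns

# dot: multiply matching elements, then sum
assert torch.dot(x, x) == (x * x).sum()

# mv: each output element is the dot product of one row of A with x
by_rows = torch.stack([torch.dot(row, x) for row in A])
assert torch.equal(torch.mv(A, x), by_rows)

# mm: each output column is mv applied to one column of B
by_cols = torch.stack([torch.mv(A, B[:, j]) for j in range(3)], dim=1)
assert torch.equal(torch.mm(A, B), by_cols)
print(by_rows.shape, by_cols.shape)
```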

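2.1.9 Norms

A norm measures the size of a vector or matrix.
The L1 norm is the sum of the absolute values of the elements; the L2 norm is the square root of the sum of the squared elements.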

u = torch.tensor([3.0, -4.0])
torch.abs(u).sum(), torch.norm(u)
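Both definitions written out explicitly, as a small sketch. The matrix M is an illustrative addition: for a matrix, torch.norm with default arguments applies the same square-root-of-squares formula over all elements (the Frobenius norm).

```python
import torch

u = torch.tensor([3.0, -4.0])
l1 = torch.abs(u).sum()            # |3| + |-4| = 7
l2 = torch.sqrt((u ** 2).sum())    # sqrt(9 + 16) = 5

M = torch.ones((4, 9))
fro = torch.norm(M)                # sqrt of 36 ones = 6
print(l1, l2, torch.norm(u), fro)
```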

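2.5 Automatic differentiation

Deep learning frameworks provide automatic differentiation. The framework records the computational graph that our code builds and backpropagates through it to compute gradients.

2.5.1 Differentiating a scalar output (y is a scalar)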

import torch

x = torch.arange(4.0)
x.requires_grad_(True)
y = 2 * torch.dot(x, x) # the dot product multiplies matching elements and sums them, so y is a scalar
y.backward()

The partial derivative with respect to each element of x is 4 * x:

x.grad, x.grad == 4 * x

2.5.2 Differentiating a non-scalar output (y is a tensor)

x.grad.zero_()
y = x * x
y.backward(torch.ones(len(x)))
x.grad, x.grad == 2 * x
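The argument passed to backward acts as a weight on each element of y. A sketch of what that means: weights of 1 give the same gradient as y.sum().backward(), and an arbitrary weight vector v (illustrative values) gives the gradient of (v * y).sum().

```python
import torch

x = torch.arange(4.0, requires_grad=True)

y = x * x
y.backward(torch.ones(len(x)))     # weight 1 for every y[i]
grad_from_ones = x.grad.clone()

x.grad.zero_()
y = x * x                          # rebuild the graph for the next pass
y.sum().backward()                 # the same thing written as a scalar
grad_from_sum = x.grad.clone()

x.grad.zero_()
v = torch.tensor([1.0, 0.0, 0.0, 2.0])
y = x * x
y.backward(v)                      # gradient of (v * y).sum(), i.e. v * 2 * x
grad_weighted = x.grad.clone()
print(grad_from_ones, grad_from_sum, grad_weighted)
```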

2.5.3 Detaching part of the computation

x.grad.zero_()
y = x * x
u = y.detach()
z = u * x
z.backward(torch.ones(len(x)))  # call once; a second call fails because the graph has already been freed
x.grad, x.grad == u

x.grad.zero_()

y.backward(torch.ones(len(x)))

x.grad == 2 * x
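To see what detach changes, a sketch comparing the same product with and without it. Without detach, x contributes through both factors and the gradient is 3 * x**2; with detach, u is treated as a constant and the gradient is just u.

```python
import torch

x = torch.arange(4.0, requires_grad=True)

z = (x * x) * x                  # x flows through both factors
z.sum().backward()
full = x.grad.clone()            # 3 * x**2

x.grad.zero_()
u = (x * x).detach()             # same values, but cut off from the graph
z = u * x
z.sum().backward()
partial = x.grad.clone()         # only u, i.e. x**2
print(full, partial, u.requires_grad)
```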

2.5.4 Gradients through Python control flow

def f(a):
    b = a * 2
    while b.norm() < 1000:
        b = b * 2
    if b.sum() > 0:
        c = b
    else:
        c = 100 * b
    return c

a = torch.randn(size=(), requires_grad=True)
d = f(a)
d.backward()

a, a.grad, a.grad == d/a
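Why a.grad == d / a holds: for a given input the loop count and the branch are fixed, so f only multiplies a by constants and d = k * a for some k. The gradient is then k, which equals d / a. A sketch with two fixed inputs (3.0 and -3.0 are illustrative choices) so that k can be worked out by hand:

```python
import torch

def f(a):
    b = a * 2
    while b.norm() < 1000:
        b = b * 2
    if b.sum() > 0:
        c = b
    else:
        c = 100 * b
    return c

slopes = []
for value in (3.0, -3.0):
    a = torch.tensor(value, requires_grad=True)
    d = f(a)
    d.backward()
    # 3 -> 6 -> ... -> 1536 takes 8 doublings, so k = 2 * 2**8 = 512;
    # the negative input also takes the 100 * b branch, so k = 51200
    slopes.append((a.grad.item(), (d / a).item()))
print(slopes)
```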

2.5.6. Exercises

1、Why is the second derivative much more expensive to compute than the first derivative?
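The first derivative comes from a single backward pass over the recorded computational graph. A second derivative requires recording that backward pass as a graph too and differentiating through it, so the graph is larger; for an input with n elements the full second derivative (the Hessian) has n × n entries, which takes on the order of n such passes.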
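A minimal sketch of a second derivative in PyTorch, assuming torch.autograd.grad with create_graph=True (a standard option, though not one used elsewhere in these notes). The first call has to build a graph of the backward pass so that the second call can differentiate it:

```python
import torch

x = torch.tensor(2.0, requires_grad=True)
y = x ** 3

# create_graph=True records the backward pass so it can be differentiated again
(dy,) = torch.autograd.grad(y, x, create_graph=True)   # 3 * x**2
(d2y,) = torch.autograd.grad(dy, x)                    # 6 * x
print(dy, d2y)
```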

2、After running the function for backpropagation, immediately run it again and see what happens.
It raises an exception: RuntimeError: Trying to backward through the graph a second time, but the saved intermediate results have already been freed. Specify retain_graph=True when calling backward the first time.
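A sketch of the behaviour the error message describes, on an illustrative function y = sum(x * x): retain_graph=True allows a second pass, gradients accumulate across passes, and a pass after the graph has been freed fails.

```python
import torch

x = torch.arange(4.0, requires_grad=True)
y = (x * x).sum()

y.backward(retain_graph=True)    # keep the saved intermediate results
y.backward()                     # second pass works and adds to x.grad
accumulated = x.grad.clone()     # 2 * x + 2 * x

try:
    y.backward()                 # the graph was freed by the previous call
    raised = False
except RuntimeError:
    raised = True
print(accumulated, raised)
```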

3、In the control flow example where we calculate the derivative of d with respect to a, what would happen if we changed the variable a to a random vector or matrix. At this point, the result of the calculation f(a) is no longer a scalar. What happens to the result? How do we analyze this?
If a is changed to a vector or matrix, d.backward() raises RuntimeError: grad can be implicitly created only for scalar outputs.
Changing d.backward() to d.sum().backward() fixes this, and a.grad == d/a still holds elementwise.
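Both observations can be reproduced with a fixed vector input (the values are illustrative). The sketch uses a condensed but equivalent form of f and compares with allclose rather than == to allow for floating-point rounding:

```python
import torch

def f(a):
    b = a * 2
    while b.norm() < 1000:
        b = b * 2
    return b if b.sum() > 0 else 100 * b

a = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
d = f(a)

try:
    d.backward()                 # d has three elements, so no implicit gradient
    raised = False
except RuntimeError:
    raised = True

d.sum().backward()               # reduce to a scalar first
matches = torch.allclose(a.grad, d.detach() / a.detach())
print(raised, matches)
```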

4、Redesign an example of finding the gradient of the control flow. Run and analyze the result.
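One possible redesign, as a sketch only: the function g below is hypothetical and not from the notes. It is quadratic for positive inputs and linear otherwise, so the gradient depends on which branch the input takes: 2 * x on the first branch and the constant -3 on the second.

```python
import torch

def g(x):
    # Hypothetical piecewise function: quadratic for positive x, linear otherwise
    if x > 0:
        return x * x
    return -3 * x

grads = []
for value in (2.0, -2.0):
    x = torch.tensor(value, requires_grad=True)
    g(x).backward()
    grads.append(x.grad.item())
print(grads)
```

Autograd only records the operations that actually ran, so each input is differentiated along its own branch.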

5、Let f(x) = sin(x). Plot f(x) and df(x)/dx, where the latter is computed without exploiting that f′(x) = cos(x).
See below.