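In deep learning models, the convolutional layer is among the most commonly used basic operations, so it is worth understanding well. Convolution is a linear transformation, and more specifically a sparsely connected one (unlike a fully connected layer, which is a densely connected linear transformation).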
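A convolution involves two tensors:
- The first is the input tensor
- The second is the weight tensor of the linear transformation (also called the convolution kernel or filter)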
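To make the "sparse linear transformation" idea concrete, here is a minimal pure-Python sketch for the 1D case. The helper names (`conv1d_direct`, `conv1d_as_matrix`, `matvec`) are made up for this illustration and are not part of PyTorch. It assumes stride 1 and no padding, and shows that sliding the kernel over the input gives the same result as multiplying by a weight matrix that is mostly zeros.

```python
def conv1d_direct(x, w):
    """Slide kernel w over input x (cross-correlation, stride 1, no padding)."""
    k = len(w)
    return [sum(w[j] * x[i + j] for j in range(k)) for i in range(len(x) - k + 1)]


def conv1d_as_matrix(n, w):
    """Build the (n - k + 1) x n matrix that applies kernel w to a length-n input."""
    k = len(w)
    rows = []
    for i in range(n - k + 1):
        row = [0.0] * n
        row[i:i + k] = w  # only k entries per row are non-zero (sparse connection)
        rows.append(row)
    return rows


def matvec(m, x):
    """Plain matrix-vector product."""
    return [sum(a * b for a, b in zip(row, x)) for row in m]


x = [1.0, 2.0, 3.0, 4.0, 5.0]  # input tensor
w = [1.0, 0.0, -1.0]           # weight tensor (kernel / filter)
W = conv1d_as_matrix(len(x), w)

print(conv1d_direct(x, w))  # [-2.0, -2.0, -2.0]
print(matvec(W, x))         # [-2.0, -2.0, -2.0]
```

A fully connected layer would use a dense matrix with independent weights in every position; here each row reuses the same three weights and leaves everything else at zero.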
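In PyTorch, convolution operations fall into two main categories: regular convolution and transposed convolution. Each category has three subclasses: 1D, 2D, and 3D convolution. Regular and transposed convolutions share a common parent class, _ConvNd. This class is hidden (it is not part of the public API); its code lives in the file torch/nn/modules/conv.py.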
_ConvNd parent class
# _ConvNd parent class
class _ConvNd(in_channels, out_channels, kernel_size, stride, padding,
              dilation, transposed, output_padding,
              groups, bias, padding_mode)
- stride: controls the stride for the cross-correlation.
- padding: controls the amount of implicit zero padding on both sides for $\text{dilation} \times (\text{kernel\_size} - 1) - \text{padding}$ number of points. See note below for details.
- output_padding: controls the additional size added to one side of the output shape. See note below for details.
- dilation: controls the spacing between the kernel points; also known as the à trous algorithm. It is harder to describe in words, but visualizations of dilated convolution make the effect easy to see.
- groups: controls the connections between inputs and outputs. in_channels and out_channels must both be divisible by groups.
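How these parameters interact is easiest to see through the output-size formulas. The sketch below implements, for a single spatial dimension, the formulas given in the PyTorch documentation for Conv2d and ConvTranspose2d. The function names `conv_out_size` and `conv_transpose_out_size` are hypothetical helpers written for this example, not PyTorch functions.

```python
def conv_out_size(n, kernel_size, stride=1, padding=0, dilation=1):
    """Output length of a regular convolution along one spatial dimension."""
    return (n + 2 * padding - dilation * (kernel_size - 1) - 1) // stride + 1


def conv_transpose_out_size(n, kernel_size, stride=1, padding=0,
                            output_padding=0, dilation=1):
    """Output length of a transposed convolution along one spatial dimension."""
    return ((n - 1) * stride - 2 * padding
            + dilation * (kernel_size - 1) + output_padding + 1)


# Regular convolution on a length-32 input with a 3-wide kernel
print(conv_out_size(32, 3, padding=1))              # 32 ("same" size)
print(conv_out_size(32, 3, stride=2, padding=1))    # 16 (stride halves the size)
print(conv_out_size(32, 3, padding=2, dilation=2))  # 32 (dilated kernel spans 5 points)

# Transposed convolution on a length-16 input
print(conv_transpose_out_size(16, 3, stride=2, padding=1))                    # 31
print(conv_transpose_out_size(16, 3, stride=2, padding=1, output_padding=1))  # 32
```

The last two lines show why output_padding exists: with stride 2, inputs of length 31 and 32 both shrink to 16 under a regular convolution, so the transposed convolution needs an extra hint to pick which size to restore.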
Convolution operation
# nn.Conv2d convolution
class torch.nn.Conv2d(in_channels, out_channels, kernel_size, stride=1, padding=0, dilation=1, groups=1, bias=True, padding_mode='zeros')
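A short usage sketch of nn.Conv2d follows. The tensor sizes and channel counts are arbitrary values chosen for illustration. It shows the expected input layout (batch, channels, height, width) and how the weight tensor's shape depends on groups.

```python
import torch
import torch.nn as nn

x = torch.randn(8, 3, 32, 32)  # input tensor: (batch, channels, height, width)

# 3x3 kernel with padding=1 keeps the 32x32 spatial size
conv = nn.Conv2d(in_channels=3, out_channels=16, kernel_size=3, stride=1, padding=1)
y = conv(x)
print(y.shape)            # torch.Size([8, 16, 32, 32])
print(conv.weight.shape)  # torch.Size([16, 3, 3, 3]) = (out, in / groups, kH, kW)

# With groups=2, each output channel only sees half of the input channels
grouped = nn.Conv2d(in_channels=4, out_channels=8, kernel_size=3, groups=2)
print(grouped.weight.shape)  # torch.Size([8, 2, 3, 3])
```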
Transposed convolution operation
# nn.ConvTranspose2d transposed convolution (sometimes called deconvolution)
torch.nn.ConvTranspose2d(
in_channels : int,
out_channels : int,
kernel_size : Union[T, Tuple[T, T]],
stride : Union[T, Tuple[T, T]] = 1,
padding : Union[T, Tuple[T, T]] = 0,
output_padding : Union[T, Tuple[T, T]] = 0,
groups : int = 1,
bias : bool = True,
dilation : int = 1,
padding_mode : str = 'zeros')
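As a rough illustration of how nn.ConvTranspose2d is typically paired with a strided nn.Conv2d, the sketch below downsamples a feature map and then upsamples it back. The channel counts and sizes are arbitrary choices for this example. Note that the transposed convolution restores the shape of the original tensor, not its values.

```python
import torch
import torch.nn as nn

x = torch.randn(1, 16, 16, 16)  # (batch, channels, height, width)

# Stride-2 convolution halves the spatial size: 16x16 -> 8x8
down = nn.Conv2d(16, 32, kernel_size=3, stride=2, padding=1)
h = down(x)
print(h.shape)  # torch.Size([1, 32, 8, 8])

# Transposed convolution with matching settings; output_padding=1 recovers 16x16
up = nn.ConvTranspose2d(32, 16, kernel_size=3, stride=2, padding=1, output_padding=1)
print(up(h).shape)  # torch.Size([1, 16, 16, 16])

# Without output_padding the result is one pixel short: 15x15
up_short = nn.ConvTranspose2d(32, 16, kernel_size=3, stride=2, padding=1)
print(up_short(h).shape)  # torch.Size([1, 16, 15, 15])

# Weight layout is (in_channels, out_channels / groups, kH, kW)
print(up.weight.shape)  # torch.Size([32, 16, 3, 3])
```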