Summary: with the same number of parameters and the same computation as ResNet, ResNeXt achieves better accuracy; at equal accuracy, it has the lowest complexity.
a 101-layer ResNeXt is able to achieve better accuracy than ResNet-200 [15] but has only 50% complexity.
1. The ResNeXt implementation in mmdetection
In the config file:
model = dict(
    type='RetinaNet',
    pretrained=None,
    backbone=dict(
        type='ResNeXt',
        norm_cfg=dict(type='SyncBN'),
        depth=101,
        groups=64,
        base_width=4,
        num_stages=4,
        out_indices=(0, 1, 2, 3),
        frozen_stages=1,
        style='pytorch'),
    # ... the rest of the RetinaNet config (neck, bbox_head, etc.) is omitted here
)
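A quick way to sanity-check such a backbone config is to build only the backbone and run a dummy forward pass. The sketch below is my own illustration and assumes an mmdetection install that exports build_backbone; SyncBN only works under distributed training, so it is swapped for plain BN for this local smoke test.

# Hedged smoke test: build only the backbone from a config dict (assumes mmdetection).
import torch
from mmdet.models import build_backbone

backbone = build_backbone(dict(
    type='ResNeXt',
    depth=101,
    groups=64,
    base_width=4,
    num_stages=4,
    out_indices=(0, 1, 2, 3),
    frozen_stages=1,
    norm_cfg=dict(type='BN'),  # SyncBN replaced by BN for a non-distributed test
    style='pytorch'))
backbone.eval()

with torch.no_grad():
    feats = backbone(torch.randn(1, 3, 224, 224))
print([f.shape for f in feats])  # 4 feature maps at strides 4, 8, 16, 32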
Meaning of each parameter (a worked example of groups/base_width follows the list):
depth (int): Depth of resnet, from {18, 34, 50, 101, 152}.
num_stages (int): Resnet stages, normally 4.
groups (int): Group of resnext.
base_width (int): Base width of resnext.
strides (Sequence[int]): Strides of the first block of each stage.
dilations (Sequence[int]): Dilation of each stage.
out_indices (Sequence[int]): Output from which stages.
style (str): `pytorch` or `caffe`. If set to "pytorch", the stride-two
layer is the 3x3 conv layer, otherwise the stride-two layer is
the first 1x1 conv layer.
frozen_stages (int): Stages to be frozen (all param fixed). -1 means
not freezing any parameters.
norm_cfg (dict): dictionary to construct and config norm layer.
norm_eval (bool): Whether to set norm layers to eval mode, namely,
freeze running stats (mean and var). Note: Effect on Batch Norm
and its variants only.
with_cp (bool): Use checkpoint or not. Using checkpoint will save some
memory while slowing down the training speed.
zero_init_residual (bool): whether to use zero init for last norm layer
in resblocks to let them behave as identity.
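To make groups and base_width concrete: together they determine the width of the 3x3 grouped convolution in each stage. The formula below follows the torchvision/mmdetection-style ResNeXt bottleneck (treat the exact formula as an assumption about the implementation), but the resulting numbers match the paper.

# Width of the 3x3 grouped conv per stage for ResNeXt-101 (64x4d).
def grouped_conv_width(planes, groups=64, base_width=4):
    return int(planes * base_width / 64) * groups

for stage, planes in enumerate([64, 128, 256, 512], start=1):
    print(f'stage {stage}: grouped conv width = {grouped_conv_width(planes)}')
# -> 256, 512, 1024, 2048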
Reference: https://www.jianshu.com/p/7478ce41e46b
There are generally three ways to strengthen the expressive power of a CNN. The first is to add layers, i.e. make the network deeper (CNNs have grown from fewer than ten layers in the original AlexNet to hundreds or even thousands of layers, yet experiments show the marginal accuracy gain from extra depth keeps shrinking). The second is to widen the network modules (see the Wide Residual Network introduced earlier; however, widening drives the parameter count up roughly quadratically, so it is not favored in mainstream CNN design). The third is to improve the structural design of the network itself (improving the design so as to raise performance without increasing model complexity is of course the ideal approach, but the bar is very high; otherwise the folks at Google/Facebook/Microsoft who spend their days designing networks and tuning hyper-parameters would not be paid so handsomely :)).
ResNeXt belongs to the third category. It introduces a new module for building CNNs that is far less complicated than the Inception modules seen before, and it proposes the notion of cardinality as another measure of model complexity besides depth and width. Cardinality is the number of identical branches inside a block.
Abstract
We present a simple, highly modularized network for classification. The network exposes a new dimension, cardinality, and improves classification accuracy without increasing complexity.
1. Introduction
VGG-nets: stack building blocks of the same shape.
Inception models: split-transform-merge. 1x1 convs do the split, 3x3 and 5x5 convs do the transform, and concatenation does the merge. Inception obtains strong representational power at comparatively low computational cost. However, although it works well, many hyper-parameters have to be redesigned for every new dataset ("Although careful combinations of these components yield excellent neural network recipes, it is in general unclear how to adapt the Inception architectures to new datasets/tasks, especially when there are many factors and hyper-parameters to be designed").
Therefore, the paper designs the following network structure, which improves the network's capability without adding parameters or computation. The two blocks in Figure 1 (a ResNet bottleneck on the left, a ResNeXt block with cardinality 32 on the right) have roughly the same number of parameters and FLOPs.
The paper also gives two equivalent forms of this block (Figure 3 (b), (c)).
Figure 3(c) realizes the split automatically through grouped convolution.
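The equivalence is easy to verify numerically. The snippet below is my own illustration (not from the paper): a grouped convolution gives exactly the same result as splitting the channels, convolving each split separately, and concatenating the outputs.

# Check that grouped conv == split + per-group conv + concat.
import torch
import torch.nn as nn

groups, c_in, c_out = 4, 8, 8
x = torch.randn(1, c_in, 16, 16)
gconv = nn.Conv2d(c_in, c_out, kernel_size=3, padding=1, groups=groups, bias=False)

# Manual split-transform-merge using the grouped conv's own weights.
splits = x.chunk(groups, dim=1)
weights = gconv.weight.chunk(groups, dim=0)
merged = torch.cat(
    [nn.functional.conv2d(xi, wi, padding=1) for xi, wi in zip(splits, weights)],
    dim=1)

print(torch.allclose(gconv(x), merged, atol=1e-6))  # True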
Increasing the cardinality is a more effective way to gain representational power than going deeper or wider, especially once the returns from additional depth and width have started to saturate.
Effect: roughly half the complexity with no loss in accuracy:
a 101-layer ResNeXt is able to achieve better accuracy than ResNet-200 [15] but has only 50% complexity.
3. Method
The paper starts from the basic form of a fully-connected (inner-product) neuron:
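The equation image is not included in these notes; reproduced from the paper, the inner product is

\sum_{i=1}^{D} w_i x_i

where x = [x_1, ..., x_D] is a D-channel input and w_i is the weight for the i-th channel. This can itself be read as split (into D scalar channels), transform (scale by w_i), and merge (sum).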
and generalizes it to aggregated transformations:
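Again reproducing the paper's equations, the aggregated transformation and its residual form are

\mathcal{F}(\mathbf{x}) = \sum_{i=1}^{C} \mathcal{T}_i(\mathbf{x}), \qquad \mathbf{y} = \mathbf{x} + \sum_{i=1}^{C} \mathcal{T}_i(\mathbf{x})

where C is the cardinality (the number of branches) and every \mathcal{T}_i shares the same simple bottleneck topology.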
Following the split-transform-merge idea, this yields the block structure shown in Figure 3(a).
Then, because a group convolution automatically splits its input channels and transforms each group independently, and because all branches share the same simple topology, the structure above can be implemented with a single grouped convolution (the Figure 3(c) form; see the sketch below).
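Below is a minimal PyTorch-style sketch of such a block (my own illustration, not the mmdetection code). It assumes stride 1 and matching input/output channels so the plain identity shortcut applies, and it follows the Figure 3(c) form: 1x1 reduce, 3x3 grouped conv, 1x1 expand, plus the residual connection, with BN right after each convolution (cf. Section 4).

import torch
import torch.nn as nn

class ResNeXtBottleneck(nn.Module):
    # Figure 3(c)-style block: 1x1 reduce -> 3x3 grouped conv -> 1x1 expand, plus identity.
    def __init__(self, channels=256, groups=32, base_width=4):
        super().__init__()
        width = groups * base_width  # 32 * 4 = 128 for the conv2 stage of ResNeXt-50 (32x4d)
        self.conv1 = nn.Conv2d(channels, width, 1, bias=False)
        self.bn1 = nn.BatchNorm2d(width)
        self.conv2 = nn.Conv2d(width, width, 3, padding=1, groups=groups, bias=False)
        self.bn2 = nn.BatchNorm2d(width)
        self.conv3 = nn.Conv2d(width, channels, 1, bias=False)
        self.bn3 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        out = self.relu(self.bn1(self.conv1(x)))    # 1x1: reduce to C*d channels
        out = self.relu(self.bn2(self.conv2(out)))  # grouped 3x3: implicit split + transform
        out = self.bn3(self.conv3(out))             # 1x1: merge back to `channels`
        return self.relu(out + x)                   # residual connection

block = ResNeXtBottleneck()
y = block(torch.randn(1, 256, 56, 56))
print(y.shape)  # torch.Size([1, 256, 56, 56])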
The full ResNeXt network is then built by stacking these grouped-convolution blocks stage by stage (Table 1 of the paper).
4. Implementation details
(1) perform batch normalization (BN) [17] right after the convolutions
5. Experiments
ResNeXt-50 (32×4d): Table 1 shows a ResNeXt-50 constructed by a template with cardinality = 32 and bottleneck width = 4d (Fig. 3). This network is denoted as ResNeXt-50 (32×4d) for simplicity.
Experimental results:
(1) The 32×4d configuration gives the best accuracy.