SKNet: Selective Kernel Networks

Paper: Selective Kernel Networks

  • We propose a dynamic selection mechanism in CNNs that allows each neuron to adaptively adjust its receptive field size based on multiple scales of input information.
  • However, some other RF properties of cortical neurons have not been emphasized in designing CNNs, and one such property is the adaptive changing of RF size.
  • All of these experiments suggest that the RF sizes of neurons are not fixed but modulated by stimulus.
  • But that linear aggregation approach may be insufficient to provide neurons powerful adaptation ability. (InceptionNets)

1. Split

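The input $X$ is passed through a 3×3 and a 5×5 group convolution to produce $\tilde{U}$ and $\hat{U}$, respectively. The paper notes that in the two-branch case the 5×5 group convolution is replaced by a 3×3 dilated convolution with dilation=2; the code below still uses an ordinary 5×5 convolution.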
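As a quick check of why a 3×3 kernel with dilation 2 can stand in for a 5×5 kernel, the sketch below applies the standard convolution size arithmetic to both branches. The helper names `effective_kernel` and `conv_out_size` and the 32-pixel input are made up for this illustration; they come from neither the paper nor the code further down.

```python
def effective_kernel(kernel_size, dilation=1):
    """Span covered by a (possibly dilated) kernel along one axis."""
    return dilation * (kernel_size - 1) + 1


def conv_out_size(size, kernel_size, stride=1, padding=0, dilation=1):
    """Output length along one axis for a standard (dilated) convolution."""
    return (size + 2 * padding - effective_kernel(kernel_size, dilation)) // stride + 1


# Ordinary 5x5 branch versus a 3x3 branch with dilation 2.
span_5x5 = effective_kernel(5)
span_dilated = effective_kernel(3, dilation=2)

# With padding 2 and stride 1, both keep a 32-pixel axis at 32 pixels.
out_5x5 = conv_out_size(32, 5, padding=2)
out_dilated = conv_out_size(32, 3, padding=2, dilation=2)

# Weights per filter per input channel.
weights_5x5 = 5 * 5
weights_dilated = 3 * 3

print(span_5x5, span_dilated, out_5x5, out_dilated, weights_5x5, weights_dilated)
# → 5 5 32 32 25 9
```

Both branches cover the same 5-pixel span and produce the same output size, while the dilated kernel needs 9 weights instead of 25.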

2. Fuse

        (1) Element-wise summation gives $U$:

$$U = \tilde{U} + \hat{U}$$

        (2) Global average pooling embeds the global information into $s \in \mathbb{R}^{C}$, where the $c$-th element of $s$ is computed as:

$$s_c = F_{gp}(U_c) = \frac{1}{H \times W} \sum_{i=1}^{H} \sum_{j=1}^{W} U_c(i, j)$$

        (3) A fully connected layer gives $z \in \mathbb{R}^{d \times 1}$:

$$z = F_{fc}(s) = \delta(\mathcal{B}(W s))$$

                 where $\mathcal{B}$ is batch normalization, $\delta$ is ReLU, and $W \in \mathbb{R}^{d \times C}$. Note that two parameters, the reduction ratio $r$ and the threshold $L$, control the number of output channels $d$ of $z$; $L$ defaults to 32 in the paper. The implementation below omits the BN and ReLU.

$$d = \max(C / r, L)$$
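To make the roles of the reduction ratio and the threshold concrete, the sketch below evaluates the rule d = max(C / r, L) for a few channel counts. The value r = 16 and the channel counts are assumptions chosen for illustration, and the helper name `z_dim` is hypothetical; the integer truncation mirrors `int(features / r)` in the code further down.

```python
def z_dim(channels, r, L=32):
    """Length d of the compact feature z: d = max(C / r, L)."""
    return max(int(channels / r), L)


# r = 16 and these channel counts are illustrative assumptions.
for channels in (64, 256, 512, 1024):
    print(channels, z_dim(channels, r=16))
# → 64 32
# → 256 32
# → 512 32
# → 1024 64
```

For narrow layers the threshold L decides the size of z, and the reduction ratio only takes over once C / r exceeds L.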

3. Select

$$a_c = \frac{e^{A_c z}}{e^{A_c z} + e^{B_c z}}, \quad b_c = \frac{e^{B_c z}}{e^{A_c z} + e^{B_c z}}$$

"A soft attention across channels is used to adaptively select different spatial scales of information", where $A, B \in \mathbb{R}^{C \times d}$, and $a$ and $b$ denote the soft attention vectors for $\tilde{U}$ and $\hat{U}$, respectively. $A_c \in \mathbb{R}^{1 \times d}$ is the $c$-th row of $A$ and $a_c$ is the $c$-th element of $a$; likewise for $B_c$ and $b_c$. With two branches the matrix $B$ is redundant because $a_c + b_c = 1$. The final feature map $V$ is obtained as

$$V_c = a_c \cdot \tilde{U}_c + b_c \cdot \hat{U}_c, \quad a_c + b_c = 1$$

where $V = [V_1, V_2, \ldots, V_C]$ and $V_c \in \mathbb{R}^{H \times W}$.
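The redundancy of $B$ in the two-branch case can be checked numerically: a two-way softmax depends only on the difference between the two logits, so $a_c$ equals the sigmoid of $(A_c - B_c) z$. The sketch below uses plain Python and made-up values for the logits and for one pixel of each branch; the function names are hypothetical.

```python
import math


def two_branch_attention(logit_a, logit_b):
    """Softmax over the two logits A_c z and B_c z for a single channel c."""
    m = max(logit_a, logit_b)  # subtract the max for numerical stability
    ea, eb = math.exp(logit_a - m), math.exp(logit_b - m)
    return ea / (ea + eb), eb / (ea + eb)


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


# Made-up logits standing in for A_c z and B_c z.
logit_a, logit_b = 1.3, -0.4
a_c, b_c = two_branch_attention(logit_a, logit_b)

# The weights sum to one, and a_c depends only on the logit difference.
assert abs(a_c + b_c - 1.0) < 1e-12
assert abs(a_c - sigmoid(logit_a - logit_b)) < 1e-12

# Fusion for one pixel of channel c: V_c = a_c * U~_c + b_c * U^_c.
u_tilde, u_hat = 2.0, 5.0  # made-up branch responses
v = a_c * u_tilde + b_c * u_hat
print(a_c, b_c, v)
```

Because the weights are positive and sum to one, each output value is a convex combination of the two branch responses.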




        c. $\tilde{U}$ and $\hat{U}$ are multiplied by the softmax-normalized $a$ and $b$, respectively, and then summed to give the final output $V$, which has the same dimensions as the original input $X$.


        The input $X$ is passed through a 3×3 and a 5×5 group convolution to produce $\tilde{U}$ and $\hat{U}$, which are summed to give $U$, so $U$ contains information from both the 3×3 and the 5×5 receptive fields. Global average pooling then encodes the global information into channel-wise statistics. One fully connected layer follows for further learning, and then two separate fully connected layers produce $a$ and $b$, which encode the information of the 3×3 and the 5×5 receptive field, respectively. A softmax comes last; this step can be viewed as an attention mechanism that lets the network learn from the different fields of view and fuse the information of the different receptive fields adaptively.





import torch
import torch.nn as nn

class SKConv(nn.Module):
    def __init__(self, features, M, G, r, stride=1, L=32):
        """ Constructor
            features: input channel dimensionality.
            M: the number of branches.
            G: num of convolution groups.
            r: the ratio for computing d, the length of z.
            stride: stride, default 1.
            L: the minimum dim of the vector z in paper, default 32.
        """
        super(SKConv, self).__init__()
        d = max(int(features / r), L)
        self.M = M
        self.features = features
        self.convs = nn.ModuleList([])
        for i in range(M):
            # Branch i is an ordinary group convolution with kernel size 3 + 2i;
            # padding 1 + i keeps the spatial size unchanged at stride 1.
            self.convs.append(
                nn.Conv2d(features, features, kernel_size=3 + i * 2, stride=stride, padding=1 + i, groups=G)
            )
        self.fc = nn.Linear(features, d)
        self.fcs = nn.ModuleList([])
        for i in range(M):
            self.fcs.append(
                nn.Linear(d, features)
            )
        self.softmax = nn.Softmax(dim=1)

    def forward(self, x):  # (batch_size, channel, height, width), channel==features
        # 1. Split
        for i, conv in enumerate(self.convs):
            fea = conv(x).unsqueeze_(dim=1)  # (b, 1, c, h, w)
            if i == 0:
                feas = fea
            else:
                feas = torch.cat([feas, fea], dim=1)  # (b, M, c, h, w)
        # 2. Fuse
        fea_U = torch.sum(feas, dim=1)  # (b, c, h, w) element-wise summation
        fea_s = fea_U.mean(-1).mean(-1)  # (b, c) global average pooling
        fea_z = self.fc(fea_s)  # (b, d)
        # 3. Select
        for i, fc in enumerate(self.fcs):
            vector = fc(fea_z).unsqueeze_(dim=1)  # (b, 1, c)
            if i == 0:
                attention_vectors = vector
            else:
                attention_vectors = torch.cat([attention_vectors, vector], dim=1)  # (b, M, c)
        attention_vectors = self.softmax(attention_vectors)  # (b, M, c), softmax across branches
        attention_vectors = attention_vectors.unsqueeze(-1).unsqueeze(-1)  # (b, M, c, 1, 1)
        fea_v = (feas * attention_vectors).sum(dim=1)  # (b, c, h, w)
        return fea_v


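  1. The paper replaces the 5×5 convolution with a 3×3 dilated convolution with dilation=2, but this implementation still uses an ordinary 5×5 convolution.
  2. In the fc layer after global avg pooling, the paper applies BN and ReLU, but this implementation does not.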
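
One way these two differences could be closed is sketched below. It is a hypothetical variant, not the paper's reference code: the class name `SKConvDilated`, the default values `G=32` and `r=16`, the use of `nn.BatchNorm1d` on the pooled vector, and dropping the bias before BN are all assumptions made for this illustration.

```python
import torch
import torch.nn as nn


class SKConvDilated(nn.Module):
    """Hypothetical SKConv variant: dilated 3x3 branches, BN + ReLU after the first fc."""

    def __init__(self, features, M=2, G=32, r=16, stride=1, L=32):
        super().__init__()
        d = max(int(features / r), L)
        # Branch i uses a 3x3 kernel with dilation 1 + i, so its effective
        # kernel size is 3 + 2i (3, 5, ...); padding 1 + i keeps H and W at stride 1.
        self.convs = nn.ModuleList([
            nn.Conv2d(features, features, kernel_size=3, stride=stride,
                      padding=1 + i, dilation=1 + i, groups=G)
            for i in range(M)
        ])
        # z = ReLU(BN(W s)); nn.Linear plays the role of W, and its bias is
        # dropped here because the following BN has its own shift.
        self.fc = nn.Sequential(
            nn.Linear(features, d, bias=False),
            nn.BatchNorm1d(d),
            nn.ReLU(inplace=True),
        )
        self.fcs = nn.ModuleList([nn.Linear(d, features) for _ in range(M)])

    def forward(self, x):
        feas = torch.stack([conv(x) for conv in self.convs], dim=1)  # (b, M, c, h, w)
        fea_s = feas.sum(dim=1).mean(dim=(2, 3))  # (b, c) fuse + global average pooling
        fea_z = self.fc(fea_s)  # (b, d)
        logits = torch.stack([fc(fea_z) for fc in self.fcs], dim=1)  # (b, M, c)
        attention = torch.softmax(logits, dim=1)[..., None, None]  # (b, M, c, 1, 1)
        return (feas * attention).sum(dim=1)  # (b, c, h, w)


# Usage: batch size 2, because BatchNorm1d in training mode needs more than one sample.
block = SKConvDilated(features=64)
out = block(torch.randn(2, 64, 16, 16))
print(out.shape)
```

With stride 1 the output has the same shape as the input, matching the behaviour of the listing above.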


