1. Main Ideas
- Key tools: factorized convolutions and aggressive regularization.
- The paper presents a set of practical guidelines for network design.
2. Results
- The network costs about 5 billion multiply-adds per inference and uses fewer than 25 million parameters. With an ensemble of 4 models and multi-crop evaluation, the paper reports 3.5% top-5 error and 17.3% top-1 error.
3. Introduction
- Scaling up convolutional networks in efficient ways.
4. General Design Principles
- Avoid representational bottlenecks, especially early in the network. (In short: the feature-map size should shrink gradually from input to output rather than through abrupt compression.)
- Higher-dimensional representations are easier to process locally within a network. Increasing the activations per tile in a convolutional network allows for more disentangled features, and the resulting networks train faster. (Deeper layers should use more feature maps, which accommodate more disentangled features and speed up training.)
- Spatial aggregation can be done over lower-dimensional embeddings without much or any loss in representational power. (This is the rationale behind the bottleneck-layer design; see the sketch after this list.)
- Balance the width and depth of the network. (Increasing both width and depth in tandem contributes to higher-quality networks.)
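To make the third principle concrete, here is a minimal PyTorch sketch of a bottleneck layer; the channel counts (256 reduced to 64) and the input size are illustrative assumptions, not values from the paper.

```python
import torch
import torch.nn as nn

# Bottleneck sketch: a 1x1 convolution reduces the channel dimension,
# the spatial 3x3 aggregation then runs on the low-dimensional
# embedding. 256 -> 64 -> 256 is an illustrative choice.
bottleneck = nn.Sequential(
    nn.Conv2d(256, 64, kernel_size=1),             # 1x1 dimension reduction
    nn.ReLU(inplace=True),
    nn.Conv2d(64, 256, kernel_size=3, padding=1),  # spatial aggregation on the embedding
    nn.ReLU(inplace=True),
)

x = torch.randn(1, 256, 35, 35)
print(bottleneck(x).shape)  # torch.Size([1, 256, 35, 35])
```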
5. Factorizing Convolutions with Large Filter Size
- The idea is to factorize convolutions with large filter sizes into cheaper equivalents.
5.1. Factorization into smaller convolutions
- A 5x5 convolution can be factorized into two stacked 3x3 convolutions covering the same receptive field.
- Experiments in the paper show that when splitting one convolution into two, adding a ReLU after the first convolution improves accuracy; a purely linear factorization performs worse. (See the sketch below.)
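A minimal PyTorch sketch of this factorization; the channel count and input size are illustrative assumptions. Per output position a 5x5 kernel costs 25 multiplies per input channel, while two 3x3 kernels cost 18, the 25/18 saving cited in the paper.

```python
import torch
import torch.nn as nn

C = 64  # illustrative channel count, not from the paper

# Baseline: a single 5x5 convolution.
conv5x5 = nn.Conv2d(C, C, kernel_size=5, padding=2)

# Factorized: two stacked 3x3 convolutions cover the same 5x5 receptive
# field. The ReLU between them is the nonlinear variant the paper found
# to outperform a purely linear factorization.
factorized = nn.Sequential(
    nn.Conv2d(C, C, kernel_size=3, padding=1),
    nn.ReLU(inplace=True),
    nn.Conv2d(C, C, kernel_size=3, padding=1),
)

x = torch.randn(1, C, 17, 17)
assert conv5x5(x).shape == factorized(x).shape  # identical output shape

# Per output position and input channel: 5*5 = 25 multiplies vs
# 3*3 + 3*3 = 18, i.e. the factorized stack saves 28% of the computation.
```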
5.2. Spatial Factorization into Asymmetric Convolutions
- Factorizing a 3x3 convolution into a 3x1 convolution followed by a 1x3 convolution saves 33% of the computation; factorizing it into two 2x2 convolutions saves only 11%. In practice the asymmetric factorization also performs better.
- In practice this factorization should not be applied too early in the network; it works well on medium-sized feature maps, roughly 12x12 to 20x20. (See the sketch below.)
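A minimal PyTorch sketch of the asymmetric factorization, again with an illustrative channel count and a 17x17 input, which falls in the 12 to 20 range where the paper finds it works well.

```python
import torch
import torch.nn as nn

C = 64  # illustrative channel count

# A 3x3 convolution replaced by a 3x1 convolution followed by a 1x3 one.
asymmetric = nn.Sequential(
    nn.Conv2d(C, C, kernel_size=(3, 1), padding=(1, 0)),
    nn.ReLU(inplace=True),
    nn.Conv2d(C, C, kernel_size=(1, 3), padding=(0, 1)),
)

x = torch.randn(1, C, 17, 17)
print(asymmetric(x).shape)  # torch.Size([1, 64, 17, 17]), same as a padded 3x3

# Cost per output position and input channel:
#   3x3           -> 9 multiplies
#   3x1 + 1x3     -> 3 + 3 = 6  (33% cheaper)
#   two 2x2 convs -> 4 + 4 = 8  (only 11% cheaper)
```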
6. Utility of Auxiliary Classifiers
7. Efficient Grid Size Reduction
- Of the two naive reduction orders in the paper's figure, pooling before the convolution block (left) introduces a representational bottleneck, while convolving before pooling (right) is far more expensive computationally. The best approach reduces the feature-map size and increases the channel count at the same time.
- The correct way is the parallel structure sketched below: a stride-2 convolution branch and a stride-2 pooling branch whose outputs are concatenated.
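A minimal PyTorch sketch of this parallel reduction block; the class name and all channel counts are illustrative assumptions rather than the paper's exact configuration.

```python
import torch
import torch.nn as nn

class GridReduction(nn.Module):
    """Halve the grid while widening the representation by running a
    stride-2 convolution branch and a stride-2 pooling branch in
    parallel and concatenating the results (class name and channel
    counts are illustrative)."""

    def __init__(self, in_ch: int, conv_ch: int):
        super().__init__()
        # Convolution branch: produces conv_ch new feature maps at half resolution.
        self.conv = nn.Sequential(
            nn.Conv2d(in_ch, conv_ch, kernel_size=3, stride=2, padding=1),
            nn.ReLU(inplace=True),
        )
        # Pooling branch: keeps the in_ch input maps, also at half resolution.
        self.pool = nn.MaxPool2d(kernel_size=3, stride=2, padding=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Concatenation shrinks the grid and grows the channels at the
        # same time, avoiding both the bottleneck and the extra cost of
        # the two naive orderings above.
        return torch.cat([self.conv(x), self.pool(x)], dim=1)

x = torch.randn(1, 256, 35, 35)
y = GridReduction(256, 128)(x)
print(y.shape)  # torch.Size([1, 384, 18, 18])
```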