【EfficientNet】《EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks》

ICML-2019


1 Background and Motivation

Scaling up ConvNets is widely used to achieve better accuracy.

Common approaches include:

1) scaling up network depth (e.g., ResNet-50 to ResNet-101),

2) scaling up network width (e.g., ResNet-50 to Wide ResNet),

and a less common one:

3) scaling up the input resolution.

Is there an intrinsic relationship among these three dimensions? And how should they be tuned jointly to maximize the accuracy gain?

The authors are the first to empirically quantify the relationship among all three dimensions of network width, depth, and resolution, and use it to improve model accuracy efficiently.

2 Related Work

  • ConvNet Accuracy
  • ConvNet Efficiency (lightweight networks)
  • Model Scaling (width, depth, and resolution)

3 Advantages / Contributions


Following MNASNet, the authors use AutoML (neural architecture search) to obtain the baseline EfficientNet-B0, then compound-scale it along width, depth, and resolution to build EfficientNet-Bx models of different sizes. These achieve SOTA on ImageNet with far fewer parameters, and generalize well across datasets (SOTA on 5 of 8 transfer datasets).

4 Compound Model Scaling

Core idea:

[figure: baseline vs. width / depth / resolution scaling vs. compound scaling]

4.1 Problem Formulation

A ConvNet $N$ can be represented as a composition of stacked layers $F(X)$:

$$N = F_k \odot \cdots \odot F_2 \odot F_1(X_1) = \bigodot_{j=1,\dots,k} F_j(X_1)$$

  • $X_1$ is the input tensor
  • $F_j$ is an operator (e.g., a convolution plus its activation), where $j$ indexes layer $j$

Expressed more modularly, in terms of stages:

$$N = \bigodot_{i=1,\dots,s} F_i^{L_i}\big(X_{\langle H_i, W_i, C_i \rangle}\big)$$

  • $F_i^{L_i}$ means that layer $F_i$ is repeated $L_i$ times in stage $i$

Scaling the network to improve accuracy can then be written as the optimization problem:

$$\max_{d,\,w,\,r}\ \ Accuracy\big(N(d, w, r)\big)$$
$$\text{s.t.}\ \ N(d, w, r) = \bigodot_{i=1,\dots,s} \hat{F}_i^{\,d \cdot \hat{L}_i}\big(X_{\langle r \cdot \hat{H}_i,\ r \cdot \hat{W}_i,\ w \cdot \hat{C}_i \rangle}\big)$$
$$\text{Memory}(N) \le \text{target\_memory}, \quad \text{FLOPS}(N) \le \text{target\_flops}$$
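To make the formulation concrete, here is a minimal Python sketch (not the paper's code; the baseline stage list below is a made-up example rather than EfficientNet-B0) of how the coefficients $d$, $w$, $r$ rescale a baseline configuration:

```python
import math

# Hypothetical baseline: (repeats L_i, channels C_i) per stage, plus the input resolution.
baseline_stages = [(1, 16), (2, 24), (2, 40), (3, 80)]
base_resolution = 224

def scale_network(d, w, r):
    """Apply depth (d), width (w), and resolution (r) coefficients to the baseline."""
    stages = [(math.ceil(d * L), math.ceil(w * C)) for (L, C) in baseline_stages]
    resolution = math.ceil(r * base_resolution)
    return stages, resolution

# d, w, r would come from the compound scaling rule introduced below.
print(scale_network(d=1.2, w=1.1, r=1.15))
# ([(2, 18), (3, 27), (3, 44), (4, 88)], 258)
```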

4.2 Scaling Dimensions

[figure: ImageNet accuracy when scaling width, depth, or resolution individually]
Scaling up any dimension of network width, depth, or resolution improves accuracy, but the accuracy gain diminishes for bigger models.

1) Scaling depth

Advantage: captures richer and more complex features.

Disadvantage: deeper networks are more difficult to train due to the vanishing gradient problem, and the accuracy return diminishes for very deep ConvNets (accuracy saturates beyond a certain depth).

2) Scaling width

The number of channels is increased.

Advantage: wider networks tend to capture more fine-grained features and are easier to train.

Disadvantage: wide but shallow networks have difficulty capturing higher-level features.

3) Scaling resolution

Advantage: potentially captures more fine-grained patterns.

4.3 Compound Scaling

[figure: ImageNet accuracy when scaling width under different fixed depth and resolution settings]
In the figure above, width is scaled under several fixed combinations of depth and resolution; increasing depth and resolution together with width gives the largest accuracy gain at the same FLOPS cost.

In order to pursue better accuracy and efficiency, it is critical to balance all dimensions of network width, depth, and resolution during ConvNet scaling

Based on the observations in Sections 4.2 and 4.3, the authors propose the following compound scaling method:

$$\text{depth: } d = \alpha^{\phi}, \qquad \text{width: } w = \beta^{\phi}, \qquad \text{resolution: } r = \gamma^{\phi}$$
$$\text{s.t. } \alpha \cdot \beta^2 \cdot \gamma^2 \approx 2, \qquad \alpha \ge 1,\ \beta \ge 1,\ \gamma \ge 1$$

  • $\alpha$, $\beta$, $\gamma$ are constants determined by a small grid search

  • $\phi$ is a user-specified coefficient that controls how many more resources are available for model scaling

Why does the constraint use $\alpha$ to the first power but $\beta^2$ and $\gamma^2$ for width and resolution?

Because doubling network depth doubles FLOPS, while doubling network width or resolution increases FLOPS by four times.

With this compound scaling, the network's total FLOPS is multiplied by

$$(\alpha \cdot \beta^2 \cdot \gamma^2)^{\phi} \approx 2^{\phi}$$
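A quick numeric check of this 2x / 4x / 4x behaviour, using an illustrative FLOPS model for a stack of 3x3 convolutions with equal input and output channels (a sketch, not the paper's accounting):

```python
def conv_stack_flops(depth, channels, resolution, k=3):
    # Per layer: resolution^2 spatial positions, channels -> channels, k x k kernel.
    return depth * (resolution ** 2) * (channels ** 2) * (k ** 2)

base = conv_stack_flops(depth=4, channels=64, resolution=56)
print(conv_stack_flops(8, 64, 56) / base)    # double depth      -> 2.0
print(conv_stack_flops(4, 128, 56) / base)   # double width      -> 4.0
print(conv_stack_flops(4, 64, 112) / base)   # double resolution -> 4.0
```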

5 EfficientNet Architecture

The baseline EfficientNet-B0 is obtained with a MNASNet-style AutoML search; the search optimizes FLOPS rather than latency, since no specific hardware device is targeted.

[table: EfficientNet-B0 architecture]
MBConv is the inverted bottleneck block from MobileNetV2.
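Below is a minimal MBConv sketch in PyTorch (an illustrative re-implementation, not the official EfficientNet code): a 1x1 expansion, a 3x3 depthwise convolution, and a 1x1 projection, with a residual connection when the shapes match. The real blocks also include squeeze-and-excitation and use the swish activation, both omitted here for brevity.

```python
import torch
import torch.nn as nn

class MBConv(nn.Module):
    """Simplified MobileNetV2-style inverted bottleneck (no SE, no swish)."""
    def __init__(self, in_ch, out_ch, stride=1, expand_ratio=6):
        super().__init__()
        mid_ch = in_ch * expand_ratio
        self.use_residual = (stride == 1 and in_ch == out_ch)
        self.block = nn.Sequential(
            nn.Conv2d(in_ch, mid_ch, 1, bias=False),              # 1x1 expansion
            nn.BatchNorm2d(mid_ch), nn.ReLU6(inplace=True),
            nn.Conv2d(mid_ch, mid_ch, 3, stride, 1,
                      groups=mid_ch, bias=False),                  # 3x3 depthwise conv
            nn.BatchNorm2d(mid_ch), nn.ReLU6(inplace=True),
            nn.Conv2d(mid_ch, out_ch, 1, bias=False),              # 1x1 projection
            nn.BatchNorm2d(out_ch),
        )

    def forward(self, x):
        out = self.block(x)
        return x + out if self.use_residual else out

x = torch.randn(1, 16, 56, 56)
print(MBConv(16, 16)(x).shape)  # torch.Size([1, 16, 56, 56])
```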

Step 1: fix $\phi = 1$ and run a small grid search for the best $\alpha$, $\beta$, $\gamma$. The best values found for EfficientNet-B0 are $\alpha = 1.2$, $\beta = 1.1$, $\gamma = 1.15$.
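A rough sketch of what such a grid search could look like (`train_and_evaluate` is a hypothetical placeholder for training and scoring each candidate; this is not the paper's actual search code):

```python
import itertools

candidates = [round(1.0 + 0.05 * i, 2) for i in range(9)]  # 1.00, 1.05, ..., 1.40
feasible = [
    (a, b, g)
    for a, b, g in itertools.product(candidates, repeat=3)
    if abs(a * b ** 2 * g ** 2 - 2.0) < 0.1   # enforce alpha * beta^2 * gamma^2 ~= 2
]
# best = max(feasible, key=lambda abg: train_and_evaluate(*abg, phi=1))  # hypothetical
print(len(feasible), "candidate (alpha, beta, gamma) triples to evaluate")
```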

Step 2: fix $\alpha$, $\beta$, $\gamma$ and increase $\phi$ to scale the network up, yielding EfficientNet-B1 through EfficientNet-B7.
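As an illustration, the depth / width / resolution multipliers implied by the formula for increasing $\phi$ (the released B1–B7 round these values and fix specific input resolutions, so treat this only as a sketch):

```python
alpha, beta, gamma = 1.2, 1.1, 1.15  # searched with phi = 1

def compound_coefficients(phi):
    # depth, width, and resolution multipliers relative to EfficientNet-B0
    return alpha ** phi, beta ** phi, gamma ** phi

for phi in range(1, 8):  # roughly corresponds to B1 ... B7
    d, w, r = compound_coefficients(phi)
    print(f"phi={phi}: depth x{d:.2f}, width x{w:.2f}, resolution x{r:.2f}")
```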


6 Experiments

6.1 Datasets


  • ImageNet
  • CIFAR10
  • CIFAR100
  • Birdsnap
  • Stanford Cars
  • Flowers
  • FGVC Aircraft
  • Oxford-IIIT Pets
  • Food-101

6.2 Experiments on ImageNet

1) Scaling Up MobileNets and ResNets
[table: single-dimension vs. compound scaling applied to MobileNets and ResNets]
Compound scaling improves accuracy more than scaling any single dimension.

2) ImageNet Results for EfficientNet
[table: EfficientNet ImageNet accuracy, parameters, and FLOPS compared with other ConvNets]
And on inference speed:
[table: inference latency comparison]

EfficientNet is both fast and accurate.

6.3 Transfer Learning Results for EfficientNet

[table: transfer learning accuracy and parameter counts on the eight datasets]
Again both efficient and accurate.

The figure would be even more striking if each network were drawn as a compound-scaling curve (a line) instead of a single point.

[figure: transfer learning accuracy vs. number of parameters]
State of the art on 5 of the 8 transfer datasets.

7 Conclusion (own)

  • Pros and cons of scaling width / depth / resolution individually, and how differently each dimension affects the network's FLOPS

  • Scaling width / depth / resolution jointly works better; the base scaling factors $\alpha$, $\beta$, $\gamma$ need to be found with a small grid search

  • Bigger models need more regularization (e.g., the larger the model, the higher the dropout rate, assuming the dataset size stays the same)
