- 1 Background and Motivation
- 2 Related Work
- 3 Advantages / Contributions
- 4 Compound Model Scaling
- 5 EfficientNet Architecture
- 6 Experiments
- 7 Conclusion(own)
1 Background and Motivation
Scaling up ConvNets is widely used to achieve better accuracy.
1) Scale up network depth (e.g., ResNet-50 to ResNet-101)
2) Scale up network width (e.g., ResNet-50 to Wide ResNet)
3) Scale up input resolution
The authors are the first to empirically quantify the relationship among all three dimensions of network width, depth, and resolution, with the goal of improving model accuracy efficiently.
2 Related Work
- ConvNet Accuracy
- ConvNet Efficiency: lightweight networks
- Model Scaling: width, depth, and resolution
3 Advantages / Contributions
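Following MNASNet, the authors use AutoML (neural architecture search) to obtain the baseline EfficientNet-B0, then compound-scale it along width, depth, and resolution to form EfficientNet-Bx models of different sizes. These reach SOTA accuracy on ImageNet with far fewer parameters, and they also generalize well across datasets (SOTA on 5 of 8).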
4 Compound Model Scaling
4.1 Problem Formulation
A neural network $N$ can be represented as a stack of layers $F(X)$:
- $X_1$ is the input tensor
- $F_j$ is an operator (e.g., conv and activation), where $j$ indexes layer $j$
- $F_i^{L_i}$ denotes that layer $F_i$ is repeated $L_i$ times in stage $i$
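For reference, the definitions above combine into the layer-wise and stage-wise forms below. This is my reconstruction of the paper's formulation rather than something spelled out in these notes; $k$ (number of layers), $s$ (number of stages), and $\langle H_i, W_i, C_i \rangle$ (the input tensor shape of stage $i$) are symbols not otherwise defined here.

$$
N = F_k \odot \cdots \odot F_2 \odot F_1(X_1) = \bigodot_{j=1 \ldots k} F_j(X_1)
$$

$$
N = \bigodot_{i=1 \ldots s} F_i^{L_i}\big(X_{\langle H_i, W_i, C_i \rangle}\big)
$$

Model scaling then keeps each $F_i$ fixed and only changes the length $L_i$, the channel count $C_i$, and the resolution $(H_i, W_i)$.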
4.2 Scaling Dimensions
Scaling up any dimension of network width, depth, or resolution improves accuracy, but the accuracy gain diminishes for bigger models.
1) Scaling depth
Pros: capture richer and more complex features
Cons: more difficult to train due to the vanishing gradient problem, so the accuracy return diminishes for very deep ConvNets (accuracy plateaus beyond a certain depth)
2) Scaling width
Pros: wider networks tend to be able to capture more fine-grained features and are easier to train
Cons: wide but shallow networks have difficulties in capturing higher-level features
3) Scaling resolution
Pros: potentially capture more fine-grained patterns
4.3 Compound Scaling
In the figure for this experiment, width is held fixed while depth and resolution are varied; scaling depth and resolution together gives the largest accuracy gains.
In order to pursue better accuracy and efficiency, it is critical to balance all dimensions of network width, depth, and resolution during ConvNet scaling
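Based on the key observations in Sections 4.2 and 4.3, the authors propose the following compound scaling method:

$$
d = \alpha^{\phi}, \quad w = \beta^{\phi}, \quad r = \gamma^{\phi}, \qquad \text{s.t. } \alpha \cdot \beta^2 \cdot \gamma^2 \approx 2, \quad \alpha \ge 1,\ \beta \ge 1,\ \gamma \ge 1
$$

where $d$, $w$, and $r$ are the depth, width, and resolution multipliers.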
$\alpha$, $\beta$, and $\gamma$ are constants obtained by a small grid search.
$\phi$ is a user-specified coefficient that controls how many more resources are available for model scaling.
Why does the constraint use $\alpha$ to the first power but $\beta^2$ and $\gamma^2$?
Because doubling network depth will double FLOPS, but doubling network width or resolution will increase FLOPS by four times.
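With the authors' compound scaling, the network's total FLOPS therefore grow to approximately $(\alpha \cdot \beta^2 \cdot \gamma^2)^{\phi} \approx 2^{\phi}$ times the original.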
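The FLOPS argument above can be checked on a toy network. The sketch below is only an illustration under my own assumptions: a hypothetical ConvNet made of identical stride-1 3x3 conv layers, FLOPS counted as multiply-accumulates, and fractional layer and channel counts allowed so the arithmetic stays exact (a real network would round them). The coefficient values are the ones these notes report for EfficientNet-B0.

```python
import math


def conv_flops(height, width, c_in, c_out, kernel=3):
    """Multiply-accumulates of one stride-1, same-padded kernel x kernel conv."""
    return height * width * c_in * c_out * kernel * kernel


def network_flops(depth, channels, resolution, kernel=3):
    """Toy ConvNet: `depth` identical conv layers at constant resolution."""
    return depth * conv_flops(resolution, resolution, channels, channels, kernel)


def scaled_flops_ratio(alpha, beta, gamma, phi,
                       depth=10, channels=64, resolution=224):
    """FLOPS of the compound-scaled toy network relative to its baseline."""
    base = network_flops(depth, channels, resolution)
    scaled = network_flops(depth * alpha ** phi,
                           channels * beta ** phi,
                           resolution * gamma ** phi)
    return scaled / base


# Scaling one dimension at a time by 2x.
print(scaled_flops_ratio(2, 1, 1, 1))  # depth x2      -> 2.0
print(scaled_flops_ratio(1, 2, 1, 1))  # width x2      -> 4.0
print(scaled_flops_ratio(1, 1, 2, 1))  # resolution x2 -> 4.0

# Compound scaling agrees with the closed form (alpha * beta^2 * gamma^2)^phi.
alpha, beta, gamma, phi = 1.2, 1.1, 1.15, 3
closed_form = (alpha * beta ** 2 * gamma ** 2) ** phi
print(math.isclose(scaled_flops_ratio(alpha, beta, gamma, phi), closed_form))
```

Width enters quadratically because both the input and output channel counts of each layer grow, and resolution enters quadratically because height and width grow together.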
5 EfficientNet Architecture
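The baseline network EfficientNet-B0 is found with AutoML (neural architecture search) following MNASNet. As the authors put it: "we optimize FLOPS rather than latency since we are not targeting any specific hardware device."
MBConv is the inverted bottleneck block from MobileNetV2.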
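To make the inverted bottleneck structure concrete, here is a rough weight count for one such block. It is a simplified sketch, not the EfficientNet implementation: it ignores biases, batch norm, and the squeeze-and-excitation module that EfficientNet adds to MBConv, and the channel sizes and expansion ratio in the example call are illustrative values I picked.

```python
def mbconv_params(c_in, c_out, expand_ratio=6, kernel=3):
    """Weight count of an inverted bottleneck (no bias, BN, or SE).

    Layout: 1x1 expand conv -> kernel x kernel depthwise conv -> 1x1 project conv.
    """
    c_mid = c_in * expand_ratio
    expand = c_in * c_mid                 # 1x1 conv widens the channels
    depthwise = c_mid * kernel * kernel   # one kernel x kernel filter per channel
    project = c_mid * c_out               # 1x1 linear projection back down
    return {
        "expand": expand,
        "depthwise": depthwise,
        "project": project,
        "total": expand + depthwise + project,
    }


# Illustrative block: 16 input channels, 24 output channels, expansion 6.
print(mbconv_params(16, 24))
```

The block is "inverted" because the wide part is in the middle: the cheap depthwise convolution runs on the expanded channels, while the block's input and output stay narrow.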
Step 1: fix $\phi = 1$ and search for the best $\alpha$, $\beta$, $\gamma$. Quoting the paper: "we find the best values for EfficientNet-B0 are $\alpha = 1.2$, $\beta = 1.1$, $\gamma = 1.15$"
Step 2: fix $\alpha$, $\beta$, $\gamma$ and increase $\phi$ to scale up the network, yielding EfficientNet-B1 to EfficientNet-B7.
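The two steps can be sketched roughly as follows. This is a minimal illustration under my own assumptions: the grid values and the tolerance `tol` on the constraint are made up, and the sketch only enumerates feasible coefficient triples. The actual search would train and evaluate a scaled model for each candidate and keep the most accurate one, which is not reproduced here.

```python
import itertools


def candidate_coefficients(values, target=2.0, tol=0.1):
    """Step 1 (feasibility only): grid points with alpha * beta^2 * gamma^2 ~= target."""
    return [
        (a, b, g)
        for a, b, g in itertools.product(values, repeat=3)
        if abs(a * b ** 2 * g ** 2 - target) <= tol
    ]


def compound_scale(alpha, beta, gamma, phi):
    """Step 2: depth, width, and resolution multipliers for a given phi."""
    return {
        "depth": alpha ** phi,
        "width": beta ** phi,
        "resolution": gamma ** phi,
    }


# Hypothetical grid: 1.00, 1.05, ..., 1.40 for each coefficient.
grid = [round(1.0 + 0.05 * i, 2) for i in range(9)]
candidates = candidate_coefficients(grid)
print((1.2, 1.1, 1.15) in candidates)  # True: the reported values are feasible

# Scale up with the reported coefficients held fixed.
for phi in range(4):
    print(phi, compound_scale(1.2, 1.1, 1.15, phi))
```

Searching the coefficients once on the small baseline and then reusing them for every larger $\phi$ is what keeps the search cost low. Note that the multipliers above are raw values; released models may use rounded layer counts, channel counts, and resolutions.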
6 Experiments
6.1 Datasets
- ImageNet
- CIFAR100
- Birdsnap
- Stanford Cars
- Flowers
- FGVC Aircraft
- Oxford-IIIT Pets
- Food-101
6.2 Experiments on ImageNet
1) Scaling Up MobileNets and ResNets
Compound scaling still clearly outperforms scaling a single dimension.
2) ImageNet Results for EfficientNet
6.3 Transfer Learning Results for EfficientNet
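This figure would be even more striking if the other networks were also compound-scaled, so that each point became a curve.
SOTA on 5 of 8 datasets, a very strong result.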
7 Conclusion(own)
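Scaling width, depth, or resolution individually each has its own pros and cons, and each affects network FLOPS differently.
Scaling width, depth, and resolution jointly works better; the base scaling factors $\alpha$, $\beta$, $\gamma$ have to be found by grid search.
Bigger models need more regularization (e.g., the larger the model, the higher the dropout rate, assuming the dataset size stays the same).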
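As a small illustration of the last point, the sketch below grows the dropout rate linearly with model size. The endpoints (0.2 for the smallest model, 0.5 for the largest of eight) follow what I recall the paper reporting for B0 and B7; the function name and the strict linear interpolation are my own assumptions, and released configurations may use rounded values.

```python
def dropout_rate(model_index, num_models=8, low=0.2, high=0.5):
    """Linearly interpolate dropout from `low` (index 0) to `high` (last index)."""
    if not 0 <= model_index < num_models:
        raise ValueError("model_index out of range")
    return low + (high - low) * model_index / (num_models - 1)


# Dropout for a family of eight models, indexed 0..7.
for index in range(8):
    print(f"model {index}: dropout = {dropout_rate(index):.3f}")
```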