Introduction to the MobileNetV2 Algorithm (Paper Overview)
Building on MobileNet, the authors propose an improved model, MobileNetV2, which can be applied to different tasks such as image classification, object detection, and image segmentation.
Abstract
In this paper we describe a new mobile architecture, MobileNetV2, that improves the state of the art performance of mobile models on multiple tasks and benchmarks as well as across a spectrum of different model sizes. We also describe efficient ways of applying these mobile models to object detection in a novel framework we call SSDLite. Additionally, we demonstrate how to build mobile semantic segmentation models through a reduced form of DeepLabv3 which we call Mobile DeepLabv3. The MobileNetV2 architecture is based on an inverted residual structure where the shortcut connections are between the thin bottleneck layers. The intermediate expansion layer uses lightweight depthwise convolutions to filter features as a source of non-linearity.
Additionally, we find that it is important to remove non-linearities in the narrow layers in order to maintain representational power. We demonstrate that this improves performance and provide an intuition that led to this design.
Finally, our approach allows decoupling of the input/output domains from the expressiveness of the transformation, which provides a convenient framework for further analysis. We measure our performance on ImageNet [1] classification, COCO object detection [2], and VOC image segmentation [3]. We evaluate the trade-offs between accuracy and number of operations measured by multiply-adds (MAdd), as well as actual latency and the number of parameters.
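The abstract measures model cost in multiply-adds (MAdds). As a rough illustration of why the inverted residual design is cheap, the sketch below counts MAdds for one block (1×1 expansion, 3×3 depthwise, 1×1 linear projection) and compares it against a plain 3×3 convolution kept at the block's expanded internal width. The concrete sizes (56×56 feature map, 24 bottleneck channels, expansion factor t=6) are illustrative assumptions, not figures from the paper.

```python
def madds_inverted_residual(h, w, c_in, c_out, t=6, k=3):
    # MAdds for one block: 1x1 expansion, kxk depthwise, 1x1 linear projection.
    c_mid = t * c_in
    expand = h * w * c_in * c_mid      # 1x1 conv: c_in -> c_mid
    depthwise = h * w * c_mid * k * k  # kxk depthwise over c_mid channels
    project = h * w * c_mid * c_out    # 1x1 linear bottleneck: c_mid -> c_out
    return expand + depthwise + project

def madds_dense_conv(h, w, c, k=3):
    # MAdds for a plain kxk convolution kept at width c throughout.
    return h * w * c * c * k * k

# Illustrative sizes: 56x56 feature map, 24 bottleneck channels, t=6.
block = madds_inverted_residual(56, 56, 24, 24)
dense = madds_dense_conv(56, 56, 6 * 24)  # dense conv at the expanded width
print(block, dense, round(dense / block, 1))
```

With these assumed sizes the dense convolution at the expanded width costs over 20× more MAdds than the factorized block, which is the kind of efficiency gap that lets MobileNetV2 run at a fraction of the compute of dense architectures.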
Conclusions and future work
We described a very simple network architecture that allowed us to build a family of highly efficient mobile models. Our basic building unit has several properties that make it particularly suitable for mobile applications. It allows very memory-efficient inference and relies on standard operations present in all neural frameworks.
For the ImageNet dataset, our architecture improves the state of the art for a wide range of performance points.
For the object detection task, our network outperforms state-of-the-art real-time detectors on the COCO dataset in terms of both accuracy and model complexity. Notably, our architecture combined with the SSDLite detection module requires 20× less computation and has 10× fewer parameters than YOLOv2.
On the theoretical side: the proposed convolutional block has a unique property that allows separating the network expressiveness (encoded by expansion layers) from its capacity (encoded by bottleneck inputs). Exploring this is an important direction for future research.
Paper
Mark Sandler, Andrew Howard, Menglong Zhu, Andrey Zhmoginov, Liang-Chieh Chen.
MobileNetV2: Inverted Residuals and Linear Bottlenecks.
https://arxiv.org/abs/1801.04381v3
Architecture of the MobileNetV2 Algorithm in Detail
1. MobileNet V1 → MobileNet V2
2. The main contribution is a novel layer module
The inverted residual with linear bottleneck: this module takes a low-dimensional compressed representation as input, first expands it to a higher dimension, and filters it with a lightweight depthwise convolution. Features are subsequently projected back to a low-dimensional representation with a linear convolution.
The intermediate expansion layer uses lightweight depthwise convolutions to filter features as the source of non-linearity.
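The expand → depthwise filter → linear project pipeline described above can be sketched in NumPy. This is a minimal illustration assuming an expansion factor t=6 and a 3×3 depthwise kernel as in the paper; the weights here are random placeholders, not trained parameters, and batch normalization is omitted for brevity.

```python
import numpy as np

def relu6(x):
    # ReLU6 non-linearity used after the expansion and depthwise layers.
    return np.clip(x, 0.0, 6.0)

def pointwise_conv(x, w):
    # 1x1 convolution: x is (H, W, C_in), w is (C_in, C_out).
    return x @ w

def depthwise_conv3x3(x, w):
    # 3x3 depthwise convolution, stride 1, zero padding.
    # x is (H, W, C), w is (3, 3, C): one 3x3 filter per channel.
    h, wid, c = x.shape
    xp = np.pad(x, ((1, 1), (1, 1), (0, 0)))
    out = np.zeros_like(x)
    for i in range(h):
        for j in range(wid):
            patch = xp[i:i + 3, j:j + 3, :]        # (3, 3, C) window
            out[i, j, :] = (patch * w).sum(axis=(0, 1))
    return out

def inverted_residual(x, c_in, c_out, t=6):
    # 1) expand the thin input to t*c_in channels, 2) filter with a
    # depthwise conv, 3) project back with a *linear* 1x1 conv
    # (no non-linearity, to preserve representational power).
    rng = np.random.default_rng(0)
    c_mid = t * c_in
    w_expand = rng.normal(scale=0.1, size=(c_in, c_mid))
    w_dw = rng.normal(scale=0.1, size=(3, 3, c_mid))
    w_project = rng.normal(scale=0.1, size=(c_mid, c_out))

    y = relu6(pointwise_conv(x, w_expand))   # 1x1 expansion + ReLU6
    y = relu6(depthwise_conv3x3(y, w_dw))    # 3x3 depthwise + ReLU6
    y = pointwise_conv(y, w_project)         # linear bottleneck (no ReLU)
    if c_in == c_out:
        y = y + x                            # shortcut between thin bottlenecks
    return y

x = np.random.default_rng(1).normal(size=(8, 8, 16))
y = inverted_residual(x, c_in=16, c_out=16)
print(y.shape)  # (8, 8, 16): output stays at the thin bottleneck width
```

Note that the shortcut connects the thin bottleneck tensors, not the expanded intermediate ones; this is what "inverted" means relative to a classic residual block, where the shortcut joins the wide layers.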
3. Experimental results
Performance on ImageNet, comparison for different networks.
Case Applications of the MobileNetV2 Algorithm
To be updated……