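DL: SegNet. An Illustrated Guide to the SegNet Image Segmentation Algorithm: Introduction (Paper Overview), Architecture Details, and Example Applications (Part 1)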

Introduction to the SegNet Image Segmentation Algorithm (Paper Overview)


To be updated...


Abstract

      We present a novel and practical deep fully convolutional neural network architecture for semantic pixel-wise segmentation termed SegNet. This core trainable segmentation engine consists of an encoder network, a corresponding decoder network, followed by a pixel-wise classification layer. The architecture of the encoder network is topologically identical to the 13 convolutional layers in the VGG16 network [1]. The role of the decoder network is to map the low resolution encoder feature maps to full input resolution feature maps for pixel-wise classification. The novelty of SegNet lies in the manner in which the decoder upsamples its lower resolution input feature map(s). Specifically, the decoder uses pooling indices computed in the max-pooling step of the corresponding encoder to perform non-linear upsampling. This eliminates the need for learning to upsample. The upsampled maps are sparse and are then convolved with trainable filters to produce dense feature maps. We compare our proposed architecture with the widely adopted FCN [2] and also with the well known DeepLab-LargeFOV [3], DeconvNet [4] architectures. This comparison reveals the memory versus accuracy trade-off involved in achieving good segmentation performance.

      SegNet was primarily motivated by scene understanding applications. Hence, it is designed to be efficient both in terms of memory and computational time during inference. It is also significantly smaller in the number of trainable parameters than other competing architectures and can be trained end-to-end using stochastic gradient descent. We also performed a controlled benchmark of SegNet and other architectures on both road scenes and SUN RGB-D indoor scene segmentation tasks. These quantitative assessments show that SegNet provides good performance with competitive inference time and most efficient inference memory-wise as compared to other architectures. We also provide a Caffe implementation of SegNet and a web demo at

http://mi.eng.cam.ac.uk/projects/segnet/.

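      In plain terms: the paper proposes SegNet, a novel and practical deep fully convolutional network for pixel-wise semantic segmentation. Its trainable core consists of an encoder network, a matching decoder network, and a final pixel-wise classification layer. The encoder is topologically identical to the 13 convolutional layers of VGG16 [1]; the decoder maps the low-resolution encoder feature maps back to full input resolution for pixel-wise classification. What is new is how the decoder upsamples: it reuses the pooling indices computed in the max-pooling step of the corresponding encoder to perform non-linear upsampling, so upsampling itself does not have to be learned. The resulting sparse maps are convolved with trainable filters to produce dense feature maps. The authors compare SegNet with the widely adopted FCN [2] and with DeepLab-LargeFOV [3] and DeconvNet [4], which exposes the trade-off between memory and accuracy in achieving good segmentation performance. Because SegNet was motivated mainly by scene understanding applications, it is designed to be efficient in both memory and computation time during inference; it also has significantly fewer trainable parameters than competing architectures and can be trained end-to-end with stochastic gradient descent. In controlled benchmarks on road scenes and SUN RGB-D indoor scenes, SegNet delivers good performance with competitive inference time and the most memory-efficient inference of the architectures compared. A Caffe implementation and a web demo are available at http://mi.eng.cam.ac.uk/projects/segnet/.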
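The index-based upsampling described in the abstract can be illustrated with a small sketch. This is not the authors' Caffe code; it is a minimal pure-Python illustration on a single 2D feature map, and the function names `max_pool_2x2_with_indices` and `max_unpool_2x2` are hypothetical names chosen for this example. It assumes 2x2 windows with stride 2 and an input whose height and width are even.

```python
from typing import List, Tuple

Grid = List[List[float]]
IndexGrid = List[List[Tuple[int, int]]]


def max_pool_2x2_with_indices(x: Grid) -> Tuple[Grid, IndexGrid]:
    """2x2, stride-2 max pooling that also records where each maximum came from."""
    h, w = len(x), len(x[0])
    assert h % 2 == 0 and w % 2 == 0, "sketch assumes even height and width"
    pooled: Grid = []
    indices: IndexGrid = []
    for i in range(0, h, 2):
        pooled_row, index_row = [], []
        for j in range(0, w, 2):
            # Collect (value, position) for the four cells of this window.
            window = [(x[i + di][j + dj], (i + di, j + dj))
                      for di in (0, 1) for dj in (0, 1)]
            value, position = max(window, key=lambda cell: cell[0])
            pooled_row.append(value)
            index_row.append(position)
        pooled.append(pooled_row)
        indices.append(index_row)
    return pooled, indices


def max_unpool_2x2(pooled: Grid, indices: IndexGrid,
                   out_shape: Tuple[int, int]) -> Grid:
    """Place each pooled value back at its recorded position; all other cells stay 0."""
    h, w = out_shape
    out = [[0.0] * w for _ in range(h)]
    for pooled_row, index_row in zip(pooled, indices):
        for value, (r, c) in zip(pooled_row, index_row):
            out[r][c] = value
    return out


feature_map = [
    [1.0, 3.0, 2.0, 0.0],
    [4.0, 2.0, 1.0, 5.0],
    [0.0, 1.0, 7.0, 2.0],
    [6.0, 2.0, 3.0, 1.0],
]
pooled, indices = max_pool_2x2_with_indices(feature_map)
sparse = max_unpool_2x2(pooled, indices, (4, 4))
for row in sparse:
    print(row)
```

The unpooled map is sparse (one non-zero entry per 2x2 window), and no weights are involved in producing it. In SegNet, as the abstract states, such sparse maps are then convolved with trainable filters to produce dense feature maps; that convolution step is omitted here.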

Conclusion

      We presented SegNet, a deep convolutional network architecture for semantic segmentation. The main motivation behind SegNet was the need to design an efficient architecture for road and indoor scene understanding which is efficient both in terms of memory and computational time. We analysed SegNet and compared it with other important variants to reveal the practical trade-offs involved in designing architectures for segmentation, particularly training time, memory versus accuracy. Those architectures which store the encoder network feature maps in full perform best but consume more memory during inference time. SegNet on the other hand is more efficient since it only stores the max-pooling indices of the feature maps and uses them in its decoder network to achieve good performance. On large and well known datasets SegNet performs competitively, achieving high scores for road scene understanding. End-to-end learning of deep segmentation architectures is a harder challenge and we hope to see more attention paid to this important problem.

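      In plain terms: SegNet is a deep convolutional network architecture for semantic segmentation, motivated by the need for a road and indoor scene understanding architecture that is efficient in both memory and computation time. The authors analyse SegNet and compare it with other important variants to expose the practical trade-offs in designing segmentation architectures, in particular training time and memory versus accuracy. Architectures that store the encoder feature maps in full perform best but consume more memory at inference time. SegNet is more efficient because it stores only the max-pooling indices of the feature maps and reuses them in its decoder network, while still achieving good performance. On large, well-known datasets SegNet is competitive and achieves high scores for road scene understanding. End-to-end learning of deep segmentation architectures remains a harder challenge, and the authors hope it will receive more attention.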
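The memory argument in the conclusion can be made concrete with some back-of-the-envelope arithmetic. The sketch below is illustrative only: the feature-map size (64 channels at 360x480), 32-bit floats, and 2 bits per 2x2 pooling window (enough to encode one of four positions) are assumptions chosen for this example, and the helper names are hypothetical.

```python
def full_feature_map_bytes(channels: int, height: int, width: int,
                           bytes_per_value: int = 4) -> int:
    """Bytes needed to keep a whole feature map (assumes 32-bit floats by default)."""
    return channels * height * width * bytes_per_value


def pooling_index_bytes(channels: int, height: int, width: int,
                        bits_per_window: int = 2) -> int:
    """Bytes needed to keep only the argmax position of each 2x2 pooling window.

    Assumes 2 bits per window, since each window has exactly four candidate cells.
    """
    windows = channels * (height // 2) * (width // 2)
    total_bits = windows * bits_per_window
    return (total_bits + 7) // 8  # round up to whole bytes


# Assumed example: one encoder feature map with 64 channels at 360x480.
full_bytes = full_feature_map_bytes(64, 360, 480)
index_bytes = pooling_index_bytes(64, 360, 480)
print(full_bytes, index_bytes, full_bytes // index_bytes)
```

Under these assumptions each 2x2 window costs 4 x 32 = 128 bits when stored in full and 2 bits when only the index is kept, which is where the factor between the two numbers comes from. Real implementations may store indices less compactly, so the actual saving can be smaller.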


Paper

Vijay Badrinarayanan, Alex Kendall, Roberto Cipolla.

SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation,

IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 39, no. 12, 1 Dec. 2017.

https://arxiv.org/abs/1511.00561


Preprint: "SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation"

arXiv page: https://arxiv.org/abs/1511.00561?context=cs

PDF: https://arxiv.org/pdf/1511.00561.pdf

Vijay Badrinarayanan, Alex Kendall, and Roberto Cipolla (2015). SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation. arXiv preprint arXiv:1511.00561.





