CVPR 2018, PyTorch code
1 Background and Motivation
The authors find that information propagation in the state-of-the-art Mask R-CNN can be further improved.
Building on Mask R-CNN, they further improve both object detection and instance segmentation.
2 Advantages / Contributions
The paper proposes the Path Aggregation Network (PANet), aiming at boosting information flow in the proposal-based instance segmentation framework.
- 1st place in the COCO 2017 Challenge Instance Segmentation task
- 2nd place in the COCO 2017 Challenge Object Detection task
- SOTA on MVD and Cityscapes
3 Method
The method consists of three improved modules.
3.1 Bottom-up Path Augmentation
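Shortcoming of the existing FPN structure:
There is a long path from low-level structure to topmost features, increasing the difficulty of accessing accurate localization information. [Red dashed arrow in Figure 1(a): in the forward pass, low-level information has to travel through the entire backbone before it reaches the top levels, e.g. P5.]
The authors' improvement:
A bottom-up path is augmented to make low-layer information easier to propagate. [Green dashed arrow in Figure 1(a).]
Details:
The bottom-up path is built as the reverse of the FPN top-down path, as shown in Figure 2.
Note that $N_2$ is simply $P_2$, without any processing.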
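The construction above can be sketched in a few lines. This is a dependency-free toy, not the authors' implementation: feature maps are single-channel nested lists, and the hypothetical helper `downsample2x` (2x2 average pooling) stands in for the learned stride-2 3x3 convolution that, as far as I recall from the paper, reduces $N_i$ before it is added to $P_{i+1}$. The additional 3x3 convolution applied after the addition is omitted.

```python
def downsample2x(fmap):
    """Halve the spatial size by averaging each 2x2 block.

    Assumption: this replaces the learned stride-2 3x3 conv of the real model.
    """
    h, w = len(fmap), len(fmap[0])
    return [
        [
            (fmap[i][j] + fmap[i][j + 1] + fmap[i + 1][j] + fmap[i + 1][j + 1]) / 4.0
            for j in range(0, w, 2)
        ]
        for i in range(0, h, 2)
    ]


def add_maps(a, b):
    """Element-wise sum of two feature maps of identical size (lateral connection)."""
    return [[x + y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]


def bottom_up_path(p_levels):
    """Build [N2, N3, N4, N5] from [P2, P3, P4, P5].

    Each P level must be half the height and width of the previous one.
    """
    n_levels = [p_levels[0]]  # N2 is simply P2, without any processing
    for p_next in p_levels[1:]:
        # N_{i+1} = downsample(N_i) + P_{i+1}
        n_levels.append(add_maps(downsample2x(n_levels[-1]), p_next))
    return n_levels


# Toy pyramid: 8x8, 4x4, 2x2 and 1x1 maps filled with ones.
pyramid = [[[1.0] * s for _ in range(s)] for s in (8, 4, 2, 1)]
n_maps = bottom_up_path(pyramid)
```

Each $N$ level is reached from $P_2$ in at most three short steps, which is the shortcut the green arrow in Figure 1(a) stands for.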
3.2 Adaptive Feature Pooling
Shortcomings:
Readers familiar with FPN will know that proposals are assigned to different feature levels according to the size of proposals. The structure resembles an octopus: many "legs" (feature levels) but only one head.
Two proposals with a 10-pixel difference can be assigned to different levels. For the exact mapping, see the post "Mask RCNN without Mask".
Information discarded in other levels may be helpful for final prediction.
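To make the 10-pixel remark concrete, here is a small sketch of the level-assignment rule. The formula $k = \lfloor k_0 + \log_2(\sqrt{wh}/224) \rfloor$ with $k_0 = 4$, clamped to levels 2 through 5, is the one I recall from the FPN paper; it is not stated in this post, so treat the constants and the function name `fpn_level` as assumptions.

```python
import math


def fpn_level(w, h, k0=4, canonical=224, k_min=2, k_max=5):
    """Return the FPN level assigned to a proposal of width w and height h.

    Assumed rule: k = floor(k0 + log2(sqrt(w * h) / canonical)), clamped to [k_min, k_max].
    """
    k = math.floor(k0 + math.log2(math.sqrt(w * h) / canonical))
    return max(k_min, min(k_max, k))


# Two square proposals whose sides differ by only 10 pixels.
level_big = fpn_level(224, 224)
level_small = fpn_level(214, 214)
```

Under these assumptions the 224x224 proposal lands on level 4 and the 214x214 one on level 3, even though the two boxes are nearly identical.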
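The authors' improvement (attach a head to every leg):
We use max operation to fuse features from different levels.
For each proposal, the information from all levels is fused, instead of picking a single FPN level according to the proposal's size.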
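The max fusion can be sketched as follows. This is a minimal illustration under simplifying assumptions: each level's pooled feature for one proposal is a flat list of numbers, and the hypothetical function `fuse_levels_max` ignores where in the head the fusion happens.

```python
def fuse_levels_max(per_level_features):
    """Fuse one proposal's features from all levels by element-wise max.

    per_level_features: one flat feature list per level, all of equal length.
    """
    return [max(values) for values in zip(*per_level_features)]


# Toy features pooled for the same proposal from four levels.
pooled = [
    [1, 5, 2],  # level 1
    [4, 0, 3],  # level 2
    [2, 2, 9],  # level 3
    [0, 1, 1],  # level 4
]
fused = fuse_levels_max(pooled)
```

Every element of `fused` comes from whichever level responded most strongly at that position, so no level is discarded up front.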
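The figure below shows intuitively why multi-level features are needed.
The horizontal axis is the level assigned by the original FPN; each line shows the levels actually used after Adaptive Feature Pooling.
Take the blue level 1 line as an example. With Adaptive Feature Pooling, proposals whose size falls in the level 1 range take only ~30% of their features from level 1; the rest come from level 2 (~30%), level 3 (~20%), and level 4 (~20%). In the original FPN, such proposals would use 100% level 1 features.
Adaptive Feature Pooling clearly makes the features of each proposal richer!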
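Percentages like the ones in the figure could be measured by counting, for each fused element, which level supplied the maximum. The sketch below is one plausible way to do that, not necessarily the authors' procedure; `level_shares` is a hypothetical name, and ties go to the lowest level.

```python
def level_shares(per_level_features):
    """Return, per level, the fraction of fused elements that level supplied.

    per_level_features: one flat feature list per level, all of equal length.
    Ties are attributed to the lowest level index.
    """
    num_levels = len(per_level_features)
    counts = [0] * num_levels
    for values in zip(*per_level_features):
        winner = values.index(max(values))  # first level holding the maximum
        counts[winner] += 1
    total = sum(counts)
    return [c / total for c in counts]


# Toy features for one proposal, pooled from four levels.
shares = level_shares([
    [1, 5, 2, 0],
    [4, 0, 3, 0],
    [2, 2, 9, 7],
    [0, 1, 1, 1],
])
```

Averaging such shares over all proposals assigned to one FPN level would give one line of the plot.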
3.3 Fully-connected Fusion
Shortcoming:
In Mask R-CNN, mask prediction is made on a single view (convolution), losing the chance to gather more diverse information.
The authors' improvement:
A complementary branch capturing different views: a parallel FC branch is added and fused with the conv branch at the end to predict the mask.
The authors argue that FC layers have two advantages:
- FC layers are location sensitive since predictions at different spatial locations are achieved by varying sets of parameters. So they have the ability to adapt to different spatial locations.
- Also prediction at each spatial location is made with global information of the entire proposal.
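Both properties can be seen in a tiny sketch. It assumes, based on my reading of the paper rather than on this post, that the FC branch outputs a class-agnostic mask that is reshaped and added to each class-specific mask from the conv branch. The names `fc_mask` and `fuse_masks` and the 2x2 mask size are illustrative only.

```python
def fc_mask(features, weights, biases, size):
    """Predict a size x size mask with a fully connected layer.

    Each output pixel has its own weight row (location sensitive), and each
    row spans ALL input features (global information of the whole proposal).
    """
    flat = [
        sum(w * x for w, x in zip(row, features)) + b
        for row, b in zip(weights, biases)
    ]
    # Reshape the flat vector into a 2-D mask.
    return [flat[r * size:(r + 1) * size] for r in range(size)]


def fuse_masks(conv_masks, fc_mask_2d):
    """Add the class-agnostic FC mask to every class-specific conv mask."""
    return [
        [[c + f for c, f in zip(conv_row, fc_row)]
         for conv_row, fc_row in zip(class_mask, fc_mask_2d)]
        for class_mask in conv_masks
    ]


features = [1, 2]                             # flattened proposal features
weights = [[1, 0], [0, 1], [1, 1], [0, 0]]    # one row per output pixel
biases = [0, 0, 0, 1]
fc_out = fc_mask(features, weights, biases, size=2)

conv_out = [[[0, 0], [0, 0]], [[1, 1], [1, 1]]]  # masks for two classes
final = fuse_masks(conv_out, fc_out)
```

A convolution would reuse one small kernel at every pixel; here each of the four pixels has its own parameters, which is the contrast the two bullets draw.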
4 Experiments
4.1 Datasets
- COCO
- Cityscapes
- MVD
4.2 Experiments on COCO
1)Instance Segmentation Results
2)Object Detection Results
3)Component Ablation Studies
$AP$ is the result on the segmentation task,
$AP^{bb}$ is the result of training object detection alone,
$AP^{bbM}$ is the result of jointly training object detection and segmentation.
Training tricks account for about 50% of the improvement:
Half of the improvement is from multi-scale training and multi-GPU sync. BN
4)Ablation Studies on Adaptive Feature Pooling
5)Ablation Studies on Fully-connected Fusion
6)COCO 2017 Challenge
More tricks are introduced.
1st place (instance segmentation task); DCN stands for Deformable Convolutional Networks
2nd place (object detection task)