Lane detection paper reading: Learning Lightweight Lane Detection CNNs by Self Attention Distillation

ICCV2019

code: https://github.com/cardwing/Codes-for-Lane-Detection
paper: https://arxiv.org/abs/1908.00821

Abstract

  1. The paper presents a novel knowledge distillation approach, Self Attention Distillation (SAD), which allows a model to learn from itself and gain substantial improvement without any additional supervision or labels.
  2. The valuable information in attention maps can be used as "free" supervision.
  3. The network: ENet-SAD

1.Introduction

 1. SAD allows a network to exploit attention maps derived from its own layers as the distillation targets for its lower layers.
 2. SAD is only used in the training phase, so it adds no computational cost at deployment.
 3. With SAD added, a preceding block learns to mimic the attention maps of a deeper block.

2.Method

1.The aim is to perform layer-wise, top-down attention distillation to enhance the representation learning process.

Only activation-based attention distillation is used.

2.Activation-based attention distillation

$A_{m} \in R^{C_{m} \times H_{m} \times W_{m}}$
$A_{m}$ denotes the activation map, and $C_{m}$, $H_{m}$, $W_{m}$ denote its channel, height and width.
A mapping function operates over the channel dimension:
$g: R^{C_{m} \times H_{m} \times W_{m}} \rightarrow R^{H_{m} \times W_{m}}$
There are three ways to implement this operation:
1) Sum of absolute values: $g_{sum}(A_{m})=\sum_{i=1}^{C_{m}} |A_{mi}|$
2) Sum of the p-th power of absolute values: $g_{sum}^{p}(A_{m})=\sum_{i=1}^{C_{m}} |A_{mi}|^{p}$
3) Maximum of the p-th power of absolute values: $g_{max}^{p}(A_{m})=\max_{i=1,C_{m}} |A_{mi}|^{p}$
Here $A_{mi}$ denotes the i-th channel of $A_{m}$.
In comparison, $g_{sum}^{p}(A_{m})$ works better.
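To make the three mappings concrete, here is a minimal sketch in plain Python. It assumes the activation is stored as nested lists shaped [C][H][W] in place of a real tensor; the function names and the toy values are made up for illustration and are not taken from the authors' code.

```python
def g_sum(a, p=1):
    """Sum of |A_mi|^p over channels: [C][H][W] -> [H][W]."""
    channels, height, width = len(a), len(a[0]), len(a[0][0])
    return [[sum(abs(a[c][y][x]) ** p for c in range(channels))
             for x in range(width)]
            for y in range(height)]


def g_max(a, p=1):
    """Maximum of |A_mi|^p over channels: [C][H][W] -> [H][W]."""
    channels, height, width = len(a), len(a[0]), len(a[0][0])
    return [[max(abs(a[c][y][x]) ** p for c in range(channels))
             for x in range(width)]
            for y in range(height)]


# Toy activation with C=2, H=1, W=3 (made-up values).
A = [[[1.0, -2.0, 0.0]],
     [[-3.0, 1.0, 0.5]]]

print(g_sum(A))       # [[4.0, 3.0, 0.5]]
print(g_sum(A, p=2))  # [[10.0, 5.0, 0.25]]
print(g_max(A, p=2))  # [[9.0, 4.0, 0.25]]
```

Each mapping collapses the channel axis and keeps the spatial layout, so the result has the same height and width as the input activation.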

3.Network architecture

1) A spatial softmax operation $\Phi(\cdot)$ is applied to $g_{sum}^{2}(A_{m})$.
2) Bilinear upsampling $B(\cdot)$ is added before the softmax operation if the size of the original attention map differs from that of the target.
3) AT-GEN is represented by the function $\Psi(A_{m})=\Phi(B(g_{sum}^{2}(A_{m})))$
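4) Total loss: $L = L_{seg}(s, \hat{s}) + \alpha L_{IoU}(s, \hat{s}) + \beta L_{exist}(b, \hat{b}) + \gamma L_{distill}(A_{m}, A_{m+1})$, where the distillation term is $L_{distill}(A_{m}, A_{m+1}) = \sum_{m=1}^{M-1} L_{d}(\Psi(A_{m}), \Psi(A_{m+1}))$ and $L_{d}$ is typically an L2 loss.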
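As a rough illustration of the AT-GEN function $\Psi$ described above, the sketch below chains the channel-wise squared sum, a bilinear resize and a spatial softmax, then compares the maps of two blocks. Several choices are assumptions rather than the authors' implementation: activations are nested lists shaped [C][H][W], the resize maps corners to corners, the lower block's map is resized to the deeper block's size, and the comparison is a mean squared difference. All function names are hypothetical.

```python
import math


def channel_sq_sum(a):
    """g_sum^2: sum of squared activations over channels, [C][H][W] -> [H][W]."""
    height, width = len(a[0]), len(a[0][0])
    return [[sum(ch[y][x] ** 2 for ch in a) for x in range(width)]
            for y in range(height)]


def bilinear_resize(m, out_h, out_w):
    """B(.): bilinear resize of an [H][W] map (corner-aligned, an assumption)."""
    in_h, in_w = len(m), len(m[0])
    out = []
    for i in range(out_h):
        y = i * (in_h - 1) / (out_h - 1) if out_h > 1 else 0.0
        y0 = int(math.floor(y))
        y1 = min(y0 + 1, in_h - 1)
        wy = y - y0
        row = []
        for j in range(out_w):
            x = j * (in_w - 1) / (out_w - 1) if out_w > 1 else 0.0
            x0 = int(math.floor(x))
            x1 = min(x0 + 1, in_w - 1)
            wx = x - x0
            top = m[y0][x0] * (1 - wx) + m[y0][x1] * wx
            bottom = m[y1][x0] * (1 - wx) + m[y1][x1] * wx
            row.append(top * (1 - wy) + bottom * wy)
        out.append(row)
    return out


def spatial_softmax(m):
    """Phi(.): softmax over all H*W positions, so the map sums to 1."""
    peak = max(max(row) for row in m)
    exps = [[math.exp(v - peak) for v in row] for row in m]
    total = sum(sum(row) for row in exps)
    return [[v / total for v in row] for row in exps]


def at_gen(a, target_hw=None):
    """Psi = Phi(B(g_sum^2(A))); the resize is skipped when sizes already match."""
    m = channel_sq_sum(a)
    if target_hw is not None and (len(m), len(m[0])) != tuple(target_hw):
        m = bilinear_resize(m, *target_hw)
    return spatial_softmax(m)


def distill_loss(lower_act, deeper_act):
    """Mean squared difference between Psi(lower) and Psi(deeper)."""
    target = at_gen(deeper_act)
    source = at_gen(lower_act, (len(target), len(target[0])))
    diffs = [(s - t) ** 2
             for s_row, t_row in zip(source, target)
             for s, t in zip(s_row, t_row)]
    return sum(diffs) / len(diffs)


# Toy activations (made-up values): C=1 and C=2, both 2 x 2.
lower = [[[0.0, 1.0], [1.0, 0.0]]]
deeper = [[[0.0, 1.0], [1.0, 0.0]], [[0.0, 1.0], [1.0, 0.0]]]
print(distill_loss(lower, lower))       # 0.0
print(distill_loss(lower, deeper) > 0)  # True
```

In a real training setup this value would be computed on framework tensors so that gradients flow back into the lower block; the list-based version only shows the order of operations.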

4.Lane post-processing

1) Network output: multi-channel probability maps + a lane existence vector
2) Post-processing:
(1) Smooth each probability map with a 9×9 kernel;
(2) For each lane whose existence probability is larger than 0.5, search the corresponding probability map every 20 rows for the position with the highest probability value;
(3) Fit the lane with cubic splines.
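The first two post-processing steps can be sketched as follows. This is a minimal illustration under stated assumptions: the 9×9 kernel is taken to be a mean (box) filter clipped at the image borders, rows are sampled starting from row 0, and probability maps are nested lists shaped [H][W]. The function names are hypothetical, and the cubic spline fit of step (3) is left out.

```python
def box_smooth(prob, k=9):
    """Mean filter with a k x k window, clipped at the borders (assumed filter type)."""
    h, w = len(prob), len(prob[0])
    r = k // 2
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            window = [prob[yy][xx]
                      for yy in range(max(0, y - r), min(h, y + r + 1))
                      for xx in range(max(0, x - r), min(w, x + r + 1))]
            out[y][x] = sum(window) / len(window)
    return out


def lane_points(prob_maps, exist_probs, row_step=20, exist_thresh=0.5):
    """Return {lane_index: [(row, col), ...]} for lanes predicted to exist."""
    lanes = {}
    for idx, (prob, exist) in enumerate(zip(prob_maps, exist_probs)):
        if exist <= exist_thresh:
            continue  # lane judged absent, skip its probability map
        smooth = box_smooth(prob)
        points = []
        for row in range(0, len(smooth), row_step):
            # Column with the highest smoothed probability in this row.
            col = max(range(len(smooth[row])), key=smooth[row].__getitem__)
            points.append((row, col))
        lanes[idx] = points
    return lanes


# Toy input (made-up): one vertical lane peaked at column 10, one empty map.
lane = [[max(0.0, 1.0 - abs(x - 10) / 5.0) for x in range(30)] for _ in range(41)]
empty = [[0.0] * 30 for _ in range(41)]
result = lane_points([lane, empty], [0.9, 0.3])
print(result)  # {0: [(0, 10), (20, 10), (40, 10)]}
```

The returned (row, col) points are what a cubic spline would then be fitted through to obtain the final lane curve.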

5.Other

1) A small network P1 is added to predict the existence of lanes.
2) Dilated convolutions replace the original convolutions.
3) The features of E3 and E4 are concatenated for the output.

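6.Training and notes

1) SAD works well in the middle- and high-level layers;
2) Adding SAD to the low-level layers hurts network performance;
3) Mimicking the attention maps of the neighbouring layer successively brings more performance gains than mimicking those of non-adjacent layers (P23 + P34 outperforms P24 + P34);
4) Distillation between high-level and low-level layers lowers the metrics, because the information in layers at different levels differs greatly;
5) Adding SAD in the later stage of training is effective; in the early stage the deeper layers are not yet trained to a stable state, so their activation maps are of low quality.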
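The notes above suggest two practical choices: distil only between neighbouring blocks from the middle level upward, and switch the distillation term on only after some training has passed. A small sketch of both follows. The helper names, the start step and the weight are hypothetical placeholders, not values taken from the paper.

```python
def sad_pairs(num_blocks=4, first_student=2):
    """(student, teacher) block indices for successive neighbouring distillation."""
    return [(m, m + 1) for m in range(first_student, num_blocks)]


def sad_weight(step, start_step, gamma):
    """Weight on the distillation loss: zero until start_step, gamma afterwards."""
    return gamma if step >= start_step else 0.0


print(sad_pairs())                  # [(2, 3), (3, 4)]
print(sad_weight(10, 5000, 0.1))    # 0.0
print(sad_weight(5000, 5000, 0.1))  # 0.1
```

With four blocks and the first block skipped, the pairs correspond to the P23 + P34 setting mentioned in note 3).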
