Paper title: Embedding Human Knowledge in Deep Neural Network via Attention Map
Paper link: https://arxiv.org/abs/1905.03540?context=cs.CV
Abstract:
Main problem addressed: embedding human knowledge into conventional (non-deep) machine learning has been very successful, but it is difficult to use it with deep learning due to the enormous number of model parameters.
Proposed solution: we propose using the Attention Branch Network (ABN), which is a visual explanation model.
Overview: ABN applies an attention map for visual explanation to an attention mechanism.
Step 1:
First, the attention map obtained through ABN is manually modified on the basis of human knowledge.
Then, the modified attention map is used in the attention mechanism so that ABN can adjust the recognition score.
Step 2:
To adopt HITL in deep learning, a fine-tuning method based on the modified attention map is proposed.
Our fine-tuning updates the attention and perception branches of ABN by using the training loss calculated from the attention map output by ABN together with the modified attention map.
Effect of fine-tuning: it enables ABN to output an attention map corresponding to human knowledge.
Moreover, we use the updated attention map with its embedded human knowledge as an attention mechanism for inference at the perception branch, which improves the performance of ABN.
Experimental results: experiments with the ImageNet dataset, CUB-200-2010 dataset, and IDRiD demonstrate that our approach clarifies the attention map in terms of visual explanation and improves the classification performance.
**1. Introduction**
Three visual explanation methods:
Class activation mapping (CAM): outputs an attention map by using the response of the convolution layer.
Grad-CAM (gradient-weighted CAM): outputs an attention map by using the positive gradients of a specific category.
ABN: extends an attention map to an attention mechanism.
A common problem: a mismatch between the recognition result and the attention region may occur.
If the CNN pays attention to objects other than the ground truth (GT), it is likely to classify incorrectly. This mismatch would be critical in some applications, such as medical image recognition systems, where a mismatch between the classification result and the attention region may degrade the credibility of the classification system.
To address this problem, a human-in-the-loop (HITL) framework is used: human knowledge is introduced to solve complex image recognition tasks.
Current obstacle: it is difficult to use HITL frameworks with the deep learning models used in various computer vision tasks, because these models have massive numbers of parameters.
This paper's work: introducing human knowledge into deep learning through an HITL framework.
The paper therefore focuses on the ABN structure, i.e., visual explanation combined with an attention mechanism.
**Contributions:**
(1) We investigate the behavior of ABN when we modify the attention map with human knowledge. As a result of this investigation, we confirmed that the manual modification of attention maps with human knowledge can improve the classification performance.
(2) Our fine-tuning method of ABN with the modified attention map can obtain the optimal attention map for visual explanation and improve the performance. This is the first attempt to apply the HITL framework to deep learning in the computer vision field.
**2. Related work**
2.1 Human-in-the-loop on computer vision
HITL methods have been widely and successfully applied to small-scale machine learning, but introducing HITL into deep learning remains an open problem.
2.2 Visual explanation
Two categories of methods:
Gradient-based: outputs an attention map using gradients. Representative method: Grad-CAM.
Response-based: outputs an attention map using the response of the convolutional layer. Representative method: CAM.
ABN: can adjust the recognition score because it applies the attention map to an attention mechanism.
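To make the two families concrete, here is a minimal PyTorch sketch of both map computations; the backbone choice (resnet18), the hook placement, and the clamping to positive gradients are illustrative assumptions, not the paper's implementation:

```python
import torch
from torchvision.models import resnet18

model = resnet18(weights=None).eval()

feats = {}
# capture the response of the last convolution block
model.layer4.register_forward_hook(lambda m, i, o: feats.update(conv=o))

x = torch.randn(1, 3, 224, 224)
logits = model(x)
cls = logits.argmax(dim=1).item()

# Response-based (CAM): weight the K feature channels by the classifier
# weights of the predicted class and sum over channels.
w = model.fc.weight[cls]                                   # (K,)
cam = torch.einsum("k,khw->hw", w, feats["conv"][0]).relu()

# Gradient-based (Grad-CAM style): weight channels by spatially pooled
# positive gradients of the class score w.r.t. the feature map.
grads = torch.autograd.grad(logits[0, cls], feats["conv"])[0]
alpha = grads[0].clamp(min=0).mean(dim=(1, 2))             # (K,)
grad_cam = torch.einsum("k,khw->hw", alpha, feats["conv"][0]).relu()
```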
2.3 Attention Branch Network
Definition of an attention mechanism: it improves performance by focusing on a specific feature location through weighting.
ABN consists of three components:
(1) a feature extractor: extracts the feature map from an input image.
(2) an attention branch: calculates an attention map that represents the attention region of the CNN; the attention map is applied to the feature map from the feature extractor by the attention mechanism.
(3) a perception branch: outputs the final class probability using the feature map and the attention map. The perception branch intensively trains on the specific important regions selected by the attention map.
In the training stage, ABN is optimized with the training losses of both the attention and perception branches.
The attention map is generated by convolving the K-channel feature map with a 1x1 convolution layer (see the sketch below).
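As a concrete sketch of the three components and the 1x1-convolution attention map, a minimal PyTorch module follows; the layer sizes, sigmoid normalization, and the g(x) * (1 + M(x)) form of the attention mechanism are assumptions patterned after the ABN design, not the authors' exact code:

```python
import torch
import torch.nn as nn

class TinyABN(nn.Module):
    def __init__(self, K=64, num_classes=10):
        super().__init__()
        # (1) feature extractor: input image -> K-channel feature map g(x)
        self.extractor = nn.Sequential(
            nn.Conv2d(3, K, 3, padding=1), nn.ReLU(),
            nn.Conv2d(K, K, 3, padding=1), nn.ReLU())
        # (2) attention branch: per-class response maps, then a 1x1
        # convolution collapses them into a single-channel attention map
        self.att_conv = nn.Conv2d(K, num_classes, 1)
        self.att_map = nn.Sequential(nn.Conv2d(num_classes, 1, 1), nn.Sigmoid())
        # (3) perception branch: attended feature map -> class scores
        self.percep = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                                    nn.Linear(K, num_classes))

    def forward(self, x):
        g = self.extractor(x)               # feature map g(x)
        h = self.att_conv(g)                # class response maps
        att_logits = h.mean(dim=(2, 3))     # attention-branch classification scores
        M = self.att_map(h)                 # attention map M(x): (N, 1, H, W)
        g_att = g * (1.0 + M)               # attention mechanism on the feature map
        percep_logits = self.percep(g_att)  # perception-branch classification scores
        return att_logits, percep_logits, M

# e.g.: att, per, M = TinyABN()(torch.randn(2, 3, 32, 32))
```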
Advantage of ABN: it can adjust the recognition result according to an attention map that has been modified for visual explanation.
Exploiting this ability, the paper proposes a fine-tuning approach that introduces human knowledge into deep learning on the basis of HITL.
The proposed method fine-tunes the branches of ABN by calculating a training loss between the output and modified attention maps.
This fine-tuning approach enables ABN to improve both accuracy and the interpretability of the visual explanation, because ABN trains on optimal attention maps embedded with human knowledge.
**3. Investigation of Modification of Attention Map**
Goal of this section: investigate the behavior of ABN when an attention map is modified manually.
3.1 Modification of attention map
The validation samples of the ImageNet dataset are used.
(1) Specifically, the attention map during inference is replaced with the modified attention map, and the changes in the classification results are checked (see the sketch after this list).
(2) ResNet consisting of 152 layers is used together with ABN (ResNet152+ABN).
(3) ResNet152+ABN is trained with 1,200k training samples from the ImageNet dataset.
(4) 1k misclassified samples are selected from the validation samples and their attention maps are modified.
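A sketch of the replacement in step (1), assuming the TinyABN module from Section 2.3; the injection point and the map shape are assumptions:

```python
import torch

@torch.no_grad()
def infer_with_modified_map(model, x, modified_map):
    """Bypass the attention branch and inject a manually edited attention map."""
    g = model.extractor(x)              # feature map from the feature extractor
    g_att = g * (1.0 + modified_map)    # attention mechanism with the edited map
    return model.percep(g_att)          # classification result to compare

# e.g., zero out a distracting region of the original map, then re-infer:
# edited = original_map.clone(); edited[..., :7, :7] = 0.0
# new_logits = infer_with_modified_map(model, x, edited)
```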
3.2 Accuracy on modified attention map
The top-1 and top-5 errors with the modified attention map are listed in Table 1.
**4. Proposed method**
Conclusions drawn from the analysis in Section 3:
(1) ABN adjusts the recognition result when the attention map is modified.
(2) This result suggests that ABN can be applied to an HITL framework.
The following method is therefore proposed:
we propose fine-tuning the attention and perception branches of ABN by using the modified attention map.
For evaluation, three datasets are used: the ImageNet dataset, the CUB-200-2010 dataset, and the Indian Diabetic Retinopathy Image Dataset (IDRiD).
4.1 ABN with HITL
(1) An ABN model is trained using training images and labels, and then the attention maps are collected from the trained model.
Here, the attention maps are collected when ABN misclassifies a training sample (a sketch of this collection step follows below).
(2) The collected attention maps of ABN are modified with human knowledge so that the samples are recognized correctly.
(3) The attention and perception branches of ABN are fine-tuned with the modified attention maps.
During the fine-tuning process, the branches are updated by using the training loss calculated from the output attention map and the modified attention map, in addition to the conventional loss of ABN.
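A sketch of the collection in step (1), again assuming the TinyABN interface above; the (image, label, map) return layout is an assumption:

```python
import torch

@torch.no_grad()
def collect_misclassified_maps(model, loader):
    """Keep the attention maps of training samples that ABN misclassifies."""
    kept = []
    for x, y in loader:
        _, percep_logits, M = model(x)
        wrong = percep_logits.argmax(dim=1) != y
        kept.extend(zip(x[wrong], y[wrong], M[wrong]))
    return kept  # (image, label, attention map) triples for manual editing
```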
4.2 Modification of attention maps
Datasets.
4.3 Fine-tuning of the branches
After human knowledge has been embedded in the attention maps, ABN is fine-tuned with these maps.
A loss Lmap is added to the conventional loss calculated by Eq. 1, as follows:
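A hedged reconstruction of the objective, consistent with the description above; the exact form of Lmap, here an L2 distance between the output map M(x_i) and the modified map M'(x_i), is an assumption:

```latex
% Conventional ABN loss (Eq. 1): attention- plus perception-branch losses
L(x_i) = L_{\mathrm{att}}(x_i) + L_{\mathrm{per}}(x_i)
% Fine-tuning objective: add the attention-map loss
L'(x_i) = L(x_i) + L_{\mathrm{map}}(x_i),
\qquad
L_{\mathrm{map}}(x_i) = \lVert M(x_i) - M'(x_i) \rVert_2
```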
During fine-tuning, the proposed method optimizes the attention and perception branches of ABN; the feature extractor that extracts the feature map from an input image is not updated by the fine-tuning process.
Two training strategies: (1) train only the attention branch; (2) train the attention and perception branches simultaneously (a sketch of one step of strategy (2) is below).
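A sketch of one fine-tuning step under strategy (2), assuming the TinyABN interface above; the frozen feature extractor follows the text, while the MSE stand-in for the L2 map loss, the unit loss weights, and the optimizer handling are assumptions:

```python
import torch
import torch.nn.functional as F

def finetune_step(model, optimizer, x, y, modified_map):
    for p in model.extractor.parameters():
        p.requires_grad_(False)                   # feature extractor is not updated
    att_logits, percep_logits, M = model(x)
    loss = (F.cross_entropy(att_logits, y)        # attention-branch loss
            + F.cross_entropy(percep_logits, y)   # perception-branch loss
            + F.mse_loss(M, modified_map))        # Lmap: pull M toward the edited map
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```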
**5. Experiment**
5.1 Experimental details
**6. Conclusion**