Paper Reading: Learning Visual Question Answering by Bootstrapping Hard Attention

2022-10-13 19:01:53

Learning Visual Question Answering by Bootstrapping Hard Attention

Google DeepMind ECCV-2018

2018-08-05 19:24:44

Paper：https://arxiv.org/abs/1808.00300

Introduction：

　　This paper attempts to use hard attention alone to pick out the most useful features and learn the VQA (visual question answering) task.

Soft Attention：

　　Existing attention models are predominantly based on soft attention, in which all information is adaptively re-weighted before being aggregated. This can improve accuracy by isolating important information and avoiding interference from unimportant information.
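　　As a rough illustration of the re-weighting described above, here is a minimal pure-Python sketch. The names `softmax`, `soft_attention`, `features`, and `scores` are hypothetical and chosen only for this example; they are not taken from the paper. Each feature vector gets a weight from a softmax over its relevance score, and all vectors are then aggregated into one weighted sum.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def soft_attention(features, scores):
    """Return the attention weights and the weighted sum of ALL feature vectors."""
    weights = softmax(scores)
    dim = len(features[0])
    pooled = [
        sum(w * f[d] for w, f in zip(weights, features))
        for d in range(dim)
    ]
    return weights, pooled


# Toy input (assumed for illustration): three 2-D feature vectors
# and one relevance score per vector.
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
scores = [2.0, 0.0, 0.0]

weights, pooled = soft_attention(features, scores)
print(weights)  # every weight is strictly positive and they sum to 1
print(pooled)   # a single aggregated feature vector
```

　　Note that every feature still contributes to the result and every feature is still processed; the weights only change how much each one counts.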

Hard Attention：

　　It has the potential to improve accuracy and learning efficiency by focusing computation on the important parts of an image. But beyond this, it offers better computational efficiency because it only fully processes the information deemed most relevant.
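　　One simple way to picture this is a top-k selection: keep only the k highest-scoring features and discard the rest before any further processing. The sketch below is an assumption-laden illustration, not the paper's implementation; `hard_attention`, `features`, `scores`, and `k` are hypothetical names, and how the scores are obtained is left open here.

```python
def hard_attention(features, scores, k):
    """Keep only the k highest-scoring feature vectors; drop everything else."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    kept = sorted(ranked[:k])  # restore the original ordering of the kept items
    return kept, [features[i] for i in kept]


# Toy input (assumed for illustration): four 2-D feature vectors with scores.
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]]
scores = [0.1, 2.0, 0.3, 1.5]

kept, selected = hard_attention(features, scores, k=2)
print(kept)      # indices of the features that survive the selection
print(selected)  # only these vectors are passed on to later stages

# A small change in a score leaves the selection unchanged: the output is a
# piecewise-constant function of the scores.
nudged_kept, _ = hard_attention(features, [0.1, 2.0001, 0.3, 1.5], k=2)
print(nudged_kept == kept)
```

　　Later stages only ever see `selected`, which is where the computational saving comes from.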

　　However, hard attention has a serious drawback: the choice of which information in the image to process is discrete, so the selection step is non-differentiable. As a result, backpropagation cannot be used to learn which regions to select (in the paper's words: "because the choice of which information to process is discrete and thus non-differentiable, gradients cannot be backpropagated into the selection mechanism to support gradient-based optimization."). Policy Gradient-based methods can work around the non-differentiability through sampling, but this is still a very active research area.
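　　To make the sampling idea concrete, here is a minimal sketch of a generic score-function (REINFORCE) gradient estimator for a single discrete choice. It is a textbook-style illustration and is not claimed to be the method used in this paper; `reinforce_gradient`, `exact_gradient`, `logits`, and `rewards` are hypothetical names, and the rewards are toy values. The estimator samples an index from a softmax policy and uses `reward * (one_hot(index) - probs)` as a gradient sample for the logits, which avoids differentiating through the discrete choice itself.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def reinforce_gradient(logits, rewards, num_samples, rng):
    """Monte Carlo estimate of d E[reward] / d logits via the score function."""
    probs = softmax(logits)
    grad = [0.0] * len(logits)
    for _ in range(num_samples):
        # Discrete, non-differentiable step: sample which item to attend to.
        a = rng.choices(range(len(probs)), weights=probs)[0]
        for j in range(len(logits)):
            indicator = 1.0 if j == a else 0.0
            # d log p(a) / d logit_j = indicator - probs[j] for a softmax policy.
            grad[j] += rewards[a] * (indicator - probs[j]) / num_samples
    return grad


def exact_gradient(logits, rewards):
    """Closed-form gradient of the expected reward, for comparison."""
    probs = softmax(logits)
    expected = sum(p * r for p, r in zip(probs, rewards))
    return [p * (r - expected) for p, r in zip(probs, rewards)]


rng = random.Random(0)
logits = [0.0, 0.0, 0.0]
rewards = [1.0, 0.0, 0.0]  # toy assumption: only item 0 is useful

estimate = reinforce_gradient(logits, rewards, 20000, rng)
exact = exact_gradient(logits, rewards)
print(estimate)  # noisy, but close to the exact gradient
print(exact)
```

　　The estimate is unbiased but noisy, which is one reason such sampling-based training tends to need many samples or variance-reduction tricks.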

Approach Details：

To be updated...
