Paper Reading: Learning Visual Question Answering by Bootstrapping Hard Attention

Learning Visual Question Answering by Bootstrapping Hard Attention

Google DeepMind  ECCV-2018

  2018-08-05 19:24:44

Paper: https://arxiv.org/abs/1808.00300

Introduction

  This paper attempts to use only hard attention to pick out the most useful features for learning the VQA task.

Soft Attention

  Existing attention models are predominantly based on soft attention, in which all information is adaptively re-weighted before being aggregated. This can improve accuracy by isolating important information and avoiding interference from unimportant information.
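  To make the "adaptively re-weighted before being aggregated" step concrete, here is a minimal NumPy sketch of soft attention. The function name `soft_attention`, the toy feature matrix, and the relevance scores are all assumptions made for illustration; they are not taken from the paper.

```python
import numpy as np

def soft_attention(features, scores):
    """Re-weight every feature vector, then aggregate them.

    features: (n, d) array, one d-dimensional vector per image location.
    scores:   (n,) array of unnormalized relevance scores (hypothetical).
    Returns the (d,) weighted sum and the (n,) attention weights.
    """
    shifted = scores - scores.max()          # subtract max for numerical stability
    weights = np.exp(shifted) / np.exp(shifted).sum()  # softmax over locations
    pooled = weights @ features              # every location contributes
    return pooled, weights

# Toy example: three locations with 2-dimensional features.
features = np.array([[1.0, 0.0],
                     [0.0, 1.0],
                     [1.0, 1.0]])
scores = np.array([0.0, 0.0, 0.0])           # equal scores give uniform weights
pooled, weights = soft_attention(features, scores)
```

  Note that every location receives a non-zero weight, so the whole feature map still has to be processed, and the output is a smooth function of the scores.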

Hard Attention

  Hard attention instead selects only a subset of the information for further processing. It has the potential to improve accuracy and learning efficiency by focusing computation on the important parts of an image. Beyond this, it offers better computational efficiency because it only fully processes the information deemed most relevant.

  However, hard attention has a serious drawback: the choice of which information to process is discrete and thus non-differentiable, so gradients cannot be backpropagated into the selection mechanism to support gradient-based optimization. Policy-gradient methods can work around the missing gradient by sampling, but this remains a very active research area.
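  The discreteness of the selection can be seen in a small NumPy sketch of top-k hard attention. Scoring each location by the L2 norm of its feature vector is an assumption made for this illustration, as are the function name `hard_attention_topk` and the toy data.

```python
import numpy as np

def hard_attention_topk(features, k):
    """Keep only the k feature vectors with the largest L2 norm.

    features: (n, d) array, one d-dimensional vector per image location.
    Returns the (k, d) selected vectors and their (k,) original indices.
    The norm-based score is a hypothetical choice for this sketch.
    """
    norms = np.linalg.norm(features, axis=1)
    order = np.argsort(-norms, kind="stable")   # locations sorted by descending norm
    keep = np.sort(order[:k])                   # discrete choice of k indices
    return features[keep], keep

features = np.array([[3.0, 4.0],    # norm 5
                     [0.0, 1.0],    # norm 1
                     [1.0, 0.0],    # norm 1
                     [6.0, 8.0]])   # norm 10
selected, keep = hard_attention_topk(features, k=2)

# A tiny perturbation leaves the selected indices unchanged: the index set is
# piecewise constant in the input, so it provides no useful gradient signal.
_, keep_perturbed = hard_attention_topk(features + 1e-6, k=2)
```

  Only the selected rows are passed on, which is where the computational saving comes from; the index set `keep` itself is what gradient-based training cannot adjust directly.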

Approach Details:  

To be updated ...
