1. Understand the main kinds of attention mechanism
- Seq2Seq: an encoder-decoder architecture that maps a sequence input to a sequence output.
- Align & Translate: a potential problem of the vanilla Seq2Seq architecture is that some information may not be captured by a single fixed-length vector, i.e., the final hidden state from the encoder (h_T). This is especially problematic for long sentences, where the RNN struggles to carry adequate information through to the end of the sequence (e.g., because of vanishing gradients).
- Instead, it utilizes a context vector to align the source and target: the context vector preserves information from all encoder hidden states and aligns them with the current target output.
- Notation: h_t is the encoder hidden state (annotation) at time t; the context vector c_i for target position i is a weighted sum over the annotations (h_1, ..., h_T). A minimal sketch of this computation follows after this list.
- Visual attention: aligns regions of the input image with output words, tackling the image captioning problem.
- Hierarchical attention: attention applied at multiple levels, combining word-level attention with sentence-level attention.
- Transformer and BERT: architectures built on self-attention.
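As a rough illustration of the alignment step described above, here is a minimal NumPy sketch of Bahdanau-style additive attention. The parameter names (W_s, W_h, v) and the toy dimensions are illustrative assumptions, not taken from any particular implementation.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def additive_attention(s_prev, annotations, W_s, W_h, v):
    """Bahdanau-style additive attention (a sketch).

    s_prev      : previous decoder state, shape (d_dec,)
    annotations : encoder hidden states h_1..h_T, shape (T, d_enc)
    W_s, W_h, v : learned projection parameters (here: plain arrays)

    Returns the context vector c_i and the alignment weights alpha_i.
    """
    # score e_ij = v^T tanh(W_s s_{i-1} + W_h h_j), one per source position j
    scores = np.tanh(annotations @ W_h.T + s_prev @ W_s.T) @ v   # shape (T,)
    alpha = softmax(scores)                                      # alignment weights over the source
    context = alpha @ annotations                                # c_i = sum_j alpha_ij h_j
    return context, alpha

# toy example with random "learned" parameters
T, d_enc, d_dec, d_att = 5, 8, 8, 16
rng = np.random.default_rng(0)
annotations = rng.normal(size=(T, d_enc))
s_prev = rng.normal(size=(d_dec,))
W_h = rng.normal(size=(d_att, d_enc))
W_s = rng.normal(size=(d_att, d_dec))
v = rng.normal(size=(d_att,))
c, alpha = additive_attention(s_prev, annotations, W_s=W_s, W_h=W_h, v=v)
print(c.shape, alpha.sum())   # (8,) and weights summing to 1.0
```

The key point is that the context vector is recomputed for every target position, so the decoder is not forced to rely on a single fixed-length summary of the source sentence.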
2. What is the principle behind each of them?
3. Which kind of attention can be used in the paper?
Self-attention, and whether there is any hierarchical attention that could be used (see the self-attention sketch below).
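Since self-attention is the main candidate, a minimal sketch of scaled dot-product self-attention (the core operation in the Transformer and BERT) may help. The projection matrices W_q, W_k, W_v and the toy sizes are illustrative assumptions.

```python
import numpy as np

def self_attention(X, W_q, W_k, W_v):
    """Scaled dot-product self-attention over one sequence (a sketch).

    X              : token representations, shape (T, d_model)
    W_q, W_k, W_v  : learned projection matrices
    """
    Q, K, V = X @ W_q, X @ W_k, X @ W_v
    scores = Q @ K.T / np.sqrt(K.shape[-1])                     # (T, T) pairwise similarities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)     # row-wise softmax
    return weights @ V                                          # each token becomes a weighted mix of all tokens

T, d_model, d_k = 4, 8, 8
rng = np.random.default_rng(1)
X = rng.normal(size=(T, d_model))
W_q, W_k, W_v = (rng.normal(size=(d_model, d_k)) for _ in range(3))
print(self_attention(X, W_q, W_k, W_v).shape)   # (4, 8)
```

Unlike the encoder-decoder attention above, self-attention relates positions within a single sequence to each other, which is why it needs no recurrence.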
4. What are the advantages?
Supplementary knowledge:
- Bidirectional RNN: the output at time step t depends not only on earlier time steps but also on future ones, which is the motivation for the bidirectional RNN. For example, to predict a word missing from the middle of a sentence, looking only at the preceding text is sometimes not enough; both the left and right context are needed. A bidirectional RNN is simply two RNNs stacked on top of each other, one running forward and one running backward, and their hidden states are combined (a minimal sketch follows below).
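A minimal sketch of a bidirectional RNN using PyTorch's built-in nn.GRU with bidirectional=True; the layer sizes and the toy input are illustrative assumptions.

```python
import torch
import torch.nn as nn

# one forward and one backward GRU over the same sequence
birnn = nn.GRU(input_size=10, hidden_size=16, batch_first=True, bidirectional=True)

x = torch.randn(2, 7, 10)           # (batch, time steps, features)
out, h_n = birnn(x)

# Each time step's output concatenates the forward and backward hidden states,
# so the model sees both left and right context at every position.
print(out.shape)                    # torch.Size([2, 7, 32])  -> 2 * hidden_size
print(h_n.shape)                    # torch.Size([2, 2, 16])  -> (num_directions, batch, hidden)
```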