Attention-over-Attention Neural Networks for Reading Comprehension

 

 

Contextual Embedding

(Figure: the contextual embedding layer.)

 

Pair-wise Matching Score

We first need to compute a match matrix that describes how well each word in the document matches each word in the query, which can also be understood as a similarity score. The M(i, j) function in this paper is simple: a dot product between the contextual embeddings of document word i and query word j.
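As a minimal sketch of this step (assuming the contextual embeddings of the document and query words have already been computed, e.g. by the paper's bidirectional GRU encoders; the array names and sizes below are illustrative), the whole matching matrix is a single matrix product:

```python
import numpy as np

# Hypothetical contextual embeddings:
#   h_doc:   one row per document word, shape (|D|, 2d)
#   h_query: one row per query word,    shape (|Q|, 2d)
h_doc = np.random.randn(6, 8)     # toy document of 6 words
h_query = np.random.randn(3, 8)   # toy query of 3 words

# M(i, j) = h_doc(i) . h_query(j): pair-wise matching score
M = h_doc @ h_query.T             # shape (|D|, |Q|) = (6, 3)
print(M.shape)
```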


 

Individual Attentions


Note:

column-wise softmax attention:

This computes the importance of the document words with respect to the query, i.e., an importance distribution over the words in the document.

The paper's wording here is not very clear. The subscript t of the attention vector refers to time: roughly, every word in the query is treated as one time step, and α(t) then denotes, for the query word at time step t, how important each word in the document is to it. In previous papers, the individual attentions α(t) of all query words were merged in a rather heuristic way (summed or averaged to reduce the dimension), and one of the more important contributions of this paper is precisely its improvement at this point:

 

Important:

These individual attentions α(t) are not equally important: an α(t) generated by an entity word in the query clearly matters more than one generated by a word with little concrete meaning, such as an adjective or a preposition, so the summing or averaging used in earlier papers, which treats them all as equally important, is obviously unreasonable. The paper therefore introduces a nested attention: for each query word's attention vector, a separate attention weight is computed. Simply put, this computes the importance of each word of the query itself.

 

Remember that column-wise means the operation is carried out on one column at a time.
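A small sketch of the column-wise softmax (the toy matching matrix `M` below stands in for the one computed earlier):

```python
import numpy as np

def softmax(x, axis):
    """Numerically stable softmax along the given axis."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

M = np.random.randn(6, 3)    # toy matching matrix: |D| = 6, |Q| = 3

# Column-wise softmax: alpha[:, t] is a distribution over the document
# words for query word t, i.e. the individual attention alpha(t).
alpha = softmax(M, axis=0)   # shape (|D|, |Q|)
print(alpha.sum(axis=0))     # every column sums to 1
```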

 

Attention-over-Attention

row-wise softmax attention:

This computes the importance distribution over the words in the query.

 

 

Computing the row-wise softmax attention:

For every document word, a row-wise softmax over the matching matrix gives a query word-level attention vector β(t); these vectors are then averaged over all document words to obtain the importance β of each query word (this is where the dimensionality reduction happens).


 

 

 

Finally, taking the dot product of the individual attentions α and the query-level attention β gives s = αᵀβ, which is the "attended document-level attention" mentioned in the paper.
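Continuing the sketch (reusing the toy `M`, the `softmax` helper, and `alpha` from above), the row-wise attention, its average β, and the final dot product look like this:

```python
# Row-wise softmax: beta_rows[t, :] is a distribution over the query
# words for document word t, i.e. beta(t).
beta_rows = softmax(M, axis=1)   # shape (|D|, |Q|)

# Average over all document words: one importance weight per query word.
beta = beta_rows.mean(axis=0)    # shape (|Q|,)

# Attended document-level attention: s(i) = sum_t alpha(i, t) * beta(t).
s = alpha @ beta                 # shape (|D|,)
print(s.sum())                   # still sums to 1 over the document
```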

 


Final Predictions

At this point a score s has been computed for every word in the document, but because a word can occur more than once in the document, all occurrences of the same word should be combined. Like earlier work, this paper simply adds them up. The figure below, from a paper cited by this one, illustrates this step.

(Figure from the cited paper: summing the attention scores of repeated word occurrences.)
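A minimal sketch of this sum-over-occurrences step; the tokens and scores below are made up, and in practice `s` is the vector computed above:

```python
import numpy as np
from collections import defaultdict

# Toy document tokens and their final attention scores s.
doc_tokens = ["mary", "went", "to", "the", "store", "mary"]
s = np.array([0.30, 0.05, 0.05, 0.10, 0.20, 0.30])

# P(w | D, Q): sum s(i) over every position i where word w occurs.
word_score = defaultdict(float)
for token, score in zip(doc_tokens, s):
    word_score[token] += float(score)

print(max(word_score, key=word_score.get))   # "mary" (0.30 + 0.30 = 0.60)
```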

N-best Re-ranking Strategy

Finally, an N-best list of candidate words is kept, and the authors use the following features as scoring metrics:

 

I don't quite understand this part.

 

Global N-gram LM:

This is a fundamental metric for scoring a sentence, aiming to evaluate its fluency. This model is trained on the document part of the training data.

Local N-gram LM:

Different from the global LM, the local LM aims to exploit the information within the given document, so its statistics are obtained from the test-time document. Note that the local LM is trained sample by sample; it is not trained on the entire test set, which would not be legal in a real test scenario. This model is useful when there are many unknown words in the test sample.

Word-class LM:

Similar to the global LM, the word-class LM is also trained on the document part of the training data, but each word is converted to its word-class ID. The word classes can be obtained with clustering methods; in this paper the mkcls tool is simply used to generate 1000 word classes.
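A hedged sketch of how such features might be combined at re-ranking time: the candidate word is filled into the query's blank, each feature scores the resulting sentence, and the scores are combined as a weighted sum (the paper tunes these weights on the validation set with K-best MIRA). The feature functions and numbers below are placeholders, not the paper's actual models:

```python
# Placeholder feature functions: each returns a (log-)score for the query
# sentence with the candidate word filled into the blank.
def aoa_score(candidate):      # AoA Reader log-probability (assumed given)
    return {"store": -0.2, "mary": -1.5}.get(candidate, -5.0)

def global_lm(candidate):      # global N-gram LM (placeholder numbers)
    return {"store": -3.1, "mary": -4.0}.get(candidate, -8.0)

def local_lm(candidate):       # local N-gram LM (placeholder numbers)
    return {"store": -2.5, "mary": -3.0}.get(candidate, -7.0)

def word_class_lm(candidate):  # word-class LM (placeholder numbers)
    return {"store": -1.8, "mary": -2.2}.get(candidate, -6.0)

features = [aoa_score, global_lm, local_lm, word_class_lm]
weights = [1.0, 0.5, 0.5, 0.3]  # illustrative; tuned with K-best MIRA in the paper

def rerank(n_best):
    """Return the candidate with the highest weighted feature sum."""
    return max(n_best, key=lambda c: sum(w * f(c) for w, f in zip(weights, features)))

print(rerank(["store", "mary"]))   # -> "store" with these toy numbers
```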

 

