[Paper Reading] Cross-Sentence N-ary Relation Extraction with Graph LSTMs [ACL 2017]

Paper link: https://www.aclweb.org/anthology/Q17-1008.pdf

Code (Theano): https://github.com/VioletPeng/GraphLSTM_release

In this paper, we explore a general relation extraction framework based on graph long short-term memory networks (graph LSTMs) that can be easily extended to cross-sentence n-ary relation extraction. The graph formulation provides a unified way of exploring different LSTM approaches and incorporating various intra-sentential and inter-sentential dependencies, such as sequential, syntactic, and discourse relations. A robust contextual representation is learned for the entities, which serves as input to the relation classifier. This simplifies the handling of relations with arbitrary arity, and enables multi-task learning with related relations.

1 Introduction

The introduction motivates why relation extraction needs to go beyond single sentences, i.e., document-level (cross-sentence) extraction.

In this paper, we explore a general framework for cross-sentence n-ary relation extraction, based on graph long short-term memory networks (graph LSTMs). By adopting the graph formulation, our framework subsumes prior approaches based on chain or tree LSTMs, and can incorporate a rich set of linguistic analyses to aid relation extraction. Relation classification takes as input the entity representations learned from the entire text, and can be easily extended to arbitrary relation arity n. This approach also facilitates joint learning with kindred relations where the supervision signal is more abundant.

Graph LSTMs that encode rich linguistic knowledge outperformed other neural network variants, as well as a well-engineered feature-based classifier. Multi-task learning with sub-relations led to further improvement. Syntactic analysis conferred a significant benefit to the performance of graph LSTMs, especially when syntax accuracy was high.

[Figure: dependency trees of the two sentences, illustrating the ternary interaction (EGFR, L858E, gefitinib): tumors with the L858E mutation of the EGFR gene respond to gefitinib treatment.]

2 Cross-sentence n-ary relation extraction

In the standard binary-relation setting, the dominant approaches are generally defined over the shortest dependency path between the two entities in question, either deriving rich features from the path or modeling it with deep neural networks. Generalizing this paradigm to the n-ary setting is challenging, because there are $C_n^2$ pairwise paths. One apparent solution is inspired by Davidsonian semantics: first, identify a single trigger phrase that signifies the whole relation, then reduce the n-ary relation to n binary relations between the trigger and each argument. However, it is often hard to specify a single trigger, because the relation is frequently signified by several words that are often not contiguous. Moreover, annotating training examples is expensive and time-consuming, especially when triggers are required, as is evident in prior annotation efforts such as GENIA.
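To make the combinatorics concrete, the number of pairwise paths grows quadratically with the arity:

$C_n^2 = \frac{n(n-1)}{2}$

so a ternary relation ($n = 3$) already involves 3 pairwise paths, and a 5-ary relation involves 10.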

Furthermore, the lexical and syntactic patterns signifying such relations will be sparse. To handle this sparsity, traditional feature-based approaches require extensive engineering and large amounts of data. Unfortunately, this challenge becomes even more severe in cross-sentence extraction, where the text spans multiple sentences.

3 Graph LSTMs

Graph LSTMs had not previously been applied to NLP tasks. The model architecture:

[Figure: overall model architecture of the graph-LSTM relation extraction framework.]

The input layer is the word embedding of the input text.

Next is the graph LSTM, which learns a contextual representation for each word. For the entities in question, their contextual representations are concatenated and become the input to the relation classifiers.

For a multi-word entity, we simply use the average of its word representations, and leave the exploration of more sophisticated aggregation approaches to future work.
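As a concrete illustration of this step, here is a minimal numpy sketch; the names `entity_representation` and `classifier_input`, and the shapes, are illustrative assumptions, not the authors' code:

```python
import numpy as np

def entity_representation(hidden_states, span):
    """Average the contextual word representations inside an entity span.

    hidden_states: (seq_len, d) array of per-word graph-LSTM outputs.
    span:          (start, end) token indices, end exclusive.
    """
    start, end = span
    return hidden_states[start:end].mean(axis=0)

def classifier_input(hidden_states, entity_spans):
    """Concatenate the entity representations into one feature vector."""
    return np.concatenate(
        [entity_representation(hidden_states, s) for s in entity_spans]
    )

# Example: d = 4 and three entities give a 12-dimensional classifier input.
H = np.random.randn(10, 4)
x = classifier_input(H, [(0, 1), (3, 5), (8, 10)])
assert x.shape == (12,)
```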

At the core of the graph LSTM is a document graph that captures various dependencies among the input words. By choosing which dependencies to include in the document graph, graph LSTMs naturally subsume linear-chain or tree LSTMs.

Compared with conventional LSTMs, the graph formulation presents new challenges. Due to potential cycles in the graph, a straightforward implementation of backpropagation might require many iterations to reach a fixed point. Moreover, in the presence of a large number of edge types (adjacent-word, syntactic dependency, etc.), parametrization becomes a key problem.

In the remainder of this section, we first introduce the document graph and show how to conduct backpropagation in graph LSTMs. We then discuss two strategies for parametrizing the recurrent units. Finally, we show how multi-task learning can be conducted within this framework.

3.1 Document Graph

This document graph acts as the backbone upon which a graph LSTM is constructed. If it contains only edges between adjacent words, we recover linear-chain LSTMs. Similarly, other prior LSTM approaches can be captured in this framework by restricting edges to those in the shortest dependency path or the parse tree.
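The following is a minimal sketch of how such a document graph might be assembled as a typed edge list; `build_document_graph` and its arguments are hypothetical names for illustration, not the paper's implementation:

```python
def build_document_graph(n_tokens, dep_edges, keep_types):
    """Build a document graph as a list of (src, dst, edge_type) triples.

    n_tokens:   number of words in the (possibly multi-sentence) text.
    dep_edges:  iterable of (head, dependent, label) syntactic dependencies.
    keep_types: which edge families to include, e.g. {"adjacency", "syntax"}.
    """
    edges = []
    if "adjacency" in keep_types:  # edges between adjacent words
        edges += [(i, i + 1, "next_word") for i in range(n_tokens - 1)]
    if "syntax" in keep_types:     # dependency-tree edges
        edges += [(h, d, label) for h, d, label in dep_edges]
    return edges

# Keeping only adjacency edges recovers the linear-chain LSTM topology;
# keeping only dependency edges recovers a tree LSTM topology.
chain = build_document_graph(5, [], {"adjacency"})
```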

3.2 Backpropagation in Graph LSTMs

If the graph contains cycles, gradient computation is no longer straightforward. In this paper, we adopt a simple strategy that performed well in preliminary experiments, and leave further exploration to future work. Specifically, we partition the document graph into two directed acyclic graphs (DAGs). One DAG contains the left-to-right linear chain as well as the other forward-pointing dependencies. The other DAG contains the right-to-left linear chain and the backward-pointing dependencies. Figure 3 illustrates this strategy. Effectively, we partition the original graph into a forward pass (left to right) followed by a backward pass (right to left), and construct the LSTMs accordingly. When the document graph contains only linear chain edges, the graph LSTM is exactly a bidirectional LSTM (BiLSTM).

[Figure 3: The graph LSTMs used in this paper. The document graph (top) is partitioned into two directed acyclic graphs (bottom); the graph LSTMs are constructed with a forward pass (left to right) followed by a backward pass (right to left). Note that information goes from dependency children to parents.]
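A minimal sketch of this partition, reusing the typed edge list from the sketch above (again, hypothetical names under assumed conventions):

```python
def partition_into_dags(n_tokens, dep_edges):
    """Split the document graph into two DAGs, as in Figure 3.

    forward:  left-to-right chain plus dependencies pointing rightward.
    backward: right-to-left chain plus dependencies pointing leftward.
    Each half is acyclic because every edge strictly advances in one
    direction of the word sequence.
    """
    forward = [(i, i + 1, "next_word") for i in range(n_tokens - 1)]
    backward = [(i + 1, i, "prev_word") for i in range(n_tokens - 1)]
    for src, dst, label in dep_edges:
        (forward if src < dst else backward).append((src, dst, label))
    return forward, backward
```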

3.3 The Basic Recurrent Propagation Unit

In linear-chain LSTMs, each unit contains only one forget gate, since it has only one direct precedent (i.e., the adjacent-word edge pointing to the previous word). In graph LSTMs, however, a unit may have several precedents, including connections to the same word via different edges. We therefore introduce one forget gate for each precedent.

Encoding rich linguistic analysis introduces many distinct edge types besides word adjacency, such as syntactic dependencies, which opens up many possibilities for parametrization. This was not considered in prior syntax-aware LSTM approaches (Tai et al., 2015; Miwa and Bansal, 2016). In this paper, we explore two schemes that introduce more fine-grained parameters based on the edge types.

Full Parametrization

Our first proposal simply introduces a different set of parameters for each edge type, with the computation specified as follows.

$i_t = \sigma(W_i x_t + \sum_{j \in P(t)} U_i^{m(t,j)} h_j + b_i)$

$o_t = \sigma(W_o x_t + \sum_{j \in P(t)} U_o^{m(t,j)} h_j + b_o)$

$f_{tj} = \sigma(W_f x_t + U_f^{m(t,j)} h_j + b_f)$

$\tilde{c}_t = \tanh(W_c x_t + \sum_{j \in P(t)} U_c^{m(t,j)} h_j + b_c)$

$c_t = i_t \odot \tilde{c}_t + \sum_{j \in P(t)} f_{tj} \odot c_j$

$h_t = o_t \odot \tanh(c_t)$
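To tie the equations together, here is a minimal numpy sketch of one graph-LSTM step under the full parametrization. Here `preds` plays the role of $P(t)$ and carries the edge type $m(t,j)$ for each precedent; the parameter containers `W`, `U`, `b` are hypothetical names, and the actual implementation (the Theano code linked above) differs in detail:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def graph_lstm_step(x_t, preds, W, U, b):
    """One step of the full-parametrization graph LSTM cell.

    x_t:   (d_in,) input word embedding.
    preds: list of (h_j, c_j, edge_type) for each precedent j in P(t),
           so each edge type m(t, j) selects its own U matrix.
    W, U:  dicts of weights; W[g] is (d, d_in), U[g][edge_type] is (d, d).
    b:     dict of (d,) bias vectors.
    """
    # Typed contributions from all precedents to the i, o, and c-tilde terms.
    sums = {g: sum(U[g][m] @ h_j for h_j, _, m in preds) for g in "ioc"}
    i_t = sigmoid(W["i"] @ x_t + sums["i"] + b["i"])
    o_t = sigmoid(W["o"] @ x_t + sums["o"] + b["o"])
    c_hat = np.tanh(W["c"] @ x_t + sums["c"] + b["c"])
    # One forget gate per precedent, parametrized by its edge type.
    c_t = i_t * c_hat
    for h_j, c_j, m in preds:
        f_tj = sigmoid(W["f"] @ x_t + U["f"][m] @ h_j + b["f"])
        c_t += f_tj * c_j
    h_t = o_t * np.tanh(c_t)
    return h_t, c_t
```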

 
