【CS224n】(task3)Dependency Parsing

一、回顾依存句法分析

很多问题都可以转为分类问题,基于转移的依存句法分析器就由预测树结构问题转为预测动作序列问题。

有一种方法:
编码端:用来负责计算词的隐层向量表示
解码端:用来解码计算当前状态的所有动作得分

二、CS224n作业要求

Neural Transition-Based Dependency Parsing (44 points)
作业:基于神经网络,转移的依存解析器
目标:最大化UAS值(Unlabeled attachment score)

首先看清楚作业中的readme文件,确保有local_env.yml文件中所有的依赖项:

# 1. Activate your old environment:

    conda activate cs224n

# 2. Install docopt

    conda install docopt

# 3. Install pytorch, torchvision, and tqdm

    conda install pytorch torchvision -c pytorch
    conda install -c anaconda tqdm

如果想创建一个新虚拟环境,则:

# 1. Create an environment with dependencies specified in local_env.yml
#  (note that this can take some time depending on your laptop):
    
    conda env create -f local_env.yml

# 2. Activate the new environment:
    
    conda activate cs224n_a3

# To deactivate an active environment, use
    
    conda deactivate

依存句法解析器基于依存句法分析,处理句子结构,可以参考上一讲,依存句法解析器有基于转移的、基于图的、基于特征等等。本次作业是要求基于转移的依存句法解析器。

At every step it maintains a partial parse, which is represented as follows:

  • A stack of words that are currently being processed.
  • A buffer of words yet to be processed.
  • A list of dependencies predicted by the parser

初始,栈中值有root,依赖项列表(队列)为空,缓冲区按顺序包含句子中所有单词。每次操作,解析器进行一次转换,以此类推,直到缓冲区队列为空:
(1)shift:将缓冲区队列的元素入栈
(2)LEFT-ARC:将栈顶的两棵依存子树采用左弧合并;
(3)RIGHT-ARC:将栈顶的两棵依存子树采用右弧合并;
即每次操作,用依存句法解析器,作为一个分类器,求出三个动作的最大概率的那个,再进行对应动作的操作。
【CS224n】(task3)Dependency Parsing

图源自 车万翔《基于预训练模型的自然语言处理》

三、具体题目

(1)(4分)句子:I parsed this sentence correctly.
问:用了什么转换,添加了什么新依赖项(如果有的话),下面给了前三个步骤:
【CS224n】(task3)Dependency Parsing
answer:
【CS224n】(task3)Dependency Parsing
(2)(2分)一个含有n个单词的句子,一共有多少步解析,用1-2句话简要解释。
答:2n步。
因为句子中的每个单词在从堆栈中删除之前需要两次转换:SHIFT和一个arc。解析过程中的每一步只能为一个单词执行这两种转换中的一种。

(3)(6分)实现parser_transitions.pyPartialParse类中的构造函数__init__parse_step函数,即实现解析器的转换机制,你可以运行python parser_transitions.py part_c来测试。

1)首先是parser_transitions.pyPartialParse类中的构造函数__init__

class PartialParse(object):
    def __init__(self, sentence):
        """Initializes this partial parse.

        @param sentence (list of str): The sentence to be parsed as a list of words.
                                        Your code should not modify the sentence.
        """
        # The sentence being parsed is kept for bookkeeping purposes. Do NOT alter it in your code.
        self.sentence = sentence

        ### YOUR CODE HERE (3 Lines)
        ### Your code should initialize the following fields:
        ###     self.stack: The current stack represented as a list with the top of the stack as the
        ###                 last element of the list.
        ###     self.buffer: The current buffer represented as a list with the first item on the
        ###                  buffer as the first item of the list
        ###     self.dependencies: The list of dependencies produced so far. Represented as a list of
        ###             tuples where each tuple is of the form (head, dependent).
        ###             Order for this list doesn't matter.
        ###
        ### Note: The root token should be represented with the string "ROOT"
        ### Note: If you need to use the sentence object to initialize anything, make sure to not directly 
        ###       reference the sentence object.  That is, remember to NOT modify the sentence object. 

        self.stack = ['ROOT']
        self.buffer = sentence.copy() # shallow copy 浅拷贝
        self.dependencies = []

        ### END YOUR CODE

2)接着是parser_transitions.pyPartialParse类中的parse_step函数:

    def parse_step(self, transition):
        """Performs a single parse step by applying the given transition to this partial parse

        @param transition (str): A string that equals "S", "LA", or "RA" representing the shift,
                                left-arc, and right-arc transitions. You can assume the provided
                                transition is a legal transition.
        """
        ### YOUR CODE HERE (~7-12 Lines)
        ### TODO:
        ###     Implement a single parsing step, i.e. the logic for the following as
        ###     described in the pdf handout:
        ###         1. Shift
        ###         2. Left Arc
        ###         3. Right Arc

        if transition == 'S':
            self.stack.append(self.buffer.pop(0))
        elif transition == 'LA':
            self.dependencies.append((self.stack[-1], self.stack.pop(-2)))
        elif transition == 'RA':
            self.dependencies.append((self.stack[-2], self.stack.pop(-1)))
            
        ### END YOUR CODE

(4)(8分)当然可以每次分类器预测每次该执行的操作是哪个,但为了更高效,可以一次预测多次该执行的操作,即用下面的算法进行小批量解析:
【CS224n】(task3)Dependency Parsing
实现parser_transitions.py文件的minibatch_parse函数,可以用python parser_transitions.py part_d进行测试。


(5)(12分)模型训练。
模型提取表征当前状态的特征向量。我们将使用原始神经依存解析器论文(A Fast and Accurate Dependency Parser using Neural Networks)中提出的特征集。

utils/parser_utils.py中已经实现了获取这些特征的功能。这个特征向量由一系列标记组成(例如,堆栈中的最后一个单词,缓冲区中的第一个单词等)。它们能够被表示为 w = [ w 1 , w 2 , … , w m ] \mathbf{w}=\left[w_{1}, w_{2}, \ldots, w_{m}\right] w=[w1​,w2​,…,wm​],其中m是特征个数,并且 0 ≤ w i < ∣ V ∣ 0 \leq w_{i}<|V| 0≤wi​<∣V∣, ∣ V ∣ |V| ∣V∣是词汇表的size。然后我们的网络查找每个单词的嵌入,并将它们连接到一个输入向量: x = [ E w 1 , … , E w m ] ∈ R d m \mathbf{x}=\left[\mathbf{E}_{w_{1}}, \ldots, \mathbf{E}_{w_{m}}\right] \in \mathbb{R}^{d m} x=[Ew1​​,…,Ewm​​]∈Rdm
E ∈ R ∣ V ∣ × d \mathbf{E} \in \mathbb{R}^{|V| \times d} E∈R∣V∣×d是embedding矩阵,其中每个列向量 E w \mathbf{E}_{w} Ew​ 是单词 w w w的embedding。
网络:
h = ReLU ⁡ ( x W + b 1 ) l = h U + b 2 y ^ = softmax ⁡ ( l ) \begin{aligned} \mathbf{h} &=\operatorname{ReLU}\left(\mathbf{x} \mathbf{W}+\mathbf{b}_{1}\right) \\ \mathbf{l} &=\mathbf{h U}+\mathbf{b}_{2} \\ \hat{\mathbf{y}} &=\operatorname{softmax}(l) \end{aligned} hly^​​=ReLU(xW+b1​)=hU+b2​=softmax(l)​
最小化交叉熵: J ( θ ) = C E ( y , y ^ ) = − ∑ i = 1 3 y i log ⁡ y ^ i J(\theta)=C E(\mathbf{y}, \hat{\mathbf{y}})=-\sum_{i=1}^{3} y_{i} \log \hat{y}_{i} J(θ)=CE(y,y^​)=−i=1∑3​yi​logy^​i​用UAS作为模型的评价指标。
parser_model.py文件中能找到基架模型实现该网络,需要完成__init__embedding_lookupforward函数,然后完成run.py文件的train_for_epochtrain函数。最后执行python run.py训练模型,和计算在测试集(Penn树库,用Universal Dependencies标记的)的预测效果。

注意:
(1)在本次作业中,需要实现linear layer和embedding layer,所以不要直接使用torch.nn.Lineartorch.nn.Embedding

def minibatch_parse(sentences, model, batch_size):
    """Parses a list of sentences in minibatches using a model.

    @param sentences (list of list of str): A list of sentences to be parsed
                                            (each sentence is a list of words and each word is of type string)
    @param model (ParserModel): The model that makes parsing decisions. It is assumed to have a function
                                model.predict(partial_parses) that takes in a list of PartialParses as input and
                                returns a list of transitions predicted for each parse. That is, after calling
                                    transitions = model.predict(partial_parses)
                                transitions[i] will be the next transition to apply to partial_parses[i].
    @param batch_size (int): The number of PartialParses to include in each minibatch


    @return dependencies (list of dependency lists): A list where each element is the dependencies
                                                    list for a parsed sentence. Ordering should be the
                                                    same as in sentences (i.e., dependencies[i] should
                                                    contain the parse for sentences[i]).
    """
    dependencies = []

    ### YOUR CODE HERE (~8-10 Lines)
    ### TODO:
    ###     Implement the minibatch parse algorithm.  Note that the pseudocode for this algorithm is given in the pdf handout.
    ###
    ###     Note: A shallow copy (as denoted in the PDF) can be made with the "=" sign in python, e.g.
    ###                 unfinished_parses = partial_parses[:].
    ###             Here `unfinished_parses` is a shallow copy of `partial_parses`.
    ###             In Python, a shallow copied list like `unfinished_parses` does not contain new instances
    ###             of the object stored in `partial_parses`. Rather both lists refer to the same objects.
    ###             In our case, `partial_parses` contains a list of partial parses. `unfinished_parses`
    ###             contains references to the same objects. Thus, you should NOT use the `del` operator
    ###             to remove objects from the `unfinished_parses` list. This will free the underlying memory that
    ###             is being accessed by `partial_parses` and may cause your code to crash.

    partial_parses = [PartialParse(sentence) for sentence in sentences]
    unfinished_parses = partial_parses[:] # shallow copy
    while len(unfinished_parses) > 0:
        # 从unfinished parses中取出第一个batchsize的parses
        minibatch_partial_parses = unfinished_parses[:batch_size]
        
        # 模型预测minibatch中每个部分解析器的下一个转换步骤
        minibatch_transitions = model.predict(minibatch_partial_parses)
        
        # 根据预测结果,在minibatch中的各个局部解析,执行解析步骤
        for transition, partial_parse in zip(minibatch_transitions, minibatch_partial_parses):
            partial_parse.parse_step(transition)
        
        # 从未完成的解析中删除已完成的解析(空缓冲区和大小为1的堆栈)。
        unfinished_parses = [
            partial_parse for partial_parse in unfinished_parses 
            if not (len(partial_parse.buffer) == 0 and len(partial_parse.stack) == 1)
        ]

    for partial_parse in partial_parses:
        dependencies.append(partial_parse.dependencies)
    ### END YOUR CODE

    return dependencies

运行代码结果:
dev UAS: 88.60
test UAS: 89.08

(6)检验模型效果

四、实验过程和结果

这里的进度条是通过tqdm工具包实现的。

SyntaxWarning: "is" with a literal. Did you mean "=="?
  return [("RA" if pp.stack[1] is "right" else "LA") if len(pp.buffer) == 0 else "S"
================================================================================
INITIALIZING
================================================================================
Loading data...
took 2.58 seconds
Building parser...
took 1.66 seconds
Loading pretrained embeddings...
took 8.80 seconds
Vectorizing data...
took 1.95 seconds
Preprocessing training data...
took 65.94 seconds
took 0.18 seconds

================================================================================
TRAINING
================================================================================
Epoch 1 out of 10
100%|██████████| 1848/1848 [02:07<00:00, 14.54it/s]
Average Train Loss: 0.1781648173605725
Evaluating on dev set
1445850it [00:00, 28400918.10it/s]      
- dev UAS: 84.76
New best dev UAS! Saving model.

Epoch 2 out of 10
100%|██████████| 1848/1848 [02:21<00:00, 13.09it/s]
Average Train Loss: 0.11059159756480873
Evaluating on dev set
1445850it [00:00, 21319509.36it/s]      
- dev UAS: 86.57
New best dev UAS! Saving model.

Epoch 3 out of 10
100%|██████████| 1848/1848 [02:29<00:00, 12.35it/s]
Average Train Loss: 0.09602350440255297
Evaluating on dev set
1445850it [00:00, 21010828.57it/s]      
- dev UAS: 87.23
New best dev UAS! Saving model.

Epoch 4 out of 10
100%|██████████| 1848/1848 [02:24<00:00, 12.78it/s]
Average Train Loss: 0.08655059076765012
Evaluating on dev set
1445850it [00:00, 18410020.64it/s]      
- dev UAS: 88.03
New best dev UAS! Saving model.

Epoch 5 out of 10
100%|██████████| 1848/1848 [02:29<00:00, 12.32it/s]
Average Train Loss: 0.07943204295664251
Evaluating on dev set
1445850it [00:00, 24345468.35it/s]      
- dev UAS: 88.25
New best dev UAS! Saving model.

Epoch 6 out of 10
100%|██████████| 1848/1848 [02:28<00:00, 12.42it/s]
Average Train Loss: 0.07376304407124266
Evaluating on dev set
1445850it [00:00, 23765859.77it/s]      
- dev UAS: 88.06

Epoch 7 out of 10
100%|██████████| 1848/1848 [02:11<00:00, 14.08it/s]
Average Train Loss: 0.06907538355638583
Evaluating on dev set
1445850it [00:00, 16358657.93it/s]      
- dev UAS: 88.15

Epoch 8 out of 10
100%|██████████| 1848/1848 [02:12<00:00, 13.92it/s]
Average Train Loss: 0.06480039135468277
Evaluating on dev set
1445850it [00:00, 20698658.75it/s]      
- dev UAS: 88.45
New best dev UAS! Saving model.

Epoch 9 out of 10
100%|██████████| 1848/1848 [02:31<00:00, 12.22it/s]
Average Train Loss: 0.061141976250085606
Evaluating on dev set
1445850it [00:00, 22635715.12it/s]      
- dev UAS: 88.41

Epoch 10 out of 10
100%|██████████| 1848/1848 [02:18<00:00, 13.36it/s]
Average Train Loss: 0.05778654704870277
Evaluating on dev set
1445850it [00:00, 30163164.76it/s]      
- dev UAS: 88.60
New best dev UAS! Saving model.
================================================================================
TESTING
================================================================================
Restoring the best model weights found on the dev set
Final evaluation on test set
2919736it [00:00, 31476695.98it/s]      
- test UAS: 89.08
Done!

Reference

(1)详解Transition-based Dependency parser基于转移的依存句法解析器
(2)斯坦福大学CS224N课程作业

上一篇:Maven 依赖调解源码解析(一):开篇


下一篇:2021云栖大会