PyTorch pack_padded_sequence and pad_packed_sequence

The problem

When we compute over a batch of training examples at once, the examples usually have different lengths, so the natural thing to do is padding: the shorter sentences are padded until they are as long as the longest one.
The problem is that a sentence such as "Yes" has only one word, yet it gets padded with five pad symbols, so the LSTM has to run through a lot of useless tokens to produce its representation, and the resulting sentence representation is distorted.

Handling variable-length sequences with an RNN

The main tools are the functions torch.nn.utils.rnn.pack_padded_sequence() and torch.nn.utils.rnn.pad_packed_sequence(), together with the class torch.nn.utils.rnn.PackedSequence that connects them.

The two functions are easy to confuse at first; reading their names literally helps:

step 1

.pack_padded_sequence(): "padded" means the sequences have already been padded, and "pack" means packing those padded sequences. The function's parameters:

input (Tensor) – padded batch of variable length sequences.
lengths (Tensor) – list of sequence lengths of each batch element.
batch_first (bool, optional) – if True, the input is expected in B x T x * format.
enforce_sorted (bool, optional) – if True, the input is expected to contain sequences sorted by length in a decreasing order. If False, the input will get sorted unconditionally. Default: True.

lengths: the length of each sequence in the batch; only by knowing each sequence's true length can the RNN know where to stop for it.
batch_first: the usual flag specifying which dimension holds the batch.
enforce_sorted: only when True must you sort the sequences by length yourself, in decreasing order: input[:,0] should be the longest sequence, and input[:,B-1] the shortest one (see the sketch below).
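
Since the default is enforce_sorted=True, the caller is responsible for sorting the batch by length before packing (with enforce_sorted=False, PyTorch sorts internally, as the example in step 2 shows). A minimal sketch with made-up data:

import torch
from torch.nn.utils.rnn import pack_padded_sequence

padded_batch = torch.tensor([[5, 7, 0, 0],    # 2 real tokens + 2 pads
                             [1, 2, 3, 4],    # 4 real tokens
                             [9, 0, 0, 0]])   # 1 real token + 3 pads
true_lens = torch.tensor([2, 4, 1])

# enforce_sorted=True (the default) expects the longest sequence first,
# so sort by length ourselves and keep the permutation for later
sorted_lens, sort_idx = true_lens.sort(descending=True)
packed_sorted = pack_padded_sequence(padded_batch[sort_idx], sorted_lens, batch_first=True)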

step 2

The function returns a PackedSequence object.
Its attributes:

PackedSequence.data (Tensor) – Tensor containing packed sequence
PackedSequence.batch_sizes (Tensor) – Tensor of integers holding information about the batch size at each sequence step
PackedSequence.sorted_indices (Tensor, optional) – Tensor of integers holding how this PackedSequence is constructed from sequences.
PackedSequence.unsorted_indices (Tensor, optional) – Tensor of integers holding how to recover the original sequences in the correct order.

batch_sizes: not the length of each sequence, but the number of sequences processed at each time step. For instance, the first time step may process a batch of 5 sequences while the third processes only 3, and so on.
sorted_indices: the indices that reorder the original batch into descending order of length (longest to shortest).
unsorted_indices: the indices that restore the sorted batch to its original order.

One example makes this clear:

>>> import torch
>>> from torch.nn.utils.rnn import pack_padded_sequence, pad_packed_sequence
>>> seq = torch.tensor([[1,2,0], [3,0,0], [4,5,6]])
>>> lens = [2, 1, 3]
>>> packed = pack_padded_sequence(seq, lens, batch_first=True, enforce_sorted=False)
>>> packed
PackedSequence(data=tensor([4, 1, 3, 5, 2, 6]), batch_sizes=tensor([3, 2, 1]),
               sorted_indices=tensor([2, 0, 1]), unsorted_indices=tensor([1, 2, 0]))

After pack_padded_sequence, the data corresponds to the length-sorted batch:
[4,5,6]
[1,2,0]
[3,0,0]
batch_sizes: 3, 2 and then 1 sequences are processed at the successive time steps, as the sketch below confirms.
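
Continuing that example, a quick sketch that rebuilds packed.data by hand from the length-sorted batch, confirming the column-by-column layout:

>>> sorted_seq = seq[packed.sorted_indices]   # [[4,5,6],[1,2,0],[3,0,0]]
>>> torch.cat([sorted_seq[:b, t] for t, b in enumerate(packed.batch_sizes.tolist())])
tensor([4, 1, 3, 5, 2, 6])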

The careful reader may notice that RNN inputs normally have three dimensions [batch, seq_len, hidden_size], while the example above is only 2-D. The reason:

input can be of size T x B x * where T is the length of the longest sequence (equal to lengths[0]), B is the batch size, and * is any number of dimensions (including 0).

step 3

# inside a model's forward(): embed_input_x is the padded, embedded batch and
# sentence_lens holds each sentence's true length
embed_input_x_packed = pack_padded_sequence(embed_input_x, sentence_lens, batch_first=True)
encoder_outputs_packed, (h_last, c_last) = self.lstm(embed_input_x_packed)

The returned h_last and c_last are the hidden state and cell state with the padding excluded: they represent each sentence as produced by running the LSTM only over its real length, not over the useless pad tokens. (In older PyTorch versions they were Variables; in current versions they are ordinary tensors.)
The returned output is a PackedSequence, so to continue processing at a uniform length it has to be padded back out.
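
To make the claim about h_last concrete, here is a self-contained sketch with made-up sizes: the final hidden state the packed LSTM returns for each sentence matches what you get by running that sentence through the LSTM alone, truncated to its true length.

import torch
import torch.nn as nn
from torch.nn.utils.rnn import pack_padded_sequence

torch.manual_seed(0)
batch, max_len, emb_dim, hidden_size = 3, 5, 4, 6
lstm = nn.LSTM(emb_dim, hidden_size, batch_first=True)

embed_input_x = torch.randn(batch, max_len, emb_dim)   # already padded embeddings
sentence_lens = torch.tensor([5, 3, 2])                # true lengths, descending

packed_x = pack_padded_sequence(embed_input_x, sentence_lens, batch_first=True)
_, (h_last, c_last) = lstm(packed_x)

# h_last[0, i] should equal the final state of running sentence i alone, unpadded
for i in range(batch):
    _, (h_i, _) = lstm(embed_input_x[i:i + 1, :int(sentence_lens[i])])
    print(torch.allclose(h_last[0, i], h_i[0, 0], atol=1e-6))   # True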

.pad_packed_sequence() is the inverse operation of pack_padded_sequence(): it pads the packed sequence back out. The function's parameters:

sequence (PackedSequence) – batch to pad
batch_first (bool, optional) – if True, the output will be in B x T x * format.
padding_value (float, optional) – values for padded elements.
total_length (int, optional) – if not None, the output will be padded to have length total_length. This method will throw ValueError if total_length is less than the max sequence length in sequence.

padding_value defaults to 0.0.

The function returns:

Tuple of Tensor containing the padded sequence, and a Tensor containing the list of lengths of each sequence in the batch. Batch elements will be re-ordered as they were ordered originally when the batch was passed to pack_padded_sequence or pack_sequence.

Two values are returned: (1) the padded sequences, all of the same length, with the batch restored from descending-length order back to the original order; (2) the true length of each sequence.

>>> seq_unpacked, lens_unpacked = pad_packed_sequence(packed, batch_first=True)
>>> seq_unpacked
tensor([[1, 2, 0],
        [3, 0, 0],
        [4, 5, 6]])
>>> lens_unpacked
tensor([2, 1, 3])
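
For completeness, padding_value and total_length can also be set when unpacking; a small sketch with the same packed object (total_length is mainly useful when later code, e.g. a module wrapped in DataParallel, expects a fixed time dimension):

seq5, lens5 = pad_packed_sequence(packed, batch_first=True, total_length=5, padding_value=-1)
# seq5 has shape [3, 5]; padded positions are filled with -1:
# tensor([[ 1,  2, -1, -1, -1],
#         [ 3, -1, -1, -1, -1],
#         [ 4,  5,  6, -1, -1]])
# lens5 is still tensor([2, 1, 3])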

A complete example

The code below is adapted from another blogger's post.

import torch
input_tensor = torch.tensor([[1, 3, 5, 6, 2, 0, 0],
                             [1, 3, 5, 0, 0, 0, 0],
                             [1, 3, 0, 0, 0, 0, 0]])
embe = torch.nn.Embedding(10, 6)
out = embe(input_tensor)
print(out.shape)
 
# result:
# torch.Size([3, 7, 6])

gru = torch.nn.GRU(6, 8, batch_first=True)
 
hidden_normal = torch.zeros(1, 3, 8)
output_normal, _ = gru(out, hidden_normal)
print(output_normal.shape)
print(output_normal)
 
 
# result:
torch.Size([3, 7, 8])
tensor([[[-0.3121,  0.0188, -0.1041, -0.1437, -0.4423,  0.2555,  0.3690,
           0.2136],
         [ 0.1832, -0.2063, -0.0339, -0.3196, -0.6962,  0.2769,  0.3495,
           0.0115],
         [ 0.3326, -0.3881,  0.0615, -0.2771, -0.4755,  0.2857,  0.3597,
          -0.4412],
         [ 0.1384, -0.0065,  0.2262, -0.4853, -0.6944, -0.0467,  0.5761,
          -0.3320],
         [ 0.2038,  0.0938, -0.1772, -0.4974, -0.5730, -0.3191,  0.6605,
          -0.3210],
         [ 0.2620,  0.1287, -0.4169, -0.4849, -0.5390, -0.4803,  0.6889,
          -0.2553],
         [ 0.3085,  0.1449, -0.5499, -0.4641, -0.5323, -0.5623,  0.6914,
          -0.1874]],
 
        [[-0.3121,  0.0188, -0.1041, -0.1437, -0.4423,  0.2555,  0.3690,
           0.2136],
         [ 0.1832, -0.2063, -0.0339, -0.3196, -0.6962,  0.2769,  0.3495,
           0.0115],
         [ 0.3326, -0.3881,  0.0615, -0.2771, -0.4755,  0.2857,  0.3597,
          -0.4412],
         [ 0.3520, -0.0209, -0.2198, -0.3820, -0.4272, -0.1440,  0.5585,
          -0.3779],
         [ 0.3638,  0.0925, -0.4146, -0.4223, -0.4525, -0.3903,  0.6477,
          -0.2921],
         [ 0.3740,  0.1334, -0.5358, -0.4313, -0.4838, -0.5156,  0.6777,
          -0.2139],
         [ 0.3830,  0.1506, -0.6060, -0.4272, -0.5031, -0.5743,  0.6832,
          -0.1545]],
 
        [[-0.3121,  0.0188, -0.1041, -0.1437, -0.4423,  0.2555,  0.3690,
           0.2136],
         [ 0.1832, -0.2063, -0.0339, -0.3196, -0.6962,  0.2769,  0.3495,
           0.0115],
         [ 0.2611,  0.0369, -0.2880, -0.3972, -0.4885, -0.1832,  0.5467,
          -0.1531],
         [ 0.3099,  0.1091, -0.4588, -0.4242, -0.4651, -0.4192,  0.6352,
          -0.1683],
         [ 0.3431,  0.1368, -0.5622, -0.4278, -0.4822, -0.5314,  0.6664,
          -0.1414],
         [ 0.3657,  0.1500, -0.6208, -0.4224, -0.4980, -0.5815,  0.6743,
          -0.1111],
         [ 0.3810,  0.1572, -0.6524, -0.4151, -0.5059, -0.6024,  0.6746,
          -0.0875]]], grad_fn=<TransposeBackward1>)

hidden = torch.zeros(1, 3, 8)
out_pad = torch.nn.utils.rnn.pack_padded_sequence(out, torch.tensor([4, 3, 2]), batch_first=True)
output, _ = gru(out_pad, hidden)
encoder_outputs, _ = torch.nn.utils.rnn.pad_packed_sequence(output, batch_first=True)
# The output is only padded to the longest true length in `lengths` (4 here), not to the padded input length (7).
print(encoder_outputs.shape)
print(encoder_outputs)
 
 
# result:
torch.Size([3, 4, 8])
tensor([[[-0.3147,  0.2937,  0.3170,  0.0374,  0.0856,  0.1972,  0.1793,
          -0.1815],
         [-0.1413, -0.2737,  0.4023, -0.0043, -0.1145,  0.0961,  0.0909,
          -0.1149],
         [-0.2327,  0.0745,  0.5349,  0.0076,  0.1540,  0.1582,  0.2454,
          -0.2582],
         [-0.1467, -0.2010,  0.4935,  0.0996, -0.3427,  0.2260,  0.0455,
           0.0056]],
 
        [[-0.3147,  0.2937,  0.3170,  0.0374,  0.0856,  0.1972,  0.1793,
          -0.1815],
         [-0.1413, -0.2737,  0.4023, -0.0043, -0.1145,  0.0961,  0.0909,
          -0.1149],
         [-0.2327,  0.0745,  0.5349,  0.0076,  0.1540,  0.1582,  0.2454,
          -0.2582],
         [ 0.0000,  0.0000,  0.0000,  0.0000,  0.0000,  0.0000,  0.0000,
           0.0000]],
 
        [[-0.3147,  0.2937,  0.3170,  0.0374,  0.0856,  0.1972,  0.1793,
          -0.1815],
         [-0.1413, -0.2737,  0.4023, -0.0043, -0.1145,  0.0961,  0.0909,
          -0.1149],
         [ 0.0000,  0.0000,  0.0000,  0.0000,  0.0000,  0.0000,  0.0000,
           0.0000],
         [ 0.0000,  0.0000,  0.0000,  0.0000,  0.0000,  0.0000,  0.0000,
           0.0000]]], grad_fn=<TransposeBackward0>)
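
Because the padded positions of encoder_outputs are filled with zeros, a common follow-up is to pick out each sequence's last valid output using the true lengths. A minimal sketch reusing encoder_outputs and the lengths [4, 3, 2] from above:

lengths = torch.tensor([4, 3, 2])
# index of the last real time step for every sequence, expanded over the hidden dimension
idx = (lengths - 1).view(-1, 1, 1).expand(-1, 1, encoder_outputs.size(2))
last_outputs = encoder_outputs.gather(1, idx).squeeze(1)
print(last_outputs.shape)   # torch.Size([3, 8])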

Encoder example

import torch
import torch.nn as nn
import torch.nn.functional as F
class Encoder(nn.Module):
    def __init__(self, input_size, hidden_size, n_layer=1, drop_out=0):
        # input_size is the vocabulary size; hidden_size is the feature size of the GRU hidden state
        super(Encoder, self).__init__()
        self.hidden_size = hidden_size
        self.embedding = nn.Embedding(input_size, self.hidden_size)
        self.gru = nn.GRU(self.hidden_size, self.hidden_size, bidirectional=True,
                          num_layers=n_layer, dropout=(0 if n_layer == 1 else drop_out))
 
    def forward(self, input_seq, length, hidden=None):
        # input_seq should be [seq_len, batch]
        # length: the true (unpadded) length of each sequence, sorted longest first, list of ints
        # embedd.shape = [seq_len, batch, hidden_size]
        embedd = self.embedding(input_seq)
        pack = torch.nn.utils.rnn.pack_padded_sequence(embedd, length)
        output, hidden = self.gru(pack, hidden)
        output, _ = torch.nn.utils.rnn.pad_packed_sequence(output)
 
        # encoder_output.shape = [max(length), batch, hidden_size]
        # hidden.shape = [2 * n_layer, batch, hidden_size]  (2 directions)
        encoder_output = output[:, :, :self.hidden_size] + output[:, :, self.hidden_size:]
 
        return encoder_output, hidden
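
A quick usage sketch for this Encoder, with made-up sizes (a vocabulary of 10 and hidden size 8) and input already shaped [seq_len, batch] with lengths in descending order:

encoder = Encoder(input_size=10, hidden_size=8)
seqs = torch.tensor([[1, 4, 2],     # shape [seq_len, batch] = [4, 3]
                     [3, 5, 6],
                     [7, 2, 0],
                     [8, 0, 0]])
lengths = [4, 3, 2]                 # true lengths of the three columns, descending
enc_out, hidden = encoder(seqs, lengths)
print(enc_out.shape)    # torch.Size([4, 3, 8])
print(hidden.shape)     # torch.Size([2, 3, 8])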