1. Label text preprocessing
Use <PAD> to pad labels to a common length for alignment; use <GO> and <EOS> to mark the start and end of each label sequence, telling the decoder where the text begins and ends; replace low-frequency or irrelevant tokens with <UNK>.
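A minimal preprocessing sketch (the helper name and the `vocab_to_int` mapping are assumptions; `<GO>` is prepended on the decoder-input side, shown later under TrainingHelper):
```
def encode_and_pad(sentences, vocab_to_int, max_len):
    # Map tokens to ids, replacing out-of-vocabulary words with <UNK>.
    batch = []
    for sent in sentences:
        ids = [vocab_to_int.get(w, vocab_to_int['<UNK>']) for w in sent]
        # Truncate, append <EOS>, then pad with <PAD> up to max_len.
        ids = ids[:max_len - 1] + [vocab_to_int['<EOS>']]
        ids += [vocab_to_int['<PAD>']] * (max_len - len(ids))
        batch.append(ids)
    return batch
```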
2. seq2seq API
- Encoder
The encoder is usually an LSTM or a GRU; a CNN can be placed in front of it for feature extraction, as in the sketch below.
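A sketch of that pattern, assuming `inputs` is a `[batch, time, features]` float tensor (the function name and shapes are assumptions):
```
import tensorflow as tf

def cnn_rnn_encoder(inputs, sequence_length, rnn_size):
    # 1-D convolution over time as a lightweight feature extractor.
    features = tf.layers.conv1d(inputs, filters=rnn_size, kernel_size=3,
                                padding='same', activation=tf.nn.relu)
    # GRU encoder over the extracted features.
    cell = tf.contrib.rnn.GRUCell(rnn_size)
    return tf.nn.dynamic_rnn(cell, features,
                             sequence_length=sequence_length,
                             dtype=tf.float32)
```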
- Decoder
The main building blocks are TrainingHelper, GreedyEmbeddingHelper, BasicDecoder, and dynamic_decode.
TrainingHelper is used at training time.
class TrainingHelper(Helper):
  """A helper for use during training. Only reads inputs.

  Returned sample_ids are the argmax of the RNN output logits.
  """

  def __init__(self, inputs, sequence_length, time_major=False, name=None):
    """Initializer.

    Args:
      inputs: A (structure of) input tensors.
      sequence_length: An int32 vector tensor.
      time_major: Python bool. Whether the tensors in `inputs` are time major.
        If `False` (default), they are assumed to be batch major.
      name: Name scope for any created operations.

    Raises:
      ValueError: if `sequence_length` is not a 1D tensor.
    """
inputs: the embedded decoder inputs for teacher forcing; at each step the ground-truth token from the previous step is fed in, and the first step reads the embedding of the <GO> token's vocabulary index.
sequence_length: the lengths of the decoder target sequences.
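Building those teacher-forcing inputs, i.e. dropping each row's last token and prepending `<GO>`, is a common preprocessing step; a sketch (the helper name is an assumption):
```
def process_decoder_input(targets, vocab_to_int, batch_size):
    # Drop the last token of every row ...
    ending = tf.strided_slice(targets, [0, 0], [batch_size, -1], [1, 1])
    # ... and prepend the <GO> token; the result is embedded and fed
    # to TrainingHelper as `inputs`.
    return tf.concat([tf.fill([batch_size, 1], vocab_to_int['<GO>']), ending], 1)
```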
GreedyEmbeddingHelper is used at inference time.
class GreedyEmbeddingHelper(Helper):
  """A helper for use during inference.

  Uses the argmax of the output (treated as logits) and passes the
  result through an embedding layer to get the next input.
  """

  def __init__(self, embedding, start_tokens, end_token):
    """Initializer.

    Args:
      embedding: A callable that takes a vector tensor of `ids` (argmax ids),
        or the `params` argument for `embedding_lookup`. The returned tensor
        will be passed to the decoder input.
      start_tokens: `int32` vector shaped `[batch_size]`, the start tokens.
      end_token: `int32` scalar, the token that marks end of decoding.

    Raises:
      ValueError: if `start_tokens` is not a 1D tensor or `end_token` is not a
        scalar.
    """
embedding: the embedding matrix (or a callable); each step's previous prediction is passed through this embedding to produce the next input's word vector.
start_tokens: a `[batch_size]` vector filled with the vocabulary index of <GO>.
end_token: the vocabulary index of the end-of-sequence marker.
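Per the docstring above, `embedding` may be either the `params` matrix for `embedding_lookup` or a callable; both forms side by side (identifiers taken from the example further down):
```
# Form 1: pass the embedding matrix itself (used via embedding_lookup).
helper = tf.contrib.seq2seq.GreedyEmbeddingHelper(
    decoder_embeddings, start_tokens, end_token)

# Form 2: pass an explicit callable mapping ids to vectors.
helper = tf.contrib.seq2seq.GreedyEmbeddingHelper(
    lambda ids: tf.nn.embedding_lookup(decoder_embeddings, ids),
    start_tokens, end_token)
```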
BasicDecoder assembles the cell, helper, and output layer into a decoder.
class BasicDecoder(decoder.Decoder):
  """Basic sampling decoder."""

  def __init__(self, cell, helper, initial_state, output_layer=None):
    """Initialize BasicDecoder.

    Args:
      cell: An `RNNCell` instance.
      helper: A `Helper` instance.
      initial_state: A (possibly nested tuple of...) tensors and TensorArrays.
        The initial state of the RNNCell.
      output_layer: (Optional) An instance of `tf.layers.Layer`, i.e.,
        `tf.layers.Dense`. Optional layer to apply to the RNN output prior
        to storing the result or sampling.

    Raises:
      TypeError: if `cell`, `helper` or `output_layer` have an incorrect type.
    """
cell: the RNN cell; with attention, this is the cell after being wrapped by the attention wrapper.
helper: one of the two helpers above.
initial_state: the initial state; without attention it is the encoder's final state, and with attention it is dec_cell.zero_state(batch_size, dtype=tf.float32), where dec_cell is the attention-wrapped cell (typically cloned with the encoder state, see below).
output_layer: usually a Dense layer.
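The two `initial_state` cases side by side (a sketch; identifiers come from the example at the end of this section):
```
# Without attention: hand the encoder's final state to the decoder directly.
decoder = tf.contrib.seq2seq.BasicDecoder(
    cell, helper, initial_state=encoder_state, output_layer=output_layer)

# With attention: start from the wrapper's zero state and clone the
# encoder state into its inner cell_state field.
initial_state = dec_cell.zero_state(batch_size, dtype=tf.float32).clone(
    cell_state=encoder_state)
decoder = tf.contrib.seq2seq.BasicDecoder(
    dec_cell, helper, initial_state, output_layer=output_layer)
```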
tf.contrib.seq2seq.dynamic_decode runs the decoding loop; it handles inputs of different lengths, but within a single batch every sequence is decoded to the same length.
def dynamic_decode(decoder,
                   output_time_major=False,
                   impute_finished=False,
                   maximum_iterations=None,
                   parallel_iterations=32,
                   swap_memory=False,
                   scope=None):
  """Perform dynamic decoding with `decoder`.

  Calls initialize() once and step() repeatedly on the Decoder object.

  Args:
    decoder: A `Decoder` instance.
    output_time_major: Python boolean. Default: `False` (batch major). If
      `True`, outputs are returned as time major tensors (this mode is faster).
      Otherwise, outputs are returned as batch major tensors (this adds extra
      time to the computation).
    impute_finished: Python boolean. If `True`, then states for batch
      entries which are marked as finished get copied through and the
      corresponding outputs get zeroed out. This causes some slowdown at
      each time step, but ensures that the final state and outputs have
      the correct values and that backprop ignores time steps that were
      marked as finished.
    maximum_iterations: `int32` scalar, maximum allowed number of decoding
      steps. Default is `None` (decode until the decoder is fully done).
    parallel_iterations: Argument passed to `tf.while_loop`.
    swap_memory: Argument passed to `tf.while_loop`.
    scope: Optional variable scope to use.

  Returns:
    `(final_outputs, final_state, final_sequence_lengths)`.

  Raises:
    TypeError: if `decoder` is not an instance of `Decoder`.
    ValueError: if `maximum_iterations` is provided but is not a scalar.
  """
- Attention
The main classes are BahdanauAttention, LuongAttention, and AttentionWrapper.
BahdanauAttention, one of the attention mechanism types:
class BahdanauAttention(_BaseAttentionMechanism):
  def __init__(self,
               num_units,
               memory,
               memory_sequence_length=None,
               normalize=False,
               probability_fn=None,
               score_mask_value=None,
               dtype=None,
               name="BahdanauAttention"):
    """Construct the Attention mechanism.

    Args:
      num_units: The depth of the query mechanism.
      memory: The memory to query; usually the output of an RNN encoder. This
        tensor should be shaped `[batch_size, max_time, ...]`.
      memory_sequence_length (optional): Sequence lengths for the batch entries
        in memory. If provided, the memory tensor rows are masked with zeros
        for values past the respective sequence lengths.
      normalize: Python boolean. Whether to normalize the energy term.
      probability_fn: (optional) A `callable`. Converts the score to
        probabilities. The default is `tf.nn.softmax`. Other options include
        `tf.contrib.seq2seq.hardmax` and `tf.contrib.sparsemax.sparsemax`.
        Its signature should be: `probabilities = probability_fn(score)`.
      score_mask_value: (optional): The mask value for score before passing into
        `probability_fn`. The default is -inf. Only used if
        `memory_sequence_length` is not None.
      dtype: The data type for the query and memory layers of the attention
        mechanism.
      name: Name to use when creating ops.
    """
num_units: the number of units in the attention mechanism's dense layers.
memory: the encoder's output.
memory_sequence_length: the lengths of the source (encoder) sequences, used to compute the mask; since memory is the encoder output, the mask must use the source lengths, not the target lengths.
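Construction with that masking in mind, plus LuongAttention as a drop-in alternative (a sketch; `source_sequence_length` is the vector of encoder input lengths):
```
# Bahdanau (additive) attention over the encoder outputs.
attn_mech = tf.contrib.seq2seq.BahdanauAttention(
    num_units=rnn_size, memory=encoder_output,
    memory_sequence_length=source_sequence_length)

# Luong (multiplicative) attention takes the same memory arguments.
attn_mech = tf.contrib.seq2seq.LuongAttention(
    num_units=rnn_size, memory=encoder_output,
    memory_sequence_length=source_sequence_length)
```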
AttentionWrapper applies the chosen attention mechanism to the decoder cell.
class AttentionWrapper(rnn_cell_impl.RNNCell):
  """Wraps another `RNNCell` with attention."""

  def __init__(self,
               cell,
               attention_mechanism,
               attention_layer_size=None,
               alignment_history=False,
               cell_input_fn=None,
               output_attention=True,
               initial_cell_state=None,
               name=None,
               attention_layer=None):
    """Construct the `AttentionWrapper`.

    **NOTE** If you are using the `BeamSearchDecoder` with a cell wrapped in
    `AttentionWrapper`, then you must ensure that:

    - The encoder output has been tiled to `beam_width` via
      `tf.contrib.seq2seq.tile_batch` (NOT `tf.tile`).
    - The `batch_size` argument passed to the `zero_state` method of this
      wrapper is equal to `true_batch_size * beam_width`.
    - The initial state created with `zero_state` above contains a
      `cell_state` value containing properly tiled final state from the
      encoder.

    An example:

    ```
    tiled_encoder_outputs = tf.contrib.seq2seq.tile_batch(
        encoder_outputs, multiplier=beam_width)
    tiled_encoder_final_state = tf.contrib.seq2seq.tile_batch(
        encoder_final_state, multiplier=beam_width)
    tiled_sequence_length = tf.contrib.seq2seq.tile_batch(
        sequence_length, multiplier=beam_width)
    attention_mechanism = MyFavoriteAttentionMechanism(
        num_units=attention_depth,
        memory=tiled_encoder_outputs,
        memory_sequence_length=tiled_sequence_length)
    attention_cell = AttentionWrapper(cell, attention_mechanism, ...)
    decoder_initial_state = attention_cell.zero_state(
        dtype, batch_size=true_batch_size * beam_width)
    decoder_initial_state = decoder_initial_state.clone(
        cell_state=tiled_encoder_final_state)
    ```
    """
cell: the decoder's RNN cell.
attention_mechanism: the attention object defined above.
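Two optional arguments from the signature above are worth setting explicitly (a sketch):
```
dec_cell = tf.contrib.seq2seq.AttentionWrapper(
    cell, attn_mech,
    attention_layer_size=rnn_size,  # project [cell output; context] down to this size
    alignment_history=True)         # keep per-step attention weights for inspection
```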
Example
Encoder:
def get_encoder_layer(input_data, rnn_size, num_layers,
                      source_sequence_length, source_vocab_size,
                      encoding_embedding_size):
    '''
    Build the encoder layer.

    Args:
    - input_data: input tensor
    - rnn_size: number of hidden units in each RNN cell
    - num_layers: number of stacked RNN cells
    - source_sequence_length: lengths of the source sequences
    - source_vocab_size: size of the source vocabulary
    - encoding_embedding_size: embedding dimension
    '''
    # Encoder embedding
    encoder_embed_input = tf.contrib.layers.embed_sequence(input_data, source_vocab_size, encoding_embedding_size)

    # RNN cell
    def get_lstm_cell(rnn_size):
        lstm_cell = tf.contrib.rnn.LSTMCell(rnn_size, initializer=tf.random_uniform_initializer(-0.1, 0.1, seed=2))
        return lstm_cell

    cell = tf.contrib.rnn.MultiRNNCell([get_lstm_cell(rnn_size) for _ in range(num_layers)])

    encoder_output, encoder_state = tf.nn.dynamic_rnn(cell, encoder_embed_input,
                                                      sequence_length=source_sequence_length, dtype=tf.float32)
    return encoder_output, encoder_state
Decoder:
def decoding_layer(target_letter_to_int, decoding_embedding_size, num_layers, rnn_size,
                   target_sequence_length, max_target_sequence_length,
                   encoder_output, encoder_state, decoder_input, source_sequence_length):
    '''
    Build the decoder layer.

    Args:
    - target_letter_to_int: mapping table for the target data
    - decoding_embedding_size: embedding dimension
    - num_layers: number of stacked RNN cells
    - rnn_size: number of hidden units in each RNN cell
    - target_sequence_length: lengths of the target sequences
    - max_target_sequence_length: maximum target sequence length
    - encoder_output: encoder outputs (the attention memory)
    - encoder_state: encoder final state
    - decoder_input: decoder input
    - source_sequence_length: lengths of the source sequences, for the attention mask
    '''
    # 1. Embedding
    target_vocab_size = len(target_letter_to_int)
    decoder_embeddings = tf.Variable(tf.random_uniform([target_vocab_size, decoding_embedding_size]))
    decoder_embed_input = tf.nn.embedding_lookup(decoder_embeddings, decoder_input)

    # 2. Build the decoder RNN cell
    def get_decoder_cell(rnn_size):
        decoder_cell = tf.contrib.rnn.LSTMCell(rnn_size,
                                               initializer=tf.random_uniform_initializer(-0.1, 0.1, seed=2))
        return decoder_cell

    cell = tf.contrib.rnn.MultiRNNCell([get_decoder_cell(rnn_size) for _ in range(num_layers)])

    # Define the attention mechanism. The memory is the encoder output,
    # so the mask uses the *source* lengths, not the target lengths.
    attn_mech = tf.contrib.seq2seq.BahdanauAttention(rnn_size, encoder_output,
                                                     source_sequence_length,
                                                     normalize=False,
                                                     name='BahdanauAttention')

    # Apply the attention mechanism
    dec_cell = tf.contrib.seq2seq.AttentionWrapper(cell=cell, attention_mechanism=attn_mech)

    # 3. Dense output layer (requires: from tensorflow.python.layers.core import Dense)
    output_layer = Dense(target_vocab_size,
                         kernel_initializer=tf.truncated_normal_initializer(mean=0.0, stddev=0.1))

    # 4. Training decoder
    # NOTE: batch_size is assumed to come from the enclosing scope.
    with tf.variable_scope("decode"):
        # Build the helper object
        training_helper = tf.contrib.seq2seq.TrainingHelper(inputs=decoder_embed_input,
                                                            sequence_length=target_sequence_length,
                                                            time_major=False)
        # Build the decoder
        training_decoder = tf.contrib.seq2seq.BasicDecoder(dec_cell,
                                                           training_helper,
                                                           dec_cell.zero_state(batch_size, dtype=tf.float32)
                                                           .clone(cell_state=encoder_state),
                                                           output_layer)
        training_decoder_output, _, _ = tf.contrib.seq2seq.dynamic_decode(training_decoder,
                                                                          impute_finished=True,
                                                                          maximum_iterations=max_target_sequence_length)

    # 5. Predicting decoder
    # Shares parameters with the training decoder
    with tf.variable_scope("decode", reuse=True):
        # Create a constant tensor and tile it to batch_size
        start_tokens = tf.tile(tf.constant([target_letter_to_int['<GO>']], dtype=tf.int32), [batch_size],
                               name='start_tokens')
        predicting_helper = tf.contrib.seq2seq.GreedyEmbeddingHelper(decoder_embeddings,
                                                                     start_tokens,
                                                                     target_letter_to_int['<EOS>'])
        predicting_decoder = tf.contrib.seq2seq.BasicDecoder(dec_cell,
                                                             predicting_helper,
                                                             dec_cell.zero_state(batch_size, dtype=tf.float32)
                                                             .clone(cell_state=encoder_state),
                                                             output_layer)
        predicting_decoder_output, _, _ = tf.contrib.seq2seq.dynamic_decode(predicting_decoder,
                                                                            impute_finished=True,
                                                                            maximum_iterations=max_target_sequence_length)

    return training_decoder_output, predicting_decoder_output
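On top of `training_decoder_output`, the training objective is typically a masked sequence loss with gradient clipping; a minimal sketch, assuming `targets` and `lr` placeholders exist in the surrounding graph:
```
training_logits = tf.identity(training_decoder_output.rnn_output, name='logits')
# Mask out <PAD> positions so they do not contribute to the loss.
masks = tf.sequence_mask(target_sequence_length, max_target_sequence_length,
                         dtype=tf.float32, name='masks')
cost = tf.contrib.seq2seq.sequence_loss(training_logits, targets, masks)

optimizer = tf.train.AdamOptimizer(lr)
gradients = optimizer.compute_gradients(cost)
capped_gradients = [(tf.clip_by_value(grad, -5., 5.), var)
                    for grad, var in gradients if grad is not None]
train_op = optimizer.apply_gradients(capped_gradients)
```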