The advanced version covers the following techniques:
- Recurrent dropout, a specific, built-in way to use dropout to fight overfitting in recurrent layers.
  - Networks regularized with dropout take longer to fully converge, so increase the number of training epochs to about twice the original.
- Stacking recurrent layers, to increase the representational power of the network (at the cost of higher computational loads).
  - When stacking recurrent layers one after another in Keras, every intermediate layer should return its full sequence of outputs (a 3D tensor) rather than only the output at the last timestep. This is done by specifying return_sequences=True.
- Bidirectional recurrent layers, which present the same information to a recurrent network in different ways, increasing accuracy and mitigating forgetting issues.
  - Use the Bidirectional layer, which takes a recurrent layer instance as its first argument. Bidirectional creates a second, separate instance of that recurrent layer, then uses one instance to process the input sequence in chronological order and the other to process it in reversed order.
1. The Bidirectional layer
1.1 Syntax
```python
keras.layers.Bidirectional(layer, merge_mode='concat', weights=None)
```
1.2 Parameters
- layer: a Recurrent layer instance.
- merge_mode: the mode by which the outputs of the forward and backward RNNs are combined. One of {'sum', 'mul', 'concat', 'ave', None}. If None, the outputs are not combined; they are returned as a list.
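To make merge_mode concrete, here is a minimal sketch (a hypothetical toy model; the vocabulary and layer sizes are arbitrary) showing that the default 'concat' doubles the output width by concatenating the forward and backward states:

```python
from keras.models import Sequential
from keras.layers import Embedding, LSTM, Bidirectional

# Toy model only to inspect output shapes (hypothetical sizes).
model = Sequential()
model.add(Embedding(1000, 32))
model.add(Bidirectional(LSTM(16), merge_mode='concat'))
model.summary()  # final output shape: (None, 32) -- 16 forward + 16 backward
# With merge_mode='sum', 'mul', or 'ave', the shape would stay (None, 16).
```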
2. Examples
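All three examples below assume that max_features, input_train, and y_train are already defined. A minimal sketch of a typical setup, assuming the IMDB sentiment-classification dataset these examples are usually run against (the exact max_features and maxlen values are assumptions):

```python
from keras.datasets import imdb
from keras.preprocessing import sequence

max_features = 10000  # vocabulary size (assumption)
maxlen = 500          # truncate/pad each review to this many tokens (assumption)

(input_train, y_train), (input_test, y_test) = imdb.load_data(num_words=max_features)
input_train = sequence.pad_sequences(input_train, maxlen=maxlen)
input_test = sequence.pad_sequences(input_test, maxlen=maxlen)
```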
2.1 Recurrent dropout
```python
from keras.models import Sequential
from keras.layers import Embedding, LSTM, Dense

model = Sequential()
model.add(Embedding(max_features, 32))
# dropout drops input units; recurrent_dropout drops units of the recurrent state
model.add(LSTM(32, dropout=0.2, recurrent_dropout=0.2))
model.add(Dense(1, activation='sigmoid'))

model.compile(optimizer='rmsprop',
              loss='binary_crossentropy',
              metrics=['acc'])
history = model.fit(input_train, y_train,
                    epochs=20,
                    batch_size=128,
                    validation_split=0.2)
```
2.2 Stacking recurrent layers
```python
from keras.models import Sequential
from keras.layers import Embedding, LSTM, Dense

model = Sequential()
model.add(Embedding(max_features, 32))
# The intermediate layer must return the full output sequence (a 3D tensor)
model.add(LSTM(32, dropout=0.2, recurrent_dropout=0.2, return_sequences=True))
model.add(LSTM(32, dropout=0.2, recurrent_dropout=0.2))
model.add(Dense(1, activation='sigmoid'))

model.compile(optimizer='rmsprop',
              loss='binary_crossentropy',
              metrics=['acc'])
history = model.fit(input_train, y_train,
                    epochs=20,
                    batch_size=128,
                    validation_split=0.2)
```
2.3 Bidirectional recurrent layers
```python
from keras.models import Sequential
from keras.layers import Embedding, LSTM, Dense, Bidirectional

model = Sequential()
model.add(Embedding(max_features, 32))
# Each Bidirectional wrapper trains a forward and a backward copy of the LSTM
model.add(Bidirectional(LSTM(32, dropout=0.2, recurrent_dropout=0.2,
                             return_sequences=True)))
model.add(Bidirectional(LSTM(32, dropout=0.2, recurrent_dropout=0.2)))
model.add(Dense(1, activation='sigmoid'))

model.compile(optimizer='rmsprop',
              loss='binary_crossentropy',
              metrics=['acc'])
history = model.fit(input_train, y_train,
                    epochs=5,
                    batch_size=128,
                    validation_split=0.2)
```
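To compare how quickly each variant converges (for instance, the doubled epoch budget recommended for dropout above), the history object returned by fit can be plotted. A minimal sketch, assuming the metrics=['acc'] setting used in all three examples:

```python
import matplotlib.pyplot as plt

acc = history.history['acc']
val_acc = history.history['val_acc']
epochs = range(1, len(acc) + 1)

plt.plot(epochs, acc, 'bo', label='Training acc')
plt.plot(epochs, val_acc, 'b', label='Validation acc')
plt.title('Training and validation accuracy')
plt.xlabel('Epoch')
plt.legend()
plt.show()
```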