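Author: 明天依旧可好
QQ discussion group: 807041986
Note: if a deep-learning question is not covered in this article, leave a comment below and I will add it to the article.
Original article: https://mtyjkh.blog.csdn.net/article/details/111088248
Deep learning series: "A Simple Introduction to Deep Learning (TensorFlow 2)"
Code | data: reply "深度学习" (deep learning) to the WeChat official account 明天依旧可好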
Load the data
import pandas as pd
import tensorflow as tf
import os
df = pd.read_csv("Tweets.csv",usecols=["airline_sentiment","text"])
df
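# pd.Categorical converts the column into a categorical array: its distinct values become the
# categories (sorted when possible), and .codes gives each row the integer index of its category.
# The label order can be inspected via .categories; here negative -> 0, neutral -> 1, positive -> 2.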
df.airline_sentiment = pd.Categorical(df.airline_sentiment).codes
df
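A minimal pure-Python sketch of what `pd.Categorical(...).codes` does in this step, assuming the default behaviour of inferring the categories from the data and sorting them. `categorical_codes` is a hypothetical helper for illustration, not a pandas function:

```python
def categorical_codes(values):
    """Mimic pd.Categorical(values).codes for sortable values (hypothetical helper)."""
    # Distinct values, sorted: the order pandas uses when it infers categories itself.
    categories = sorted(set(values))
    # Map each category to its position in the sorted list.
    index = {category: i for i, category in enumerate(categories)}
    # Replace every value with the integer code of its category.
    return categories, [index[v] for v in values]


categories, codes = categorical_codes(["neutral", "positive", "negative", "negative"])
print(categories)  # ['negative', 'neutral', 'positive']
print(codes)       # [1, 2, 0, 0]
```

So after this step the labels are 0 (negative), 1 (neutral) and 2 (positive): three classes, not two.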
Build the vocabulary
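import tensorflow_datasets as tfds
# Note: in tensorflow_datasets 4.0 and later these text classes were moved to tfds.deprecated.text
tokenizer = tfds.features.text.Tokenizer()
vocabulary_set = set()
# Tokenize every tweet and collect all distinct tokens
for text in df["text"]:
    some_tokens = tokenizer.tokenize(text)
    vocabulary_set.update(some_tokens)
vocab_size = len(vocabulary_set)
vocab_size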
'''
Output:
18027
'''
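The TFDS `Tokenizer` splits text on runs of non-alphanumeric characters and keeps case by default. The sketch below approximates that with a regular expression in plain Python, assuming the default settings; it is not the TFDS implementation and may differ on edge cases such as reserved tokens. `tokenize` and `build_vocabulary` are hypothetical helpers:

```python
import re

# Approximation of the default TFDS Tokenizer: split on runs of non-word characters.
_SPLIT_RE = re.compile(r"\W+")


def tokenize(text):
    # re.split leaves empty strings at the edges (e.g. before a leading "@"); drop them.
    return [tok for tok in _SPLIT_RE.split(text) if tok]


def build_vocabulary(texts):
    # Collect every distinct token across all texts, as the loop above does.
    vocabulary = set()
    for text in texts:
        vocabulary.update(tokenize(text))
    return vocabulary


tweets = ["@AmericanAir we have 8 ppl", "we need 2 know!"]
print(tokenize(tweets[0]))            # ['AmericanAir', 'we', 'have', '8', 'ppl']
print(len(build_vocabulary(tweets)))  # 8
```

Because case is kept, "Flight" and "flight" count as two vocabulary entries, which is one reason the vocabulary reaches 18,027 tokens.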
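Encode a sample (test)
# Build an encoder over the vocabulary; it uses a default Tokenizer unless one is passed in.
# Note: `text` here is simply the last tweet left over from the loop above.
encoder = tfds.features.text.TokenTextEncoder(vocabulary_set)
encoded_example = encoder.encode(text)
print(encoded_example)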
'''
where text (the last tweet from the loop above) is:
'@AmericanAir we have 8 ppl so we need 2 know how many seats are on the next flight. Plz put us on standby for 4 people on the next flight?'
Output:
[12939, 13052, 13579, 11267, 14825, 8674, 13052, 12213, 12082, 12156, 5329, 5401, 10099, 3100, 7974, 7804, 5671, 2947, 9873, 7864, 9704, 7974, 3564, 11759, 15266, 11250, 7974, 7804, 5671, 2947]
'''
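Conceptually, `TokenTextEncoder` looks each token up in a fixed vocabulary list. The sketch below assumes the TFDS convention that id 0 is reserved for padding, vocabulary ids start at 1, and unknown tokens fall into a single out-of-vocabulary bucket placed after the vocabulary. `SimpleTokenEncoder` is a hypothetical, simplified class, not the TFDS one. Since `vocabulary_set` is a Python set, its iteration order for strings can change between runs, so the concrete ids above (e.g. 13052 for "we") are not stable; passing `sorted(vocabulary_set)` would make them reproducible.

```python
import re


class SimpleTokenEncoder:
    """Hypothetical, simplified stand-in for TFDS TokenTextEncoder (one OOV bucket)."""

    def __init__(self, vocab_list):
        # Id 0 is reserved for padding, so vocabulary ids start at 1.
        self.token_to_id = {tok: i + 1 for i, tok in enumerate(vocab_list)}
        # A single out-of-vocabulary id placed right after the vocabulary.
        self.oov_id = len(vocab_list) + 1

    def encode(self, text):
        tokens = [t for t in re.split(r"\W+", text) if t]
        return [self.token_to_id.get(t, self.oov_id) for t in tokens]


# sorted() gives a deterministic id order; iterating a set of strings does not.
toy_encoder = SimpleTokenEncoder(sorted({"we", "have", "8", "ppl", "AmericanAir"}))
print(toy_encoder.encode("@AmericanAir we have 8 ppl!"))  # [2, 5, 3, 1, 4]
print(toy_encoder.encode("we have pizza"))                # [5, 3, 6]
```

Under this convention the largest in-vocabulary id equals the vocabulary size, which is why the embedding layer later gets `vocab_size+1` rows.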
Encode every tweet as a sequence of integer ids
df["encoded_text"] = [encoder.encode(text) for text in df["text"]]
df
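# Sequential split (no shuffling): the first 10,000 tweets for training, the remaining 4,640 for testing
train_x = df["encoded_text"][:10000]
train_y = df["airline_sentiment"][:10000]
test_x = df["encoded_text"][10000:]
test_y = df["airline_sentiment"][10000:]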
from tensorflow import keras
# Pad or truncate every sequence to exactly 50 ids (by default zeros are added at the front)
train_x = keras.preprocessing.sequence.pad_sequences(train_x,maxlen=50)
test_x = keras.preprocessing.sequence.pad_sequences(test_x,maxlen=50)
train_x.shape,train_y.shape,test_x.shape,test_y.shape
'''
Output:
((10000, 50), (10000,), (4640, 50), (4640,))
'''
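`pad_sequences` defaults to `padding='pre'` and `truncating='pre'` with a padding value of 0, so short tweets get zeros at the front and tweets longer than 50 tokens lose their first tokens. A plain-Python sketch of that default behaviour, for illustration only (`pad_pre` is a hypothetical helper):

```python
def pad_pre(sequences, maxlen, value=0):
    """Left-pad / left-truncate each sequence to exactly maxlen items."""
    if maxlen <= 0:
        raise ValueError("maxlen must be positive")
    padded = []
    for seq in sequences:
        kept = list(seq)[-maxlen:]  # 'pre' truncation keeps the last maxlen ids
        padded.append([value] * (maxlen - len(kept)) + kept)  # 'pre' padding
    return padded


print(pad_pre([[1, 2, 3], [4, 5, 6, 7, 8]], maxlen=4))
# [[0, 1, 2, 3], [5, 6, 7, 8]]
```

Because the encoder reserves id 0 for padding, the padded zeros never collide with a real token id.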
Build the model
import tensorflow as tf
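model = tf.keras.Sequential([
    # Token ids run from 1 to vocab_size (0 is padding), hence vocab_size + 1 rows
    tf.keras.layers.Embedding(vocab_size+1, 64),
    # Reads each sequence forwards and backwards; the two 64-d states are concatenated (128-d)
    tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(64)),
    tf.keras.layers.Dense(64, activation='relu'),
    # A single output logit (a binary-classification head)
    tf.keras.layers.Dense(1)
])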
model.summary()
'''
Output:
Model: "sequential_1"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
embedding_2 (Embedding) (None, None, 64) 1153792
_________________________________________________________________
bidirectional_2 (Bidirection (None, 128) 66048
_________________________________________________________________
dense_4 (Dense) (None, 64) 8256
_________________________________________________________________
dense_5 (Dense) (None, 1) 65
=================================================================
Total params: 1,228,161
Trainable params: 1,228,161
Non-trainable params: 0
_________________________________________________________________
'''
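The parameter counts in this summary can be checked by hand. A short sketch using the standard formulas, assuming `vocab_size = 18027` from above and Keras defaults (biases enabled, `Bidirectional` concatenating both directions); the helper names are illustrative:

```python
def lstm_params(input_dim, units):
    # 4 gates, each with an input kernel, a recurrent kernel and a bias vector.
    return 4 * ((input_dim + units) * units + units)


def dense_params(input_dim, units):
    # Weight matrix plus one bias per unit.
    return input_dim * units + units


vocab_size, embed_dim, lstm_units = 18027, 64, 64

embedding = (vocab_size + 1) * embed_dim                # 1,153,792
bidirectional = 2 * lstm_params(embed_dim, lstm_units)  # 66,048 (forward + backward)
hidden = dense_params(2 * lstm_units, 64)               # 8,256 (input is the 128-d concat)
output = dense_params(64, 1)                            # 65

total = embedding + bidirectional + hidden + output
print(f"{total:,}")  # 1,228,161
```

Roughly 94% of the parameters sit in the embedding table, so the vocabulary size dominates the model's size.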
Compile the model
model.compile(loss=tf.keras.losses.BinaryCrossentropy(from_logits=True),
              optimizer=tf.keras.optimizers.Adam(1e-4),
              metrics=['accuracy'])
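`airline_sentiment` was encoded as 0, 1 and 2 earlier, while `BinaryCrossentropy` assumes labels of 0 or 1. The sketch below evaluates, in plain Python, the numerically stable sigmoid cross-entropy formula documented for `tf.nn.sigmoid_cross_entropy_with_logits`, to show what a label of 2 does: the loss turns negative and keeps falling as the logit grows, so the optimizer is rewarded for pushing logits up without bound. This matches the negative loss values in the training log.

```python
import math


def bce_with_logits(label, logit):
    # Stable sigmoid cross-entropy: max(x, 0) - x*z + log(1 + exp(-|x|)).
    return max(logit, 0.0) - logit * label + math.log1p(math.exp(-abs(logit)))


for label in (0, 1, 2):  # negative, neutral, positive after Categorical(...).codes
    print(label, round(bce_with_logits(label, 10.0), 4))
# 0 10.0
# 1 0.0
# 2 -10.0
```

For labels 0 and 1 the loss is never negative; only the out-of-range label 2 breaks that.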
Train the model
history = model.fit(train_x,
                    train_y,
                    epochs=20,
                    batch_size=200,
                    validation_data=(test_x, test_y),
                    verbose=1)
'''
Output:
Epoch 1/20
50/50 [==============================] - 6s 117ms/step - loss: -4.8196 - accuracy: 0.6652 - val_loss: -0.7605 - val_accuracy: 0.7071
......
Epoch 19/20
50/50 [==============================] - 6s 123ms/step - loss: -37.5176 - accuracy: 0.7586 - val_loss: -9.0619 - val_accuracy: 0.7272
Epoch 20/20
50/50 [==============================] - 6s 120ms/step - loss: -40.0017 - accuracy: 0.7611 - val_loss: -7.7479 - val_accuracy: 0.7248
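'''
Note: the loss in this log is negative, which a cross-entropy loss should never be. The cause is a label/loss mismatch: airline_sentiment was encoded into three classes (0 = negative, 1 = neutral, 2 = positive), but the model ends in Dense(1) and is compiled with BinaryCrossentropy, which assumes labels of 0 or 1. For three-class sentiment, the output layer should be tf.keras.layers.Dense(3) and the loss tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True); the loss and accuracy values above come from the mismatched setup.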
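With a `Dense(3)` head producing one logit per class, `SparseCategoricalCrossentropy(from_logits=True)` accepts the integer labels 0, 1, 2 directly. The plain-Python sketch below computes that loss for a single example (softmax followed by the negative log-probability of the true class); unlike the binary cross-entropy used in the original code, it can never go below zero. This is a suggested fix, not what the original article ran, and no training results for it are shown here.

```python
import math


def sparse_categorical_ce(label, logits):
    # Log-sum-exp with the max subtracted for numerical stability.
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    # Negative log of the softmax probability assigned to the true class.
    return log_sum_exp - logits[label]


logits = [2.0, 0.5, -1.0]  # one logit per class: negative, neutral, positive
for label in (0, 1, 2):
    print(label, round(sparse_categorical_ce(label, logits), 4))
```

The class with the largest logit gets the smallest loss, and `exp(-loss)` recovers the softmax probability of each class.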