Environment setup
Install dependencies
!which python
! pip install datasets transformers rouge-score nltk
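# Load data
# Note: load_metric is deprecated in newer datasets releases; the separate evaluate package (evaluate.load("rouge")) replaces it.
from datasets import load_dataset, load_metric
# raw_datasets = load_dataset("xsum")
metric = load_metric("rouge")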
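To make concrete what the ROUGE metric loaded above measures, here is a minimal sketch of ROUGE-1 (unigram overlap). It is a simplification for illustration only: it assumes whitespace tokenization and lowercasing, whereas the rouge-score package applies its own tokenization and optional stemming, so scores will not match exactly. The function name `rouge1` and the example sentences are made up for this sketch.

```python
from collections import Counter


def rouge1(prediction: str, reference: str) -> dict:
    """Unigram-overlap precision, recall and F1 on whitespace tokens (simplified)."""
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()
    # Clipped overlap: a token counts at most as often as it occurs in both texts.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    precision = overlap / len(pred_tokens) if pred_tokens else 0.0
    recall = overlap / len(ref_tokens) if ref_tokens else 0.0
    f1 = 2 * precision * recall / (precision + recall) if overlap else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# Illustrative sentences, not taken from the xsum dataset.
scores = rouge1("the cat sat on the mat", "the cat lay on the mat")
print(scores)
```

Five of the six unigrams overlap in this pair, so precision, recall and F1 all come out to 5/6.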
!pip install matplotlib
!pip install torch
!pip install torchtext
Does spaCy support Chinese?
Yes. See https://spacy.io/models
https://spacy.io/models/zh#zh_core_web_sm
After downloading the model, install it into the current Python virtual environment: pip install /Users/xuehuiping/Downloads/zh_core_web_sm-3.1.0.tar.gz
!pip install zh_core_web_sm-3.1.0.tar.gz
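Since spaCy models are installed as ordinary Python packages, one way to confirm that the install landed in the active environment before calling spacy.load is to look the package up with the standard library. This is only a sketch; the helper name `model_installed` is hypothetical.

```python
import importlib.util


def model_installed(package_name: str) -> bool:
    """Return True if the named top-level package is importable in this environment."""
    return importlib.util.find_spec(package_name) is not None


if model_installed("zh_core_web_sm"):
    print("zh_core_web_sm is installed")
else:
    print("zh_core_web_sm is missing; pip install the downloaded .tar.gz first")
```

If the check fails even though pip reported success, pip may have installed into a different interpreter than the one the notebook kernel uses, which is what the earlier `!which python` cell helps to diagnose.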
import spacy
nlp = spacy.load("zh_core_web_sm")
doc = nlp("庆祝祖国生日快乐")
print(doc.text)
for token in doc:
    print(token.text, token.pos_, token.dep_)
庆祝祖国生日快乐
庆祝 VERB ROOT
祖国 NOUN compound:nn
生日 NOUN dobj
快乐 VERB ccomp
Language model
https://pytorch.org/tutorials/beginner/transformer_tutorial.html
Download the ipynb and run it.
The official tutorial trains for 3 epochs.
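The 3-epoch run is also where a "best" model comes from: after each epoch the validation loss is compared with the best seen so far, and the model is kept only when it improves. The sketch below shows that pattern without any framework. `train_one_epoch`, `evaluate`, the dictionary standing in for the model, and the loss values are all hypothetical placeholders, not results from the tutorial.

```python
import copy


def train_one_epoch(state: dict) -> None:
    """Hypothetical stand-in for the real training step."""
    state["steps"] += 1


def evaluate(state: dict, epoch: int) -> float:
    """Hypothetical stand-in that returns a made-up validation loss."""
    made_up_losses = {1: 5.8, 2: 5.6, 3: 5.7}
    return made_up_losses[epoch]


epochs = 3
model_state = {"steps": 0}  # placeholder for the real model
best_val_loss = float("inf")
best_state = None

for epoch in range(1, epochs + 1):
    train_one_epoch(model_state)
    val_loss = evaluate(model_state, epoch)
    # Keep a copy only when validation loss improves.
    if val_loss < best_val_loss:
        best_val_loss = val_loss
        best_state = copy.deepcopy(model_state)
    print(f"epoch {epoch}: val_loss={val_loss:.2f}, best={best_val_loss:.2f}")
```

With these made-up losses the copy from epoch 2 is retained, even though training continues through epoch 3.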
Test results
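Language-model results are usually reported as a mean cross-entropy loss together with perplexity, which is the exponential of that loss. A minimal sketch of the conversion follows. The loss value is a hypothetical placeholder, not a result from this run.

```python
import math


def perplexity(mean_cross_entropy: float) -> float:
    """Perplexity is exp of the mean per-token cross-entropy (natural log)."""
    return math.exp(mean_cross_entropy)


example_loss = 5.5  # hypothetical value for illustration only
print(f"loss {example_loss:.2f} -> perplexity {perplexity(example_loss):.2f}")
```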
Save the model
torch.save(best_model, 'best_model.pt')
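Load the model
# torch.save(best_model, ...) pickled the whole model object, so torch.load returns the model itself
# and its result must be assigned. The TransformerModel class must be defined in the session before loading.
# Recent PyTorch releases may also need weights_only=False here to unpickle a full model.
model = torch.load('best_model.pt', map_location=torch.device('cpu'))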
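A commonly used alternative to pickling the whole model is to save only its state_dict and load it into a freshly constructed model of the same architecture. The sketch below shows the round trip with a tiny nn.Linear as a stand-in. With the notebook's model, the TransformerModel constructor would take its place. The file name demo_state.pt is hypothetical.

```python
import os
import tempfile

import torch
from torch import nn

# Tiny stand-in model; the notebook's TransformerModel would go here instead.
model = nn.Linear(4, 2)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "demo_state.pt")  # hypothetical file name
    # Save only the parameters, not the pickled model object.
    torch.save(model.state_dict(), path)

    # Rebuild the same architecture, then copy the saved parameters into it.
    restored = nn.Linear(4, 2)
    state = torch.load(path, map_location=torch.device("cpu"))
    restored.load_state_dict(state)

restored.eval()  # switch to inference mode before evaluation
weights_match = torch.equal(model.weight, restored.weight)
print(weights_match)
```

Saving the state_dict keeps the checkpoint independent of how the model class is pickled, at the cost of having to reconstruct the model with the same hyperparameters before loading.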