RNN实现情感分类
概述
情感分类是自然语言处理中的经典任务,是典型的分类问题。本节使用MindSpore实现一个基于RNN网络的情感分类模型,实现如下的效果:
输入: This film is terrible
正确标签: Negative
预测标签: Negative
输入: This film is great
正确标签: Positive
预测标签: Positive
数据准备
本节使用情感分类的经典数据集IMDB影评数据集,数据集包含Positive和Negative两类,下面为其样例:
Review | Label |
---|---|
"Quitting" may be as much about exiting a pre-ordained identity as about drug withdrawal. As a rural guy coming to Beijing, class and success must have struck this young artist face on as an appeal to separate from his roots and far surpass his peasant parents' acting success. Troubles arise, however, when the new man is too new, when it demands too big a departure from family, history, nature, and personal identity. The ensuing splits, and confusion between the imaginary and the real and the dissonance between the ordinary and the heroic are the stuff of a gut check on the one hand or a complete escape from self on the other. | Negative |
This movie is amazing because the fact that the real people portray themselves and their real life experience and do such a good job it's like they're almost living the past over again. Jia Hongsheng plays himself an actor who quit everything except music and drugs struggling with depression and searching for the meaning of life while being angry at everyone especially the people who care for him most. | Positive |
此外,需要使用预训练词向量对自然语言单词进行编码,以获取文本的语义特征,本节选取Glove词向量作为Embedding。
数据下载模块
为了方便数据集和预训练词向量的下载,首先设计数据下载模块,实现可视化下载流程,并保存至指定路径。数据下载模块使用requests
库进行http请求,并通过tqdm
库对下载百分比进行可视化。此外针对下载安全性,使用IO的方式下载临时文件,而后保存至指定的路径并返回。
------------------------
以上为今天学习内容简介,下面是运行代码训练的结果。