Text content: data (holds many text entries; the code below treats it as a pandas Series)
1. Tokenize:
import jieba
data_cut = data.apply(jieba.lcut)  # split each text into a list of words
2. Remove stop words:
stoplist.txt: link: https://pan.baidu.com/s/1lN1J8aUFOwqXpYMzuqVA7w  extraction code: nk7z
with open(r'D:\数据文件\stoplist.txt', encoding='utf-8') as f:
    txt = f.read()
stop = txt.split()
stop = stop + [' ']  # add the space character as a stop word
data_after = data_cut.apply(lambda x: [i for i in x if i not in stop])
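Checking membership in a list scans the whole list for every word, which can get slow with a long stop-word list. Below is a minimal sketch of the same filter using a set instead. The sample tokens and stop words are made up for illustration, and `remove_stopwords` is a hypothetical helper name, not part of the original code.

```python
def remove_stopwords(tokens, stop_set):
    """Return the tokens that are not in stop_set, keeping their order."""
    return [t for t in tokens if t not in stop_set]

# Made-up sample data standing in for data_cut and the contents of stoplist.txt.
sample_cut = [["我", "喜欢", " ", "的", "电影"], ["今天", "的", "天气", "很", "好"]]
stop_set = {"的", "很", "我"} | {" "}  # the space is added, as in the step above

sample_after = [remove_stopwords(tokens, stop_set) for tokens in sample_cut]
print(sample_after)
```

With the real data, the same idea would presumably be `stop_set = set(stop)` followed by `data_cut.apply(lambda x: remove_stopwords(x, stop_set))`.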
3. Count word frequencies:
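import pandas as pd
from itertools import chain  # public replacement for the private tkinter._flatten helper
tmp = pd.Series(list(chain.from_iterable(data_after)))  # flatten the 2-D token lists into 1-D
num = tmp.value_counts()  # occurrences of each word, most frequent first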
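The counts can also be cross-checked with the standard library alone. This is a rough sketch, assuming the filtered result is a list of token lists; the sample data is invented for illustration.

```python
from collections import Counter
from itertools import chain

# Invented sample standing in for the stop-word-filtered token lists.
sample_after = [["喜欢", "电影"], ["今天", "天气", "好"], ["喜欢", "天气"]]

# Flatten the nested lists into one stream of tokens, then count each token.
counts = Counter(chain.from_iterable(sample_after))

# most_common(n) returns the n most frequent (word, count) pairs.
top = counts.most_common(2)
print(top)
```

A `Counter` is a dict subclass, so `dict(counts)` should be usable wherever a word-to-frequency mapping is expected.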
4. Draw the word cloud:
from wordcloud import WordCloud
import matplotlib.pyplot as plt
pic = plt.imread(r'D:\数据文件\aixin.jpg')  # image used as the mask (shape) of the cloud
wc = WordCloud(background_color='white', mask=pic, font_path=r'C:/Windows/Fonts/simsun.ttc')  # font that can render Chinese
wc2 = wc.fit_words(num)  # build the cloud from the word frequencies
plt.imshow(wc2)
plt.axis('off')
plt.show()