Chinese Text Classification with TextGrocery

2022-10-27 18:11:34

Detailed documentation: http://textgrocery.readthedocs.io/zh/latest/index.html

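TextGrocery is a short-text classification tool built on LibLinear and the jieba (结巴) word segmenter. It is efficient and easy to use, and it supports both Chinese and English corpora.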

Installation:

pip install tgrocery

Usage:

>>> from tgrocery import Grocery

# Open a new grocery store (don't forget to give it a name)

>>> grocery = Grocery('sample')

# Training texts can be passed in as a list of (label, text) tuples

>>> train_src = [

...     ('education', '名师指导托福语法技巧：名词的复数形式'),

...     ('education', '中国高考成绩海外认可 是“狼来了”吗？'),

...     ('sports', '图文：法网孟菲尔斯苦战进16强 孟菲尔斯怒吼'),

...     ('sports', '四川丹棱举行全国长距登山挑战赛 近万人参与')

... ]

>>> grocery.train(train_src)

Building prefix dict from the default dictionary ...

Dumping model to file cache /tmp/jieba.cache

Loading model cost 1.125 seconds.

Prefix dict has been built succesfully.

*

optimization finished, #iter =

Objective value = -1.092381

nSV =

<tgrocery.Grocery object at 0x7f23cf243b50>
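The training list above is just a sequence of (label, text) tuples, so it can be assembled from any source. Below is a minimal sketch that builds such a list from tab-separated lines. The helper name `load_train_src` and the "label, tab, text" layout are assumptions made for illustration, not part of TextGrocery itself.

```python
# -*- coding: utf-8 -*-

def load_train_src(lines, delimiter='\t'):
    """Turn 'label<delimiter>text' lines into (label, text) tuples.

    Hypothetical helper: the line layout is an assumption for this sketch.
    """
    train_src = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        label, sep, text = line.partition(delimiter)
        if not sep or not text.strip():
            continue  # skip lines that lack a label/text pair
        train_src.append((label.strip(), text.strip()))
    return train_src


# Illustrative input reusing two of the headlines from the example above.
sample_lines = [
    u'education\t名师指导托福语法技巧：名词的复数形式',
    u'',
    u'sports\t四川丹棱举行全国长距登山挑战赛 近万人参与',
]
train_src = load_train_src(sample_lines)
```

The resulting `train_src` has the same shape as the hand-written list and could be handed to `grocery.train(train_src)` in the same way.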

>>> grocery.save()

>>> new_grocery = Grocery('sample')

>>> new_grocery.load()

>>> new_grocery.predict('考生必读：新托福写作考试评分标准')

<tgrocery.base.GroceryPredictResult object at 0x4490d50>

>>> new_grocery.predict('考生必读：新托福写作考试评分标准')

<tgrocery.base.GroceryPredictResult object at 0x4490d90>

>>> result = new_grocery.predict('考生必读：新托福写作考试评分标准')

>>> print result

education
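As the session shows, `predict` returns a result object whose printed form is the label. Based on that, a rough accuracy check over a labelled list might look like the sketch below. The function `accuracy`, the stand-in `keyword_predict`, and the two test headlines are all hypothetical, so the example runs without a trained model; in practice one would pass `new_grocery.predict` instead.

```python
# -*- coding: utf-8 -*-

def accuracy(predict, test_src):
    """Fraction of (label, text) pairs whose predicted label matches.

    `predict` is any callable mapping a text to something whose str()
    is the label, which is how the result object prints above.
    """
    if not test_src:
        return 0.0
    correct = sum(1 for label, text in test_src
                  if str(predict(text)) == label)
    return float(correct) / len(test_src)


def keyword_predict(text):
    # Stand-in classifier (assumption): a trivial keyword rule,
    # used only so this sketch is self-contained.
    return 'education' if u'托福' in text else 'sports'


# Made-up held-out examples for illustration.
test_src = [
    ('education', u'考生必读：新托福写作考试评分标准'),
    ('sports', u'意甲首轮补赛交战记录:国米10年连胜'),
]
score = accuracy(keyword_predict, test_src)
```

With only four training headlines, as in this walkthrough, any such score says little; the helper becomes meaningful once the training and test lists are larger.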

Done.
