Evaluating Text Summarization with ROUGE
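ROUGE (Recall-Oriented Understudy for Gisting Evaluation) is a set of metrics for evaluating automatic summarization and machine translation. It compares an automatically generated summary or translation against a set of reference summaries (usually written by humans) and produces a score that measures how "similar" the generated text is to the references.
Common variants include ROUGE-1, ROUGE-2, and, more generally, ROUGE-N.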
ROUGE-1
Let's start with the simplest case, ROUGE-1, which compares single words (unigrams).
The target is:
the fox jumps over
The model prediction is:
the hello cat dog fox jumps
The model prediction contains 6 words. It shares 3 words with the target: the, fox, jumps.
The target contains 4 words.
- Precision (shared words / words in the prediction) = 3/6 = 0.5
- Recall (shared words / words in the target) = 3/4 = 0.75
- F1 = (2PR)/(P+R) = (2*0.5*0.75)/(0.5+0.75) = 0.6
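The arithmetic above can be reproduced with a few lines of plain Python. This is only a minimal sketch, assuming whitespace tokenization and clipped word counts; `rouge_1` is a hypothetical helper written for illustration, not a function from the `rouge` package.

```python
from collections import Counter

def rouge_1(candidate: str, reference: str) -> dict:
    """Unigram precision, recall and F1 (illustrative helper, not a library API)."""
    cand_counts = Counter(candidate.split())
    ref_counts = Counter(reference.split())
    # Counter & Counter keeps the minimum count of each word, i.e. the shared words.
    overlap = sum((cand_counts & ref_counts).values())
    p = overlap / max(sum(cand_counts.values()), 1)
    r = overlap / max(sum(ref_counts.values()), 1)
    f = 2 * p * r / (p + r) if p + r > 0 else 0.0
    return {"p": p, "r": r, "f": f}

scores = rouge_1("the hello cat dog fox jumps", "the fox jumps over")
print(scores)
```

For this example, precision is 0.5 and recall is 0.75, matching the hand calculation.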
Code:
from rouge import Rouge

model_out = 'the hello cat dog fox jumps'
reference = 'the fox jumps over'
rouge = Rouge()
# get_scores takes the model output (hypothesis) first, then the reference
rouge.get_scores(model_out, reference)
[{'rouge-1': {'f': 0.5999999952, 'p': 0.5, 'r': 0.75},
'rouge-2': {'f': 0.24999999531250006, 'p': 0.2, 'r': 0.3333333333333333},
'rouge-l': {'f': 0.5999999952, 'p': 0.5, 'r': 0.75}}]
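The output also reports `rouge-l`, which is based on the longest common subsequence (LCS) of the two word sequences rather than on fixed-length n-grams. The sketch below assumes a single-sentence LCS with a plain F1; `lcs_length` and `rouge_l` are hypothetical helpers, and the library's own implementation may differ in detail.

```python
def lcs_length(a: list, b: list) -> int:
    """Length of the longest common subsequence, by dynamic programming."""
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a, 1):
        for j, y in enumerate(b, 1):
            if x == y:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    return dp[-1][-1]

def rouge_l(candidate: str, reference: str) -> dict:
    """LCS-based precision, recall and F1 (illustrative helper, not a library API)."""
    cand, ref = candidate.split(), reference.split()
    lcs = lcs_length(cand, ref)
    p = lcs / max(len(cand), 1)
    r = lcs / max(len(ref), 1)
    f = 2 * p * r / (p + r) if p + r > 0 else 0.0
    return {"p": p, "r": r, "f": f}

lcs_scores = rouge_l("the hello cat dog fox jumps", "the fox jumps over")
print(lcs_scores)
```

Here the longest common subsequence is "the fox jumps" (length 3), so precision is 3/6 and recall is 3/4, the same values the package reports for `rouge-l` on this pair.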
ROUGE-2
ROUGE-2 treats each pair of adjacent words (a bigram) as a single unit.
| target | model_out |
|---|---|
| the fox | the hello |
| fox jumps | hello cat |
| jumps over | cat dog |
| | dog fox |
| | fox jumps |
Only one bigram matches: fox jumps. The target has 3 bigrams and the model prediction has 5.
- Precision = 1/5 = 0.2
- Recall = 1/3 ≈ 0.333
- F1 = (2*0.2*0.333)/(0.2+0.333) ≈ 0.25
The same idea extends to ROUGE-N for any n, but this package only computes ROUGE-1, ROUGE-2, and ROUGE-L.
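Since the package stops at bigrams, the generalization to arbitrary n can be sketched by hand. The code below is a rough illustration, assuming whitespace tokenization and clipped n-gram counts; `ngrams` and `rouge_n` are hypothetical helper names, not part of the `rouge` package.

```python
from collections import Counter

def ngrams(tokens: list, n: int) -> list:
    """All contiguous n-grams of a token list, as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def rouge_n(candidate: str, reference: str, n: int) -> dict:
    """N-gram precision, recall and F1 (illustrative helper, not a library API)."""
    cand = Counter(ngrams(candidate.split(), n))
    ref = Counter(ngrams(reference.split(), n))
    # Shared n-grams, counting each at most as often as it appears in both texts.
    overlap = sum((cand & ref).values())
    p = overlap / max(sum(cand.values()), 1)
    r = overlap / max(sum(ref.values()), 1)
    f = 2 * p * r / (p + r) if p + r > 0 else 0.0
    return {"p": p, "r": r, "f": f}

model_out = "the hello cat dog fox jumps"
reference = "the fox jumps over"
bigram_scores = rouge_n(model_out, reference, 2)
trigram_scores = rouge_n(model_out, reference, 3)
print(bigram_scores)
print(trigram_scores)
```

With n = 2 this gives the precision of 1/5 and recall of 1/3 from the table above. With n = 3 the two sentences share no trigram, so all three scores are zero.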
Computing scores over a dataset
model_out = ["he began by starting a five person war cabinet and included chamberlain as lord president of the council",
"the siege lasted from 250 to 241 bc, the romans laid siege to lilybaeum",
"the original ocean water was found in aquaculture"]
reference = ["he began his premiership by forming a five-man war cabinet which included chamberlain as lord president of the council",
"the siege of lilybaeum lasted from 250 to 241 bc, as the roman army laid siege to the carthaginian-held sicilian city of lilybaeum",
"the original mission was for research into the uses of deep ocean water in ocean thermal energy conversion (otec) renewable energy production and in aquaculture"]
rouge = Rouge()
# avg=True returns a single set of scores averaged over all pairs
rouge.get_scores(model_out, reference, avg=True)
{'rouge-1': {'f': 0.6279006234427593,
'p': 0.8604497354497355,
'r': 0.5273531655225019},
'rouge-2': {'f': 0.3883256484545606,
'p': 0.5244559362206421,
'r': 0.32954545454545453},
'rouge-l': {'f': 0.6282785202429159,
'p': 0.8122895622895623,
'r': 0.5369305616983636}}
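To make the averaging concrete, here is a small sketch that scores each prediction against its reference and then takes the arithmetic mean of the per-pair values. It assumes that this per-pair averaging is what `avg=True` does. The helpers `rouge_1` and `average_scores` and the toy sentences are made up for illustration, so the numbers are not meant to match the output above.

```python
from collections import Counter
from statistics import mean

def rouge_1(candidate: str, reference: str) -> dict:
    """Unigram precision, recall and F1 (illustrative helper, not a library API)."""
    cand = Counter(candidate.split())
    ref = Counter(reference.split())
    overlap = sum((cand & ref).values())
    p = overlap / max(sum(cand.values()), 1)
    r = overlap / max(sum(ref.values()), 1)
    f = 2 * p * r / (p + r) if p + r > 0 else 0.0
    return {"p": p, "r": r, "f": f}

def average_scores(candidates: list, references: list) -> dict:
    """Score every (candidate, reference) pair, then average each metric."""
    per_pair = [rouge_1(c, r) for c, r in zip(candidates, references)]
    return {key: mean(s[key] for s in per_pair) for key in ("p", "r", "f")}

# Toy data, invented for this example only.
toy_out = ["the cat sat", "a dog"]
toy_ref = ["the cat sat down", "a big dog"]
avg = average_scores(toy_out, toy_ref)
print(avg)
```

Note that averaging per-pair F1 values is not the same as computing F1 from the averaged precision and recall, which is why the reported `f` cannot in general be recovered from the reported `p` and `r`.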