



The library we will use is Transformers, implemented by Hugging Face. If you are not familiar with Transformers, you can read my earlier articles first.


 pip install transformers


 import os

 # Expose only GPU 0 to this process; set this before any CUDA-aware library is loaded.
 os.environ["CUDA_VISIBLE_DEVICES"] = "0"

 from transformers import pipeline
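Now we are ready to choose the summarization model. Hugging Face offers two powerful families of summarization models: BART (bart-large-cnn) and T5 (t5-small, t5-base, t5-large, t5-3b, t5-11b). You can learn more about them in the official papers (Lewis et al., 2019 for BART; Raffel et al., 2019 for T5).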


 summarizer = pipeline("summarization")
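If you would rather use a T5 model (for example t5-base), which is pre-trained on C4, a cleaned web corpus derived from Common Crawl, you can do so as follows; `framework="tf"` selects the TensorFlow backend, so TensorFlow needs to be installed: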

 summarizer = pipeline("summarization", model="t5-base", tokenizer="t5-base", framework="tf")
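T5 treats every task as text-to-text, so summarization is requested through a task prefix on the input; the released T5 checkpoints use `summarize: `. As far as I know, the summarization pipeline prepends the prefix stored in the model configuration on its own, so the call above should not need a manual prefix. The sketch below only illustrates what the model ends up seeing; `with_task_prefix` is a hypothetical helper, not part of Transformers.

```python
def with_task_prefix(text, prefix="summarize: "):
    """Hypothetical helper: prepend a T5-style task prefix unless it is already present."""
    # Collapse runs of whitespace so a multi-line article becomes one input string.
    cleaned = " ".join(text.split())
    return cleaned if cleaned.startswith(prefix) else prefix + cleaned


print(with_task_prefix("One month after the United States\nbegan the rollout ..."))
```

A helper like this would only matter when calling the tokenizer and model directly rather than going through `pipeline`.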

One month after the United States began what has become a troubled rollout of a national COVID vaccination campaign, the effort is finally gathering real steam.

Close to a million doses -- over 951,000, to be more exact -- made their way into the arms of Americans in the past 24 hours, the U.S. Centers for Disease Control and Prevention reported Wednesday. That's the largest number of shots given in one day since the rollout began and a big jump from the previous day, when just under 340,000 doses were given, CBS News reported.

That number is likely to jump quickly after the federal government on Tuesday gave states the OK to vaccinate anyone over 65 and said it would release all the doses of vaccine it has available for distribution. Meanwhile, a number of states have now opened mass vaccination sites in an effort to get larger numbers of people inoculated, CBS News reported.


 text = """One month after the United States began what has become a troubled rollout of a national COVID vaccination campaign, the effort is finally gathering real steam.
 Close to a million doses -- over 951,000, to be more exact -- made their way into the arms of Americans in the past 24 hours, the U.S. Centers for Disease Control and Prevention reported Wednesday. That's the largest number of shots given in one day since the rollout began and a big jump from the previous day, when just under 340,000 doses were given, CBS News reported.
 That number is likely to jump quickly after the federal government on Tuesday gave states the OK to vaccinate anyone over 65 and said it would release all the doses of vaccine it has available for distribution. Meanwhile, a number of states have now opened mass vaccination sites in an effort to get larger numbers of people inoculated, CBS News reported."""

 summary_text = summarizer(text, max_length=100, min_length=5, do_sample=False)[0]['summary_text']
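The `max_length` and `min_length` arguments in the call above are counted in tokens of the generated summary, not in words or characters, so fixed values suit some inputs better than others. Here is a minimal sketch of one way to scale them with the input. The helper `length_bounds`, its ratios, and the cap of 150 are all assumptions chosen for illustration, and the whitespace word count is only a rough stand-in for the real tokenizer's count.

```python
def length_bounds(text, max_ratio=0.5, min_ratio=0.1, hard_cap=150):
    """Hypothetical helper: derive (min_length, max_length) from the input size."""
    # Rough size estimate; a real tokenizer usually yields more tokens than words.
    n_words = len(text.split())
    # Never ask for more than hard_cap tokens, and keep at least 2 so min < max.
    max_length = max(2, min(hard_cap, int(n_words * max_ratio)))
    # Keep the lower bound strictly below the upper bound.
    min_length = max(1, min(max_length - 1, int(n_words * min_ratio)))
    return min_length, max_length


min_len, max_len = length_bounds("word " * 100)
print(min_len, max_len)  # 10 50
```

The result can then be passed on as `summarizer(text, max_length=max_len, min_length=min_len, do_sample=False)`.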

Over 951,000 doses of vaccine given in one day in the past 24 hours, CDC says . That’s the largest number of shots given in a month since the rollout began . The federal government gave states the OK to vaccinate anyone over 65 on Tuesday . A number of states have now opened mass vaccination sites in an effort to get more people inoculated, CBS News reports .

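From the summary we can see that the model knows 24 hours is equivalent to one day, and that it neatly abbreviates the U.S. Centers for Disease Control and Prevention to CDC. The model also links information from the first and second paragraphs, reporting that this is the largest number of shots given since the rollout began a month earlier, although its wording ("in a month") is looser than the source's "in one day". Overall, the summarization model performs quite well.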
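Observations like these come from reading the summary by eye, but part of the check can be automated. As a rough sketch, the snippet below flags any figure that appears in a summary without appearing in the source text. The helper `unsupported_numbers` is hypothetical, and matching digits is only a crude proxy for faithfulness: it would not catch a rephrasing such as "one day" versus "a month".

```python
import re

# Integers with optional thousands separators and an optional decimal part.
NUMBER = re.compile(r"\d+(?:,\d{3})*(?:\.\d+)?")


def unsupported_numbers(source, summary):
    """Return the numbers that occur in the summary but nowhere in the source."""
    source_numbers = set(NUMBER.findall(source))
    return sorted(set(NUMBER.findall(summary)) - source_numbers)


source = ("Close to a million doses -- over 951,000, to be more exact -- were given "
          "in the past 24 hours, up from just under 340,000 the previous day. "
          "States may now vaccinate anyone over 65.")
summary = "Over 951,000 doses given in the past 24 hours . States may vaccinate anyone over 65 ."

print(unsupported_numbers(source, summary))               # []
print(unsupported_numbers(source, "1,000,000 doses given"))  # ['1,000,000']
```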

Finally, all of the steps above fit together into a single Jupyter notebook.
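The notebook itself is not reproduced here, so what follows is a sketch of how the snippets from this post might be assembled, not the original notebook. The function names `build_summarizer`, `summarize`, and `main` are my own. Passing `model_name=None` keeps whatever checkpoint the installed Transformers version loads by default, which can differ between versions, so naming the checkpoint explicitly is the safer choice.

```python
import os

# Assumption: a machine with a GPU at index 0; adjust or remove on other setups.
os.environ["CUDA_VISIBLE_DEVICES"] = "0"


def build_summarizer(model_name=None):
    """Create a summarization pipeline; None keeps the library's default checkpoint."""
    # Imported lazily so the other helpers can be used without loading Transformers.
    from transformers import pipeline

    if model_name is None:
        return pipeline("summarization")
    return pipeline("summarization", model=model_name, tokenizer=model_name)


def summarize(summarizer, text, max_length=100, min_length=5):
    """Run the summarizer deterministically and return only the summary string."""
    result = summarizer(text, max_length=max_length, min_length=min_length, do_sample=False)
    return result[0]["summary_text"]


def main(text, model_name=None):
    """Build the pipeline (downloads the checkpoint on first use) and print the summary."""
    print(summarize(build_summarizer(model_name), text))
```

With the `text` string defined earlier, `main(text)` prints a summary from the default model, and `main(text, model_name="t5-base")` switches to T5.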


Lewis, Mike, et al. “Bart: Denoising sequence-to-sequence pre-training for natural language generation, translation, and comprehension.” arXiv preprint arXiv:1910.13461 (2019).

Raffel, Colin, et al. “Exploring the limits of transfer learning with a unified text-to-text transformer.” arXiv preprint arXiv:1910.10683 (2019).
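
Copyright notice: this is an original article by CSDN blogger "uoiqu90093jgj", released under the CC 4.0 BY-SA license. When reposting, please include a link to the original source and this notice.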

