Python: extracting the body text of arbitrary news articles

2022-03-31 07:11:59

I found a body-text extraction program on GitHub. In a quick test, it could extract the article body from most of today's major news sites.

I will analyze this program's source code later.

Usage is very simple:

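# -*- coding: utf-8 -*-
import newspaper

url = 'http://news.haiwainet.cn/n/2015/0611/c3541083-28826526.html'
# language='zh' tells the library to treat the page as Chinese
a = newspaper.Article(url, language='zh')
a.download()  # fetch the page HTML
a.parse()     # extract the article body from the HTML
print(a.text)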
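If you need to run this on more than one URL, the same three calls can be wrapped in a small helper. This is only a sketch: the function name extract_article and the returned dict layout are my own choices and not part of the library. It uses only Article, download(), parse(), and the title and text attributes. The import sits inside the function so the helper can be defined even where the package is not installed yet.

```python
# -*- coding: utf-8 -*-
def extract_article(url, language='zh'):
    """Download one page and return its title and body text as a dict.

    extract_article is a hypothetical helper name, not a library function.
    """
    # Lazy import: the package is only needed when the helper is called.
    import newspaper

    article = newspaper.Article(url, language=language)
    article.download()  # fetch the page HTML
    article.parse()     # extract title and body text from the HTML
    return {'title': article.title, 'text': article.text}
```

Calling extract_article('http://news.haiwainet.cn/n/2015/0611/c3541083-28826526.html') should give the same body text as the snippet above under the 'text' key, together with the headline under 'title'.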

GitHub: https://github.com/codelucas/newspaper

