How bs4 parses data
- Instantiate a BeautifulSoup object and load the page source into it
- Call the attributes and methods of that BeautifulSoup object to locate tags and extract data
How to instantiate a BeautifulSoup object:
- from bs4 import BeautifulSoup
- Instantiate from a local file: load saved page source into BeautifulSoup:
  - fp = open('./test.html', 'r', encoding='utf-8')
  - soup = BeautifulSoup(fp, 'lxml')
- Instantiate from the web: load page source fetched from the internet into BeautifulSoup:
  - page_text = response.text
  - soup = BeautifulSoup(page_text, 'lxml')
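The two loading paths above can be sketched without a network call. This is a minimal sketch under stated assumptions: the inline HTML string is hypothetical and stands in for `response.text`, a temporary file stands in for `./test.html`, and the built-in `html.parser` replaces `lxml` only so the sketch runs without that extra dependency.

```python
import os
import tempfile

from bs4 import BeautifulSoup

# Hypothetical page source; stands in for response.text or a saved page.
page_text = "<html><head><title>demo</title></head><body><p>hello</p></body></html>"

# Path 1: load from a string (for example, the text of an HTTP response).
soup_from_text = BeautifulSoup(page_text, "html.parser")

# Path 2: load from an open file object; the with-blocks close the file.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "test.html")
    with open(path, "w", encoding="utf-8") as fp:
        fp.write(page_text)
    with open(path, "r", encoding="utf-8") as fp:
        soup_from_file = BeautifulSoup(fp, "html.parser")

print(soup_from_text.title.text)
print(soup_from_file.p.text)
```

Both objects expose the same API afterwards, so the choice of source only affects how the markup reaches the constructor.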
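Code example
from bs4 import BeautifulSoup

# Open the saved page with a context manager so the file is closed after parsing
with open('sougou.html', 'r', encoding='utf-8') as fp:
    soup = BeautifulSoup(fp, 'lxml')

# print(soup.a)  # soup.<tag_name> returns the first occurrence of that tag in the HTML
# print(soup.find('div', class_='single-share'))  # locate by attribute; returns the first match
# print(soup.find_all('div', class_='single-share'))  # returns every match as a list
# Select <a> tags that are direct children of .single-share and print the text of the first one
print(soup.select('.single-share>a')[0].text)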
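To compare the four lookup styles side by side, here is a small self-contained sketch. The markup, link texts, and `href` values are made up for illustration (the real `sougou.html` is not shown in these notes), and `html.parser` is used instead of `lxml` so nothing beyond bs4 is required.

```python
from bs4 import BeautifulSoup

# Hypothetical markup that reuses the class name from the example above.
html = """
<div class="single-share">
  <a href="/first">first link</a>
  <a href="/second">second link</a>
</div>
<div class="single-share">
  <a href="/third">third link</a>
</div>
"""
soup = BeautifulSoup(html, "html.parser")

first_a = soup.a                                        # first <a> in the document
first_div = soup.find("div", class_="single-share")     # first matching <div>
all_divs = soup.find_all("div", class_="single-share")  # list of all matching <div>s
links = soup.select(".single-share > a")                # CSS selector: direct <a> children

print(first_a["href"])
print(len(all_divs))
print([a.text for a in links])
```

`find` returns a single tag (or `None` when nothing matches), while `find_all` and `select` return lists, which is why the original example indexes the `select` result with `[0]` before reading `.text`.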