+-----------------+-------------------------------------------------------------------------+
| Element         | Description                                                             |
+-----------------+-------------------------------------------------------------------------+
| Tag             | The most basic unit of information; opened by <> and closed by </>      |
| Name            | The tag's name; the name of <p>...</p> is 'p'. Usage: <tag>.name        |
| Attributes      | The tag's attributes, organized as a dict. Usage: <tag>.attrs           |
| NavigableString | The non-attribute string inside a tag (<>...</>). Usage: <tag>.string   |
| Comment         | The comment part of a tag's string (<!--...-->); a special Comment type |
+-----------------+-------------------------------------------------------------------------+
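To see all five elements without fetching a page, here is a minimal offline sketch. The HTML string is made up for illustration and only mimics the kind of markup on the demo page; the checks use the Tag, NavigableString and Comment classes from bs4.element.

```python
from bs4 import BeautifulSoup
from bs4.element import Comment, NavigableString, Tag

# Hypothetical sample markup, used only for illustration
html = ('<p class="title"><b>Demo</b></p>'
        '<a href="http://example.com" class="py1" id="link1">Link</a>'
        '<i><!--a note--></i>')
soup = BeautifulSoup(html, 'html.parser')

a = soup.a                      # Tag: the first <a> element in the document
print(type(a) is Tag)           # True
print(a.name)                   # Name: 'a'
print(a.attrs)                  # Attributes: a dict (class is a list of values)
print(a.string, type(a.string) is NavigableString)  # NavigableString: 'Link'
print(soup.i.string, type(soup.i.string) is Comment)  # Comment: text without <!-- -->
```

Note that `class` comes back as a list because HTML allows several space-separated class values.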
from bs4 import BeautifulSoup
import requests

r = requests.get('https://python123.io/ws/demo.html')
demo = r.text
soup = BeautifulSoup(demo, 'html.parser')

print(soup.title)
tag = soup.a    # the first <a> tag in the document
print(tag)

# Tag name: .name (here, the name of the <a> tag's grandparent)
print(soup.a.parent.parent.name)

# Tag attributes: .attrs, a dict
print(tag.attrs['class'])
print(tag.attrs['href'])
print(type(tag))

# String inside a tag: .string
print(soup.p)
print(soup.p.string)
print(type(soup.p.string))

# Handling HTML comments: .string returns the comment text without <!-- -->,
# so only type() shows whether it is a Comment or an ordinary NavigableString
newsoup = BeautifulSoup("<b><!--This is a comment--></b><p>This is not a comment</p>", 'html.parser')
print(newsoup.b.string)
print(type(newsoup.b.string))
print(type(newsoup.p.string))
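Because printing `newsoup.b.string` shows the comment text just like ordinary text, code that collects strings has to check the type explicitly. A minimal sketch, assuming you want to skip comments while gathering text; the helper name `visible_strings` is hypothetical. Comment is a subclass of NavigableString, so `isinstance(s, NavigableString)` alone would not filter comments out.

```python
from bs4 import BeautifulSoup
from bs4.element import Comment, NavigableString

def visible_strings(soup):
    """Hypothetical helper: return all strings in the tree except HTML comments."""
    result = []
    for s in soup.find_all(string=True):
        # Comment subclasses NavigableString, so test for Comment explicitly
        if isinstance(s, Comment):
            continue
        result.append(str(s))
    return result

newsoup = BeautifulSoup("<b><!--This is a comment--></b><p>This is not a comment</p>", 'html.parser')
print(isinstance(newsoup.b.string, NavigableString))  # True: a Comment is also a NavigableString
print(visible_strings(newsoup))                        # ['This is not a comment']
```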