Basic elements of the Beautiful Soup library

+-----------------+-----------------------------------------------------------------------------+
| Element         | Description                                                                 |
+-----------------+-----------------------------------------------------------------------------+
| Tag             | Most basic unit of information; opened with <> and closed with </>          |
| Name            | The tag's name, e.g. <p>...</p> has the name 'p'; format: <tag>.name        |
| Attributes      | The tag's attributes, organized as a dictionary; format: <tag>.attrs        |
| NavigableString | Non-attribute string inside a tag, between <> and </>; format: <tag>.string |
| Comment         | Comment part of the string inside a tag; a special Comment type             |
+-----------------+-----------------------------------------------------------------------------+

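from bs4 import BeautifulSoup
import requests

# Fetch the demo page and parse it with Python's built-in html.parser
r = requests.get('https://python123.io/ws/demo.html')
demo = r.text
soup = BeautifulSoup(demo, 'html.parser')
print(soup.title)  # the <title> tag
tag = soup.a       # the first <a> tag in the document
print(tag)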


# Get a tag's name: .name (here, the name of the <a> tag's grandparent)
print(soup.a.parent.parent.name)

# A tag's attributes: .attrs (a dictionary)
print(tag.attrs['class'])
print(tag.attrs['href'])
print(type(tag))

# Get the string inside a tag: .string
print(soup.p)
print(soup.p.string)
print(type(soup.p.string))

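# Handling HTML comments: .string also returns comment text, without the <!-- --> markers.
# Check type(): if it is Comment, the string came from a comment rather than ordinary text.
newsoup = BeautifulSoup("<b><!--This is a comment--></b><p>This is not a comment</p>", 'html.parser')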
print(newsoup.b.string)
print(type(newsoup.b.string))
print(type(newsoup.p.string))
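
Printing the type works for a quick look, but in real code it is usually easier to test the type directly. The following is a minimal sketch, not part of the original post: it runs offline on the same kind of HTML string, and the helper name text_kind is a hypothetical name chosen for illustration. Because Comment is a subclass of NavigableString, the sketch checks for Comment before treating the value as ordinary text.

```python
from bs4 import BeautifulSoup
from bs4.element import Comment

html = "<b><!--This is a comment--></b><p>This is not a comment</p>"
soup = BeautifulSoup(html, "html.parser")


def text_kind(tag):
    """Classify tag.string as 'comment', 'text', or 'none' (hypothetical helper)."""
    s = tag.string
    if s is None:
        # .string is None when the tag has no single string child
        return "none"
    if isinstance(s, Comment):
        # Comment subclasses NavigableString, so test for it first
        return "comment"
    return "text"


print(text_kind(soup.b))  # comment
print(text_kind(soup.p))  # text
```

A check like this helps when extracting text from a page, since a comment would otherwise be mistaken for visible content.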

 
