The Dormouse's story
Once upon a time there were three little sisters; and their names were Elsie, Lacie and Tillie; and they lived at the bottom of a well.
...
""" from bs4 import BeautifulSoup soup = BeautifulSoup(html_doc, 'html.parser') soup.prettify() print(soup) #得到下面结构化的html ""“The Dormouse's story
Once upon a time there were three little sisters; and their names were Elsie , Lacie and Tillie ; and they lived at the bottom of a well.
...
""" ``` ## 过滤器 find_all()查找所有标签以列表形式返回 ### 字符串 ```python print(soup.find_all('b')) # [The Dormouse's story] ``` ### 正则表达式 正则的部分我们抽空讲解。先知道可以这样写就可以 ```python import re for tag in soup.find_all(re.compile("^b")): print(tag.name) # body # b ``` ### 列表 如果传入列表参数,Beautiful Soup会将与列表中任一元素匹配的内容返回 ```python print(soup.find_all(["a", "b"])) # [The Dormouse's story, Elsie, Lacie, Tillie] ``` ### True `True` 可以匹配任何值,下面代码查找到所有的tag,但是不会返回字符串节点 ```python for tag in soup.find_all(True): print(tag.name) # html # head # title # body # p # b # p # a # a # a # p ``` ### 方法(函数) 如果没有合适过滤器,那么还可以定义一个方法 ```python def has_class_but_no_id(tag): return tag.has_attr('class') and not tag.has_attr('id') soup.find_all(has_class_but_no_id) #[The Dormouse's story
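#  Once upon a time there were three little sisters; and their names were
#  Elsie,
#  Lacie and
#  Tillie;
#  and they lived at the bottom of a well.,
#  ...]
```

`find_all()` covers a lot of ground, so take some time to digest it first; we'll continue in the next post.

Writing these posts takes effort, so feel free to leave a comment or bookmark this one, or join the [group chat](https://jq.qq.com/?_wv=1027&k=vH00muGu) to learn together.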
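To make the five filter types above more concrete, here is a simplified model of how `find_all()` might decide whether a tag matches, written with only the standard library so it runs without Beautiful Soup installed. It is a sketch under stated assumptions, not Beautiful Soup's actual implementation: the `Tag` and `TreeBuilder` classes, the `parse`, `matches`, `find_all` and `names` helpers, and the `SAMPLE` document are all hypothetical, and the real library supports many more options.

```python
import re
from html.parser import HTMLParser


class Tag:
    """Hypothetical minimal tag node (a stand-in, not bs4's Tag class)."""

    def __init__(self, name, attrs):
        self.name = name
        self.attrs = dict(attrs)
        self.children = []

    def has_attr(self, key):
        return key in self.attrs


class TreeBuilder(HTMLParser):
    """Builds a tree of Tag objects with the standard-library HTML parser.

    Text is ignored, and void elements such as <br> are not handled.
    """

    def __init__(self):
        super().__init__()
        self.root = Tag("[document]", [])
        self.stack = [self.root]

    def handle_starttag(self, name, attrs):
        tag = Tag(name, attrs)
        self.stack[-1].children.append(tag)
        self.stack.append(tag)

    def handle_endtag(self, name):
        # Close the nearest open tag with this name (never pops the root)
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i].name == name:
                del self.stack[i:]
                break


def parse(html):
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    return builder.root


def matches(name_filter, tag):
    """Simplified rules for applying one filter to one tag."""
    if name_filter is True:
        return True  # True: every tag matches
    if isinstance(name_filter, str):
        return tag.name == name_filter  # string: exact tag-name match
    if isinstance(name_filter, re.Pattern):
        return name_filter.search(tag.name) is not None  # regex: search() on the name
    if isinstance(name_filter, (list, tuple)):
        return any(matches(item, tag) for item in name_filter)  # list: any item matches
    if callable(name_filter):
        return bool(name_filter(tag))  # function: called with the whole tag
    raise TypeError(f"unsupported filter: {name_filter!r}")


def find_all(node, name_filter):
    """Collect matching tags in document order (each tag before its descendants)."""
    results = []
    for child in node.children:
        if matches(name_filter, child):
            results.append(child)
        results.extend(find_all(child, name_filter))
    return results


def names(tags):
    return [tag.name for tag in tags]


# Hypothetical sample document, shaped like the article's example
SAMPLE = (
    '<html><head><title>Story</title></head><body>'
    '<p class="title"><b>Story</b></p>'
    '<p class="story"><a class="sister" id="link1">Elsie</a>'
    '<a class="sister" id="link2">Lacie</a></p>'
    '</body></html>'
)

if __name__ == "__main__":
    root = parse(SAMPLE)
    print(names(find_all(root, "b")))               # ['b']
    print(names(find_all(root, re.compile("^b"))))  # ['body', 'b']
    print(names(find_all(root, ["a", "b"])))        # ['b', 'a', 'a']
    print(names(find_all(root, True)))
    # ['html', 'head', 'title', 'body', 'p', 'b', 'p', 'a', 'a']

    def has_class_but_no_id(tag):
        return tag.has_attr("class") and not tag.has_attr("id")

    print(names(find_all(root, has_class_but_no_id)))  # ['p', 'p']
```

Because this model discards text while parsing, it can only ever return tags. In real Beautiful Soup the strings do stay in the tree, but a tag filter such as `True` still skips them, as described in the True section.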