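# Easy Web Scraping - Scrapy Framework Tricks 7 - Monkey Steals the Peach (3)

In the last lesson we covered part of how to use bs4. Today we continue, using the same sample data as last time.

```python
# The "three sisters" sample document from the Beautiful Soup documentation
html_doc = """
<html><head><title>The Dormouse's story</title></head>
<body>
<p class="title"><b>The Dormouse's story</b></p>

<p class="story">Once upon a time there were three little sisters; and their names were
<a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
and they lived at the bottom of a well.</p>

<p class="story">...</p>
"""

from bs4 import BeautifulSoup

soup = BeautifulSoup(html_doc, 'html.parser')
# prettify() returns a formatted string; it does not modify soup in place
print(soup.prettify())
# This prints structured HTML along these lines
# (attribute order and line breaks may differ by parser and version):
# <html>
#  <head>
#   <title>
#    The Dormouse's story
#   </title>
#  </head>
#  <body>
#   <p class="title">
#    <b>
#     The Dormouse's story
#    </b>
#   </p>
#   <p class="story">
#    Once upon a time there were three little sisters; and their names were
#    <a class="sister" href="http://example.com/elsie" id="link1">
#     Elsie
#    </a>
#    ,
#    <a class="sister" href="http://example.com/lacie" id="link2">
#     Lacie
#    </a>
#    and
#    <a class="sister" href="http://example.com/tillie" id="link3">
#     Tillie
#    </a>
#    ; and they lived at the bottom of a well.
#   </p>
#   <p class="story">
#    ...
#   </p>
#  </body>
# </html>
```

## .parent

Use the `.parent` attribute to get an element's parent node.

```python
b_tag = soup.b
print(b_tag.parent)
# Output:
# <p class="title"><b>The Dormouse's story</b></p>
```

## .parents

An element's `.parents` attribute lets you iterate over all of its ancestors, from the nearest parent up to the document root.

```python
b_tag = soup.b
for parent in b_tag.parents:
    if parent is None:
        print(parent)
    else:
        print(parent.name)
# Output:
# p
# body
# html
# [document]
```

## .next_sibling and .previous_sibling

Sibling tags are tags at the same level of the tree, meaning they share the same parent. In the sample document, for example, the several `<p>` tags are all siblings of one another.

```
sibling_soup = BeautifulSoup("text1text2
```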
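
As a minimal sketch of sibling navigation, the example below uses a small hypothetical snippet (the `snippet` string is an assumption for illustration, not the lesson's sample data) in which `<b>` and `<c>` share the parent `<a>`. It only relies on the `.next_sibling` and `.previous_sibling` attributes described above.

```python
from bs4 import BeautifulSoup

# Hypothetical snippet (an assumption for illustration): <b> and <c>
# share the same parent <a>, so they are siblings.
snippet = "<a><b>text1</b><c>text2</c></a>"
sibling_soup = BeautifulSoup(snippet, "html.parser")

b_tag = sibling_soup.b
c_tag = sibling_soup.c

next_of_b = b_tag.next_sibling          # the <c> tag that follows <b>
previous_of_c = c_tag.previous_sibling  # the <b> tag that precedes <c>
previous_of_b = b_tag.previous_sibling  # None: nothing comes before <b>
next_of_c = c_tag.next_sibling          # None: nothing comes after <c>

print(next_of_b)
print(previous_of_c)
print(previous_of_b)
print(next_of_c)
```

Note that in a real page the whitespace between tags is itself a text node, so the `.next_sibling` of a tag is often a string such as a newline rather than the next tag.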