(venv) D:\pytest>pip install beautifulsoup
Collecting beautifulsoup
  Using cached https://files.pythonhosted.org/packages/1e/ee/295988deca1a5a7accd783d0dfe14524867e31abb05b6c0eeceee49c759d/BeautifulSoup-3.2.1.tar.gz
    Complete output from command python setup.py egg_info:
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "C:\Users\1\AppData\Local\Temp\pip-install-mav7d0bo\beautifulsoup\setup.py", line 22
        print "Unit tests have failed!"
              ^
    SyntaxError: Missing parentheses in call to 'print'. Did you mean print("Unit tests have failed!")?
    ----------------------------------------
Command "python setup.py egg_info" failed with error code 1 in C:\Users\1\AppData\Local\Temp\pip-install-mav7d0bo\beautifulsoup\
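Oh, so the old beautifulsoup package (BeautifulSoup 3.2.1) is dead on Python 3: its setup.py still uses the Python 2 print statement, which is what triggers the SyntaxError. Use pip install beautifulsoup4 instead, or simply pip install bs4.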
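Basic elements of the BeautifulSoup class
Basic element | Description |
Tag | A tag, the most basic unit of information organization; opened with <> and closed with </> |
Name | The tag's name; the name of <p>…</p> is 'p'. Format: <tag>.name |
Attributes | The tag's attributes, organized as a dictionary. Format: <tag>.attrs |
NavigableString | The non-attribute string inside a tag, i.e. the string between <> and </>. Format: <tag>.string |
Comment | The comment portion of a string inside a tag; a special type, Comment, which is a subclass of NavigableString |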
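A quick sketch to see these five elements side by side. The markup is a made-up sample (not from any real page), and the built-in "html.parser" backend is assumed so that nothing beyond beautifulsoup4 needs to be installed.

```python
from bs4 import BeautifulSoup
from bs4.element import Comment, NavigableString

# Hypothetical sample markup, invented for this illustration.
html = '<p id="intro">Hello</p><b><!--a comment--></b>'
soup = BeautifulSoup(html, "html.parser")

tag = soup.p             # Tag: the first <p> element in the document
name = tag.name          # Name: the tag's name as a string
attrs = tag.attrs        # Attributes: a dict of the tag's attributes
text = tag.string        # NavigableString: the string inside <p>…</p>
comment = soup.b.string  # Comment: the only child of <b> is a comment

print(name, attrs, text)
print(type(text).__name__, type(comment).__name__)
# Comment is a subclass of NavigableString, so check for Comment first
# when plain text and comments need to be told apart.
print(isinstance(comment, NavigableString))
```

Note that .string returns the comment text without the <!-- --> markers, so checking the type is the only way to tell a comment from ordinary text.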
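Downward traversal of the tag tree
Attribute | Description |
.contents | List of child nodes; stores all direct children of <tag> in a list |
.children | Iterator over child nodes; similar to .contents, used to loop over the direct children |
.descendants | Iterator over descendant nodes; covers all descendants at every depth, used in loops |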
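A minimal sketch comparing the three attributes on a small tree. The markup is again a made-up sample and "html.parser" is assumed. Strings count as nodes too, so the example filters on Tag where only elements are wanted.

```python
from bs4 import BeautifulSoup
from bs4.element import Tag

# Hypothetical sample markup, invented for this illustration.
html = "<div><p>one <b>bold</b></p><p>two</p></div>"
soup = BeautifulSoup(html, "html.parser")
div = soup.div

# .contents is a plain list of the direct children.
child_names = [c.name for c in div.contents]

# .children yields the same nodes, but lazily, for use in a loop.
children_names = [c.name for c in div.children]

# .descendants walks every level: tags and the strings inside them.
all_descendants = list(div.descendants)
descendant_tags = [d.name for d in all_descendants if isinstance(d, Tag)]

print(child_names)
print(descendant_tags)
print(len(all_descendants))
```

Here the <div> has only two direct children (the two <p> tags), while .descendants also reaches the nested <b> and every text node.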