The ElementTree class

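An ElementTree is mainly a document wrapper around a tree that has a root node.

It provides a couple of methods for serialization and general document handling, and it exposes document-level information such as the XML version and the DOCTYPE through its docinfo attribute.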

 

from lxml import etree

# Named xml_text rather than str, so the built-in str type is not shadowed
xml_text = '''<?xml version="1.0"?>
<!DOCTYPE root SYSTEM "test" [ <!ENTITY tasty "parsnips"> ]>
<root>
  <a>&tasty;</a>
</root>
'''
root = etree.XML(xml_text)

tree = etree.ElementTree(root)
print(tree.docinfo.xml_version)  # Output: 1.0
print(tree.docinfo.doctype)  # Output: <!DOCTYPE root SYSTEM "test">

tree.docinfo.public_id = '-//W3C//DTD XHTML 1.0 Transitional//EN'
tree.docinfo.system_url = 'file://local.dtd'

print(tree.docinfo.doctype)
'''Output:
<!DOCTYPE root PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "file://local.dtd">
'''

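# tostring() returns bytes by default on Python 3; encoding="unicode" yields a str
print(etree.tostring(tree, encoding="unicode"))
'''Output: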
<!DOCTYPE root PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "file://local.dtd" [
<!ENTITY tasty "parsnips">
]>
<root>
  <a>parsnips</a>
</root>
'''

# Serializing only the root element leaves out the DOCTYPE
print(etree.tostring(tree.getroot(), encoding="unicode"))
'''Output:
<root>
  <a>parsnips</a>
</root>
'''
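
The serialization and document-handling methods mentioned at the top can be sketched with a small round trip. This is a minimal illustration rather than part of the original example: the in-memory BytesIO buffer and the simplified XML string are assumptions chosen to keep it self-contained, while write(), etree.parse(), getroot() and getpath() are existing lxml calls.

```python
from io import BytesIO

from lxml import etree

# Simplified document (an assumption for this sketch, no DOCTYPE or entities)
root = etree.XML("<root><a>parsnips</a></root>")
tree = etree.ElementTree(root)

# Serialize the whole document into an in-memory buffer instead of a file
buffer = BytesIO()
tree.write(buffer, encoding="UTF-8", xml_declaration=True)
serialized = buffer.getvalue()

# etree.parse() returns an ElementTree (the document), not a bare Element
buffer.seek(0)
parsed = etree.parse(buffer)
parsed_root_tag = parsed.getroot().tag

# getpath() gives an XPath expression that locates an element in the tree
a_path = parsed.getpath(parsed.getroot()[0])

print(serialized)
print(parsed_root_tag, a_path)
```

Writing to a real file works the same way: pass a file name or an open binary file object to write() in place of the buffer.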
