Python Crawler: Handling HTML Entity Encoding

Handling HTML Entity Encoding in Python

Python 2

import HTMLParser

# Numeric character reference for U+3039 (decimal 12345)
char = r"&#12345;"
html_parser = HTMLParser.HTMLParser()
uChar = html_parser.unescape(char)  # u'\u3039'

Python 3

from html import unescape

# The link text "下一页" means "next page"
s = 'position.php?&amp;start=10#a" id="next">下一页</a>'

print(s)
# position.php?&amp;start=10#a" id="next">下一页</a>

print(unescape(s))
# position.php?&start=10#a" id="next">下一页</a>
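The two approaches above can be combined into one import that works on either interpreter. The following is only a minimal sketch: the helper name `unescape_html` and the sample strings are illustrative assumptions, not part of the original post. It also shows that named, decimal, and hexadecimal entities are all decoded, and that a double-escaped string is decoded by only one layer per call.

```python
# -*- coding: utf-8 -*-
try:
    # Python 3.4+: unescape lives in the html module
    from html import unescape
except ImportError:
    # Python 2 fallback: use the HTMLParser instance method
    from HTMLParser import HTMLParser
    unescape = HTMLParser().unescape


def unescape_html(text):
    """Hypothetical helper: decode HTML entities in text, one layer only."""
    return unescape(text)


samples = [
    u"&lt;a&gt;",   # named entities
    u"&#12345;",    # decimal numeric character reference
    u"&#x3039;",    # hexadecimal numeric character reference
    u"&amp;amp;",   # double-escaped text: only the outer layer is decoded
]
decoded = [unescape_html(s) for s in samples]

for before, after in zip(samples, decoded):
    print(before, "->", after)
```

Note that `HTMLParser.unescape` was removed from Python 3.9, so on Python 3 `html.unescape` is the one to use. If a crawled page is escaped twice, calling the helper a second time decodes the remaining layer.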

Reference: Handling HTML Entity Encoding in Python
