Handling HTML entity encoding in Python
Python 2:
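import HTMLParser

# A numeric character reference; it decodes to the character 〹 (U+3039).
char = r"&#12345;"
http_parser = HTMLParser.HTMLParser()
# unescape() replaces HTML entities with the characters they stand for.
uChar = http_parser.unescape(char)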
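If the same script has to run under both interpreters, one common pattern is to try the Python 3 import first and fall back to the Python 2 class. The following is a minimal sketch of that idea, not something taken from the original note; the sample entity `&#12345;` is only an illustration. It relies on `html.unescape` being available from Python 3.4 onward.

```python
try:
    # Python 3.4+: unescape() is a plain function in the html module.
    from html import unescape
except ImportError:
    # Python 2: fall back to the method on an HTMLParser instance.
    from HTMLParser import HTMLParser
    unescape = HTMLParser().unescape

# Illustrative input: a decimal numeric character reference.
decoded = unescape("&#12345;")
print(decoded)
```

Binding the name `unescape` in both branches keeps the rest of the code identical regardless of the interpreter version.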
Python 3:
from html import unescape

s = u'position.php?&start=10#a" id="next">下一页</a>'
print(s)
print(unescape(s))
"""
position.php?&start=10#a" id="next">下一页</a>
position.php?&start=10#a" id="next">下一页</a>
"""
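To make the effect of `unescape` easier to see, here is a small round-trip sketch using `html.escape` and `html.unescape` from the Python 3 standard library. The sample markup in `raw` and the variable names are hypothetical and chosen only for illustration; they are not the string used in the original note.

```python
from html import escape, unescape

# Hypothetical sample markup (an assumption, not from the original note).
raw = '<a href="position.php?start=10&page=2" id="next">下一页</a>'

# escape() turns &, <, > and (by default) quotes into entities.
encoded = escape(raw)
# unescape() reverses that and also decodes numeric references.
decoded = unescape(encoded)

print(encoded)
print(decoded)

# Decimal numeric character references for the three characters of 下一页.
numeric = "&#19979;&#19968;&#39029;"
print(unescape(numeric))
```

Non-ASCII characters such as 下一页 pass through `escape` unchanged, because it only rewrites the characters that are special in HTML.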
Reference: "Python处理HTML实体编码" (Handling HTML entity encoding in Python)