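I had been looking for a library to compare differences between pieces of content, and it turns out the Python standard library already ships with one: difflib.
This is quite handy. For data collection (web scraping), comparing the parameters of two requests shows which values changed between them, which makes it easier to pinpoint the ones that matter.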
import difflib


def diff_headers():
    # Headers captured from the first request.
    # strip() drops leading/trailing blank lines so they do not show up as spurious differences.
    text1 = '''Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9
Accept-Encoding: gzip, deflate
Accept-Language: q=0.9,en;q=0.8,en-US;q=0.7;zh-CN,zh;
Cache-Control: no-cache
Connection: keep-alive
Cookie: UM_distinctid=17c5f7e8e37f8b-030342123ea219-513c1743-15f900-17c2f7e8e38463; CNZZDATA1586682=cnzz_eid%3D1569740215-1636510718-null%26ntime%3D1642568049; PHPSESSID=l5otho4quql6jpf7majg5795fs; _stat_uid=05967439303530977045856681345587735
Host: www.chem365.net
Pragma: no-cache
Referer: http://www.chem365.net/
Upgrade-Insecure-Requests: 1
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/98.0.4758.80 Safari/537.36 Edg/98.0.1108.50'''.strip().splitlines(keepends=True)

    # Headers captured from the second request
    text2 = '''
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9
Accept-Encoding: gzip, deflate
Accept-Language: zh-CN,zh;q=0.9,en;q=0.8,en-US;q=0.7
Cache-Control: no-cache
Connection: keep-alive
Cookie: UM_distinctid=17c2f7e8e37f8b-030342123ea219-513c1743-15f900-17c2f7e8e38463; CNZZDATA1586682=cnzz_eid%3D1569740215-1636510718-null%26ntime%3D1642568049; PHPSESSID=l5otho4quql6jpf7majg5795fs; _stat_uid=05967439303530977045856681345587735
Host: www.chem365.net
Pragma: no-cache
Referer: http://www.chem365.net/web/index/information/classid/142.html
Upgrade-Insecure-Requests: 1
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/98.0.4758.80 Safari/537.36 Edg/98.0.1108.50
'''.strip().splitlines(keepends=True)

    # Build a complete HTML page with a side-by-side comparison table
    d = difflib.HtmlDiff()
    html_content = d.make_file(text1, text2)
    # print(html_content)

    # make_file() declares utf-8 in the page by default, so write the file as utf-8 too
    with open('diff_header.html', 'w', encoding='utf-8') as f:
        f.write(html_content)


if __name__ == '__main__':
    diff_headers()
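As the figure shows, the generated HTML makes the changes easy to see. Different colors mark different operations (added, changed, deleted), so the differences between the two requests stand out at a glance.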
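For a quick check in the terminal, without opening the HTML file, the same comparison can be done as plain text. The snippet below is a minimal sketch using difflib.ndiff. The helper name changed_lines is hypothetical (not part of difflib), and the two header blocks are shortened versions of the ones used above.

```python
import difflib


def changed_lines(old_text, new_text):
    """Return only the removed ("- ") and added ("+ ") lines between two texts."""
    old_lines = old_text.strip().splitlines()
    new_lines = new_text.strip().splitlines()
    # ndiff prefixes every line: "- " removed, "+ " added, "  " unchanged, "? " hint
    return [
        line
        for line in difflib.ndiff(old_lines, new_lines)
        if line.startswith(("- ", "+ "))
    ]


# Shortened header blocks, for illustration only
old_headers = """Host: www.chem365.net
Referer: http://www.chem365.net/
Pragma: no-cache"""

new_headers = """Host: www.chem365.net
Referer: http://www.chem365.net/web/index/information/classid/142.html
Pragma: no-cache"""

for line in changed_lines(old_headers, new_headers):
    print(line)
```

Here only the Referer header differs, so the output is the old Referer line prefixed with "- " followed by the new one prefixed with "+ ". Unchanged headers are filtered out, which keeps the output short when only one or two parameters vary between requests.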
For more details, see: https://blog.csdn.net/weixin_45775963/article/details/104122753