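When replaying captured web traffic, request headers matter a great deal. Many anti-scraping measures check the headers, and leaving out a single field can cause the request to fail. The safest approach is to reuse the headers exactly as captured, which lowers the chance of errors and saves time. Some sites also verify header order, in which case the headers must be written in the same order as the original request.
Take Chrome as an example (any other browser or packet-capture tool works too). As the screenshot below shows, the headers of the current request are plainly visible, but text copied from there cannot be used in code directly; it first has to be rewritten as key-value pairs. The following code quickly produces the format we need.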
# -*- coding:utf-8 -*-
R_Headers = '''
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9
Accept-Encoding: gzip, deflate, br
Accept-Language: zh,en;q=0.9,zh-CN;q=0.8
Cache-Control: max-age=0
Connection: keep-alive
Content-Length: 764
Content-Type: application/x-www-form-urlencoded
Cookie: cna=57a5dee1b49f425691b63ca3173dc471
Host: login.sina.com.cn
Origin: https://weibo.com
Referer: https://weibo.com/
sec-ch-ua: "Chromium";v="92", " Not A;Brand";v="99", "Google Chrome";v="92"
sec-ch-ua-mobile: ?0
Sec-Fetch-Dest: iframe
Sec-Fetch-Mode: navigate
Sec-Fetch-Site: cross-site
Sec-Fetch-User: ?1
Upgrade-Insecure-Requests: 1
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/92.0.4515.159 Safari/537.36
'''
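headers = R_Headers.split('\n')


def redirect(headers):
    """Print each raw header line as a 'key': 'value', entry and return them all."""
    formatted = []
    for h in headers:
        # HTTP/2 pseudo-headers such as ":authority" start with a colon;
        # drop it so the split below does not cut the name off.
        if h[0:1] == ':':
            h = h[1:]
        # Split on the first ": " only, so a value containing ": " stays whole.
        single = h.split(": ", 1)
        # Blank lines and lines without a separator are skipped.
        if len(single) == 2:
            line = f"'{single[0]}': '{single[1]}',"
            print(line)
            formatted.append(line)
    return formatted


if __name__ == '__main__':
    redirect(headers)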
After formatting, the output looks like the figure:
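If the goal is a ready-to-use dict instead of printed text to paste, the same idea can be sketched as below. This is only a minimal illustration: the function name `parse_raw_headers` and the shortened `sample` block are made up for this example, and skipping HTTP/2 pseudo-headers (lines starting with `:`) is an assumption, not the behaviour of the script above, which keeps them with the colon removed. Python dicts preserve insertion order, so the keys stay in the order they were copied.

```python
def parse_raw_headers(raw: str) -> dict:
    """Turn a copied header block into a dict, keeping the original line order."""
    parsed = {}
    for line in raw.splitlines():
        line = line.strip()
        # Assumption: blank lines and HTTP/2 pseudo-headers (":authority", ...)
        # are skipped, since they are not ordinary header fields.
        if not line or line.startswith(":"):
            continue
        # Split on the first colon only, so URLs in the value stay intact.
        name, sep, value = line.partition(":")
        if not sep:
            continue
        parsed[name.strip()] = value.strip()
    return parsed


# A shortened, hypothetical sample taken from the header block above.
sample = """
Host: login.sina.com.cn
Origin: https://weibo.com
Referer: https://weibo.com/
Sec-Fetch-User: ?1
"""

headers_dict = parse_raw_headers(sample)
print(headers_dict)
```

The resulting dict can be passed to whatever HTTP client you use. Whether that client then sends the headers in the same order depends on the client, so sites that check header order still need to be verified against a fresh capture.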