Crawler whitelists are especially useful during scanning: many sites skip their detection for requests that look like they come from a search-engine crawler, so spoofing the User-Agent as a crawler can bypass those checks.
I wrote my own example (is there a tool that supports this directly?):
# coding: utf-8
import requests

# Spoof the User-Agent so the scan traffic looks like a search-engine crawler.
headers = {
    # 'User-Agent': "Mozilla/5.0 (compatible; Baiduspider-render/2.0; +http://www.baidu.com/search/spider.html)"
    'User-Agent': "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
}

domain = "http://XXX.com/"

# Walk the wordlist (dicc.txt is the filename of dirsearch's default dictionary)
# and probe each path with the spoofed UA.
with open("dicc.txt") as f:
    for line in f:
        path = line.strip()
        url = domain + path
        res = requests.get(url=url, headers=headers)
        status = res.status_code
        print("url:{} status:{}".format(url, status))
        # print("response: ", res.text)
        # break
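
Before running the full scan, it can be worth a quick differential check to confirm the target actually whitelists crawler User-Agents. The sketch below is my own addition (the path and the browser UA string are placeholders): it requests the same URL twice, once with a normal browser UA and once with the Googlebot UA, and compares the responses.

import requests

# Placeholder target path; a page the WAF would normally block works best here.
url = "http://XXX.com/admin/"

uas = {
    "browser": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36",
    "googlebot": "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
}

for label, ua in uas.items():
    res = requests.get(url, headers={"User-Agent": ua}, timeout=10)
    print("{:<10} status:{} length:{}".format(label, res.status_code, len(res.content)))

If the crawler UA gets a 200 where the browser UA gets a 403, or the response lengths differ sharply, the whitelist is in play and the scan above should slip through.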