Any one of the following three methods will do.
1. Set the proxy and request headers in the Lua script:
function main(splash, args)
    -- Set the proxy (host and port are placeholders for your own proxy)
    splash:on_request(function(request)
        request:set_proxy{
            host = "27.0.0.1",
            port = 8000,
        }
    end)
    -- Set the User-Agent request header
    splash:set_user_agent("Mozilla/5.0")
    -- Set custom request headers
    splash:set_custom_headers({
        ["Accept"] = "application/json, text/plain, */*"
    })
    splash:go("https://www.baidu.com/")
    return splash:html()
end
2. Set the proxy in Scrapy:
from scrapy_splash import SplashRequest

def start_requests(self):
    for url in self.start_urls:
        yield SplashRequest(
            url,
            endpoint='execute',
            args={
                'wait': 5,
                'lua_source': source,  # the Lua script, as a string
                'proxy': 'http://proxy_ip:proxy_port',
            },
        )
Request headers are set the same way in Scrapy: pass them in the headers argument of the request.
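As a minimal sketch of passing both a proxy and headers from Scrapy, the helper below collects the keyword arguments for a SplashRequest. The names build_splash_kwargs and LUA_SOURCE are hypothetical and not part of scrapy-splash. The Lua script assumes, as described in the scrapy-splash README, that with the execute endpoint the Scrapy request headers arrive in the script as args.headers and must be forwarded explicitly.

```python
# Lua script run by Splash's "execute" endpoint. It forwards the headers
# that scrapy-splash passes in as args.headers to the page request.
LUA_SOURCE = """
function main(splash, args)
    assert(splash:go{args.url, headers=args.headers})
    return splash:html()
end
"""


def build_splash_kwargs(url, proxy, headers):
    """Hypothetical helper: keyword arguments for scrapy_splash.SplashRequest."""
    return {
        "url": url,
        "endpoint": "execute",
        "headers": headers,  # ordinary Scrapy request headers
        "args": {
            "wait": 5,
            "lua_source": LUA_SOURCE,
            "proxy": proxy,  # proxy URL handed to Splash
        },
    }


# Usage inside a spider (requires scrapy-splash to be installed):
#     yield SplashRequest(**build_splash_kwargs(
#         url, "http://proxy_ip:proxy_port", {"Accept": "application/json"}))
```

Keeping the arguments in a plain dict makes it easy to inspect what will be sent to Splash before any request is issued.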
3. Set the proxy in a middleware:
class ProxyMiddleware(object):
    def process_request(self, request, spider):
        request.meta['splash']['args']['proxy'] = proxyServer
        request.headers["Proxy-Authorization"] = proxyAuth
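The middleware above uses proxyServer and proxyAuth without defining them. One possible way to define them is sketched below, assuming the proxy expects HTTP Basic authentication. The helper build_proxy_auth and the placeholder credentials are hypothetical; substitute the values from your proxy provider.

```python
import base64


def build_proxy_auth(username, password):
    """Return an HTTP Basic value for the Proxy-Authorization header."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8"))
    return "Basic " + token.decode("ascii")


# Placeholder values (assumptions): replace with the real proxy endpoint
# and credentials.
proxyServer = "http://proxy_ip:proxy_port"
proxyAuth = build_proxy_auth("user", "pass")
```

The middleware also has to be enabled under DOWNLOADER_MIDDLEWARES in settings.py before Scrapy will call it.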
References: