python3 requests: fixing "('Connection aborted.', RemoteDisconnected('Remote end closed connection without response',))"

Original post: https://blog.csdn.net/goodnameused/article/details/80246331

This finally solved my crawler getting rejected by the site it was scraping. Thanks to the original blogger for sharing.


I hit this problem while writing a crawler; the target site is built with ASP.NET:

requests.exceptions.ConnectionError:
('Connection aborted.', RemoteDisconnected('Remote end closed connection without response',))
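For reference, a minimal repro sketch of the failure, with a placeholder URL standing in for the actual ASP.NET site: without browser-like headers, the server closes the connection and requests raises the error above.

    import requests

    # Repro sketch: 'https://example.com/' is a placeholder for the ASP.NET
    # site; without browser-like headers the server drops the connection.
    try:
        resp = requests.get('https://example.com/')
    except requests.exceptions.ConnectionError as e:
        print(e)  # ('Connection aborted.', RemoteDisconnected(...))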

So I captured the traffic and analyzed it, and it turned out that simply adding an 'Accept-Language' header made the problem go away...

'Accept-Language': 'zh-CN,zh;q=0.9'
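A minimal sketch of the fix (the URL is again a placeholder; the User-Agent string is the one from the full code below): send both a User-Agent and the Accept-Language header.

    import requests

    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 '
                      '(KHTML, like Gecko) Chrome/51.0.2704.63 Safari/537.36',
        'Accept-Language': 'zh-CN,zh;q=0.9',  # this header is what fixed it
    }
    resp = requests.get('https://example.com/', headers=headers)  # placeholder URL
    print(resp.status_code)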


The relevant lines in the full code below are 16, 17, 22, and 35 (the unused imports are kept so the line numbers match):

 1 import os
 2 import urllib.request
 3 import json
 4 from urllib import request
 5 
 6 import pytest
 7 import requests
 8 from bs4 import BeautifulSoup
 9 import ssl
10 ssl._create_default_https_context = ssl._create_unverified_context
11 # Set the User-Agent, using a dict
12 # head={}
13 # head['User-Agent']='Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/55.0.2883.87 Safari/537.36'
14 
15 
16 headers = {'User-Agent':'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) '
17                         'Chrome/51.0.2704.63 Safari/537.36', 'Accept-Language':'zh-CN,zh;q=0.9'}
18 # Search for the species page by scientific name
19 def apiSerch(xueming):
20     apiUrl = "https://api.ebird.org/v2/ref/taxon/find?locale=zh_CN&cat=species&limit=150&key=jfekjedvescr&q=" + xueming
21     # Call the API to match the scientific name
22     resp = requests.get(apiUrl, headers=headers)
23     print(resp.text)
24 
25     # resp = '[{"code": "apubrf1", "name": "Apurimac Brushfinch - Atlapetes forbesi"}]'
26     data = json.loads(resp.text)
27     # print(data2['code'],data2['name'])
28     # Return an empty string if nothing was found
29     if len(data) == 0:
30         return ''
31     print(data[0]['code'])
32     # Use the matched species code to fetch the target page
33     searchUrl = "https://ebird.org/species/"
34     # Pass the same headers to masquerade as a browser
35     shtml = requests.get(searchUrl + data[0]['code'] + "/", headers=headers)
36     # print(shtml.text)
37     return shtml
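For completeness, a usage sketch of apiSerch (assuming the API key above is still valid; the scientific name is taken from the sample response in the comment on line 25, and 'species.html' is an arbitrary output file I chose):

    # Usage sketch: "Atlapetes forbesi" matches the sample response above.
    shtml = apiSerch("Atlapetes forbesi")
    if shtml != '':
        print(shtml.url)  # e.g. https://ebird.org/species/apubrf1/
        with open("species.html", "w", encoding="utf-8") as f:
            f.write(shtml.text)  # save the fetched species page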
