Working around WebSocket handshake verification (an anti-scraping measure)
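Scraping
Target URL: https://live.611.com/zq
Inspect the page in the browser developer tools and look at the XHR and WS entries in the network panel.
The Data value returned by the GetToken XHR request is tied to the WebSocket connection, so fetch that value first.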
import json
import requests

url = 'https://live.611.com/Live/GetToken'
res = requests.get(url).text
payload = json.loads(res)  # str --> dict (avoid naming it `dict`, which shadows the built-in)
data = payload['Data']
print(data)
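The token lookup can also be wrapped in small helpers so that a missing or malformed response fails with a clear error. This is only a sketch: `extract_token` and `build_ws_url` are hypothetical names, the sample body `{"Data": "abc123"}` is made up for illustration, and the endpoint string is the one used in the script later in this post.

```python
import json

# Endpoint pattern taken from the script later in this post.
WS_ENDPOINT = 'wss://push.611.com:6119/{}'


def extract_token(body):
    """Return the 'Data' field of a GetToken response body (hypothetical helper)."""
    try:
        payload = json.loads(body)
    except ValueError as exc:  # json.JSONDecodeError is a subclass of ValueError
        raise ValueError('response body is not valid JSON') from exc
    token = payload.get('Data') if isinstance(payload, dict) else None
    if not token:
        raise ValueError("response has no 'Data' field")
    return token


def build_ws_url(token):
    """Append the token to the WebSocket endpoint."""
    return WS_ENDPOINT.format(token)


# Made-up sample body, not a real server response.
sample_body = '{"Data": "abc123"}'
print(build_ws_url(extract_token(sample_body)))
```

In the real script, `extract_token(requests.get(url).text)` would take the place of the sample body.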
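Before going further, install the synchronous WebSocket client. It is easy to confuse with the asynchronous websockets package: the synchronous one is installed as websocket-client but imported as websocket.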
pip install websocket-client
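Because the two package names are so similar, a quick check of what is actually installed can save some confusion. The following is a minimal sketch using the standard library's `importlib.metadata` (Python 3.8+); `installed_version` is a hypothetical helper name.

```python
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


# websocket-client is imported as `websocket`; websockets is imported as `websockets`.
for name in ('websocket-client', 'websockets'):
    print(name, installed_version(name))
```

If the first line prints None, the `import websocket` in the script below will not find the synchronous client.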
Next, copy the request strings from the WS frames, fill in the request fields, and try to receive 20 messages.
import json
import time

import requests
import websocket  # provided by the websocket-client package

# Fetch the token that the WebSocket URL needs.
url = 'https://live.611.com/Live/GetToken'
res = requests.get(url).text
payload = json.loads(res)  # str --> dict
data = payload['Data']

# Version is "[<timestamp in ms>]" followed by a browser-info JSON string.
version = '[{}]'.format(int(time.time()) * 1000) + '{"chrome":true,"version":"86.0.4240.183","webkit":true}'
dict1 = {"command": "RegisterInfo", "action": "Web", "ids": [],
         "UserInfo": {"Url": "live.611.com", "Version": version}}
dict2 = {"command": "JoinGroup", "action": "SoccerLiveOdd", "ids": []}
dict3 = {"command": "JoinGroup", "action": "SoccerLive", "ids": []}
print(dict1)
json1 = json.dumps(dict1)  # dict --> JSON string
json2 = json.dumps(dict2)
json3 = json.dumps(dict3)

ws_url = 'wss://push.611.com:6119/{}'.format(data)
ws = websocket.create_connection(ws_url, timeout=10)
ws.send(json1)  # send() returns an int
ws.send(json2)
ws.send(json3)
for i in range(20):
    result = ws.recv()  # each call reads a single message
    print(result)
ws.close()
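The receive loop can be separated from the connection so that the frames are decoded and collected rather than only printed. This is a sketch under assumptions: the format of the pushed frames is not documented here, so each frame is parsed as JSON when possible and kept as the raw value otherwise. `decode_frame` and `collect_frames` are hypothetical helper names, and the fake frames in the demo are made up.

```python
import json


def decode_frame(raw):
    """Parse a frame as JSON if possible, otherwise return it unchanged."""
    try:
        return json.loads(raw)
    except (TypeError, ValueError):
        return raw


def collect_frames(recv, limit=20):
    """Call recv() `limit` times and return the decoded frames as a list."""
    frames = []
    for _ in range(limit):
        frames.append(decode_frame(recv()))
    return frames


# Demo with made-up frames instead of a live connection.
fake_frames = iter(['{"command": "Pong"}', 'plain text'])
print(collect_frames(lambda: next(fake_frames), limit=2))
```

With a live connection, the call would be `collect_frames(ws.recv, limit=20)`. Passing the receive function in as an argument also makes it possible to try the loop without touching the network.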