python – Using aiohttp with a proxy

I'm trying to fetch the HTML from a list of URLs (identified by IDs) asynchronously. I need to use a proxy.

I'm trying to use aiohttp with the proxy as shown below:

import asyncio
import aiohttp
from bs4 import BeautifulSoup

ids = ['1', '2', '3']

async def fetch(session, id):
    print('Starting {}'.format(id))
    url = f'https://www.testing.com/{id}'

    async with session.get(url) as response:
        return BeautifulSoup(await response.content, 'html.parser')

async def main(id):
    proxydict = {"http": 'xx.xx.x.xx:xxxx', "https": 'xx.xx.xxx.xx:xxxx'}
    async with aiohttp.ClientSession(proxy=proxydict) as session:
        soup = await fetch(session, id)
        if 'No record found' in soup.title.text:
            print(id, 'na')


loop = asyncio.get_event_loop()
future = [asyncio.ensure_future(main(id)) for id in ids]


loop.run_until_complete(asyncio.wait(future))

Based on this PR: https://github.com/aio-libs/aiohttp/pull/2582, it seems that ClientSession(proxy=proxydict) should work.

However, I get the error "__init__() got an unexpected keyword argument 'proxy'".

Any idea what I can do to fix this?
Thanks.

Solution:

You can set the proxy configuration in the session.get call (note that response.content is a byte stream, so the snippets below use await response.text() to get the HTML for BeautifulSoup):

async with session.get(url, proxy=your_proxy_url) as response:
    return BeautifulSoup(await response.text(), 'html.parser')

If your proxy requires authentication, you can set it in the proxy URL, like this:

proxy = 'http://your_user:your_password@your_proxy_url:your_proxy_port'
async with session.get(url, proxy=proxy) as response:
    return BeautifulSoup(await response.text(), 'html.parser')

Or:

proxy = 'http://your_proxy_url:your_proxy_port'
proxy_auth = aiohttp.BasicAuth('your_user', 'your_password')
async with session.get(url, proxy=proxy, proxy_auth=proxy_auth) as response:
    return BeautifulSoup(await response.text(), 'html.parser')
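
One caveat with the credentials-in-URL form: if the username or password contains characters that are not URL-safe (such as @, :, or /), they must be percent-encoded first, or the URL will be parsed incorrectly. A minimal sketch using the standard library (the credentials here are placeholders):

from urllib.parse import quote

# Percent-encode placeholder credentials so they are safe to embed in the URL
user = quote('your_user', safe='')
password = quote('p@ss:word', safe='')
proxy = f'http://{user}:{password}@your_proxy_url:your_proxy_port'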

For more details, see here.
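
For reference, here is a minimal sketch of the question's script with the per-request proxy applied (the site URL, IDs, and proxy address are placeholders from the question; note that aiohttp takes a single proxy URL string, not a requests-style dict):

import asyncio
import aiohttp
from bs4 import BeautifulSoup

ids = ['1', '2', '3']
PROXY = 'http://xx.xx.x.xx:xxxx'  # placeholder proxy URL

async def fetch(session, id):
    print('Starting {}'.format(id))
    url = f'https://www.testing.com/{id}'
    # Pass the proxy to each request instead of to ClientSession()
    async with session.get(url, proxy=PROXY) as response:
        # response.text() is a coroutine returning the decoded body
        return BeautifulSoup(await response.text(), 'html.parser')

async def main(id):
    async with aiohttp.ClientSession() as session:
        soup = await fetch(session, id)
        if 'No record found' in soup.title.text:
            print(id, 'na')

loop = asyncio.get_event_loop()
futures = [asyncio.ensure_future(main(id)) for id in ids]
loop.run_until_complete(asyncio.wait(futures))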
