Python: Scraping Images from a Website (Source Code Included)

The result first:

Steps:

1. Search Baidu for the pictures I like. There are way too many, so take it slow. In the end the effort paid off,

2. Found them:

3. Time to get to work:

(1) Disguise the request as a browser (I used Fiddler to capture the traffic, so let's spoof Chrome)

import urllib.request


def hander_request1(url, page, i):
    # Build the album's first-page URL and attach browser headers so the site
    # does not reject the request
    url = url + str(i) + '.html'
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.93 Safari/537.36',
    }
    request = urllib.request.Request(url, headers=headers)
    return request
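
For reference, main() in step (4) uses it like this to fetch an album's first page (album id 10000 is just the start of the range the loop walks through):

# fetch the first page of album 10000 and decode its HTML
request = hander_request1('http://www.kantuba.net/guonei/', 0, 10000)
cont = urllib.request.urlopen(request).read().decode()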

(2) Pull the image tags out with a regular expression (written a bit verbosely here; the pattern below is only a placeholder, adjust it to the target page's HTML)

import os
import re

# placeholder pattern: matches whole <img ...> tags; adjust it to the target page's HTML
part = re.compile(r'<img src=".*?" alt=".*?">')
lt = part.findall(cont)
dirname = '美女'  # output folder

print(lt)
# crude but effective: stringify the match list and split on the quote characters
url1 = str(lt).split('"')[1]        # image URL
print(url1)
f1 = str(lt).split('"')[-2]         # image title from the alt attribute
filename = f1
print(filename + ' download started')

filepath = dirname + '/' + filename + '.jpg'
if not os.path.exists(dirname):
    os.mkdir(dirname)
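
Splitting the stringified match list on quote characters works, but it is brittle. A slightly cleaner sketch, assuming you give the pattern two capture groups (image URL and title) so that findall returns tuples:

# assumes: part = re.compile(r'<img src="(.*?)" alt="(.*?)">')
for image_url, image_title in part.findall(cont):
    print(image_title + ' download started')
    filepath = dirname + '/' + image_title + '.jpg'
    # ...then fetch image_url and write it to filepath as in step (3)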


(3) Save the file using that path and name

# hd is the same browser-headers dict as in step (1)
request1 = urllib.request.Request(url=url1, headers=hd)
response1 = urllib.request.urlopen(request1)

# write the image into the folder/filename built in step (2);
# urllib.request.urlretrieve(url1, filepath) would do the same in one call
with open(filepath, 'wb') as fp:
    fp.write(response1.read())

print(filename + ' download finished')
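
One caveat: if a single image link is dead, urlopen raises an exception and the whole run stops. A minimal sketch that skips the broken image and keeps going (the 10-second timeout is my own choice):

try:
    response1 = urllib.request.urlopen(request1, timeout=10)
    with open(filepath, 'wb') as fp:
        fp.write(response1.read())
    print(filename + ' download finished')
except OSError as e:  # urllib's URLError is a subclass of OSError
    print(filename + ' failed, skipping: ' + str(e))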


(4) The images on the site are grouped into albums by category, and each album spans several pages, so I wrote two loops:

def main():
    url = 'http://www.kantuba.net/guonei/'
    start_page = int(input('Enter the start page: '))
    end_page = int(input('Enter the end page: '))

    # i is the album id that appears in the URL, page counts pages inside an album
    page = 0
    if start_page == 1:
        for i in range(10000, 10020):
            # the first page of an album has no page suffix
            request = hander_request1(url, page, i)
            cont = urllib.request.urlopen(request).read().decode()
            download_image(cont)  # the code from steps (2) and (3)
            # the remaining pages go through hander_request (a sketch follows below)
            for page in range(start_page + 1, end_page):
                request = hander_request(url, page, i)
                cont = urllib.request.urlopen(request).read().decode()
                download_image(cont)
    else:
        for i in range(10000, 10020):
            for page in range(start_page, end_page):
                request = hander_request(url, page, i)
                cont = urllib.request.urlopen(request).read().decode()
                download_image(cont)


if __name__ == '__main__':
    main()
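
main() also calls a second helper, hander_request, for pages after the first one of an album; it is not shown above. A minimal sketch, assuming later pages are numbered like 10000_2.html (that URL scheme is a guess, check the real links and adjust):

def hander_request(url, page, i):
    # later pages of an album, e.g. 10000_2.html (assumed URL scheme)
    url = url + str(i) + '_' + str(page) + '.html'
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.93 Safari/537.36',
    }
    return urllib.request.Request(url, headers=headers)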


(5) Tested it myself and the results are great, so take it and go. All you need to change is the regular expression and the URL. Don't forget to thank me while you enjoy it under the covers, heh. You're welcome!
