Below is the crawler code for the popular apps on the Xiaomi app store.
It crawls only the first ten pages:
# coding=utf-8
import requests
from bs4 import BeautifulSoup

# Crawl the first ten pages of the Xiaomi app market top list
for count in range(1, 11):
    # Fetch the HTML of the ranking page
    wbdata = requests.get("http://app.mi.com/topList?page=" + str(count)).text
    print("Crawling page " + str(count))
    soup = BeautifulSoup(wbdata, 'lxml')
    # The apps are listed as <li> items inside the element with class "applist"
    applist = soup.find(class_='applist')
    for li in applist.find_all(name='li'):
        # print('li:', li)
        pkg_name = li.a['href']   # relative link that carries the package name
        appname = li.h5.string    # app name
        category = li.p.string    # app category
        print(appname + '|' + pkg_name + '|' + category)

Result:

Crawling page 1
王者荣耀|/details?id=com.tencent.tmgp.sgame|网游RPG
QQ|/details?id=com.tencent.mobileqq|聊天社交
抖音短视频|/details?id=com.ss.android.ugc.aweme|影音视听
微信|/details?id=com.tencent.mm|聊天社交
快手|/details?id=com.smile.gifmaker|摄影摄像
... (remaining output omitted)
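The pkg_name field in the output is a relative link such as /details?id=com.tencent.mm, not the bare package name. If only the package name is wanted, it could be pulled out of the link with the standard library. The following is a minimal sketch, assuming every link carries the package name in an `id` query parameter as in the output above; the helper name `package_id_from_href` is hypothetical and not part of the original script.

```python
from urllib.parse import urlparse, parse_qs


def package_id_from_href(href):
    """Return the value of the `id` query parameter, or None if it is absent.

    Hypothetical helper: assumes links look like "/details?id=<package>".
    """
    query = parse_qs(urlparse(href).query)
    values = query.get("id")
    return values[0] if values else None


print(package_id_from_href("/details?id=com.tencent.mm"))  # prints com.tencent.mm
```

Inside the crawler loop, this could be applied to `li.a['href']` before printing, so each line shows the package name instead of the link.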