Scraping 5i5j (我爱我家) rental listings with BeautifulSoup

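I had never been very comfortable with BeautifulSoup, and some friends and colleagues happened to be looking for a place to rent, so I wondered whether I could write a crawler of my own to collect the listings. This script is the result. It walks 80 pages of rental listings on sh.5i5j.com, opens each listing's detail page to get the agent's contact details, and writes every row to a MySQL table. I wrote most of it with a book open beside me and there is not much to explain, so here is the code.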

# coding=utf-8
import time

import pymysql
import requests
from bs4 import BeautifulSoup

BASE_URL = "https://sh.5i5j.com"

db = pymysql.connect(host="127.0.0.1", port=3306, user="root", passwd="root",
                     db="woaiwojia", charset="utf8")
cursor = db.cursor()

# Parameterized statement: the driver escapes the values, so a quote in the
# scraped text cannot break the query.
sql = ("insert into zufang(area,xiaoqu,community,hot,price,lxrphone) "
       "VALUES (%s,%s,%s,%s,%s,%s)")

for num in range(1, 81):  # list pages 1 to 80
    url = BASE_URL + "/zufang/o8r1u1n" + str(num) + "/"
    time.sleep(10)  # pause between list pages to go easy on the site
    strhtml = requests.get(url)
    fanlist = BeautifulSoup(strhtml.text, "lxml")
    for ul in fanlist.find_all("ul", {"class": "pList"}):
        for li in ul.find_all(name="li"):
            for div in li.find_all("div", {"class": "listCon"}):
                xiaoqu = div.h3.a.string  # text of the listing's title link
                # The detail page holds the agent's contact block.
                detailUrl = BASE_URL + div.h3.a.attrs['href']
                detailhtml = requests.get(detailUrl)
                detail = BeautifulSoup(detailhtml.text, "lxml")
                for div1 in div.find_all("div", {"class": "listX"}):
                    paragraphs = div1.find_all("p")
                    area = paragraphs[0].text
                    community = paragraphs[1].text
                    hot = paragraphs[2].text
                    price = div1.find_all("div", {"class": "jia"})[0].p.strong.string
                    for uldiv in detail.find_all("div", {"id": "housebroker"}):
                        # Renamed from "ul" so it no longer shadows the outer loop variable.
                        for broker_ul in uldiv.find_all("ul"):
                            lxrphone = broker_ul.h3.string + broker_ul.label.string
                            row = (area, xiaoqu, community, hot, price, lxrphone)
                            try:
                                cursor.execute(sql, row)
                                db.commit()
                            except pymysql.MySQLError as e:
                                db.rollback()
                                print('insert failed:', e)

cursor.close()
db.close()
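
One detail worth a closer look is how the scraped values get into the SQL statement. If the statement is assembled with Python string formatting, it breaks as soon as a value contains a single quote. Passing the values separately lets the database driver do the quoting. Below is a minimal sketch of that idea, not the script's actual storage code. It uses the standard-library sqlite3 module with an in-memory database as a stand-in for the MySQL server so that it runs anywhere. The all-TEXT column types, the save_listing helper, and the sample row are assumptions made up for illustration. sqlite3 writes placeholders as ?, while pymysql writes them as %s.

```python
import sqlite3

# In-memory SQLite database standing in for the MySQL "woaiwojia" database
# (an assumption so the sketch runs without a server). The column types are
# also assumed; the original table definition is not shown in the post.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE zufang ("
    "area TEXT, xiaoqu TEXT, community TEXT, hot TEXT, price TEXT, lxrphone TEXT)"
)

INSERT_SQL = (
    "INSERT INTO zufang (area, xiaoqu, community, hot, price, lxrphone) "
    "VALUES (?, ?, ?, ?, ?, ?)"  # sqlite3 uses ?, pymysql uses %s
)


def save_listing(conn, row):
    """Insert one listing; return True on success, False on a database error."""
    try:
        conn.execute(INSERT_SQL, row)
        conn.commit()
        return True
    except sqlite3.Error as exc:
        conn.rollback()
        print("insert failed:", exc)
        return False


# Hypothetical sample row; the apostrophe would break a hand-formatted query.
sample = ("2 rooms, 60 sqm", "Sample listing", "Pudong's test estate",
          "10 views", "5000", "Agent 000-0000")
save_listing(conn, sample)
print(conn.execute("SELECT community, price FROM zufang").fetchall())
```

The same pattern carries over to pymysql by swapping the placeholders for %s and calling cursor.execute(sql, row) with the values as a tuple.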

If you have any questions or suggestions, leave a comment and we can discuss them.
