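I have recently been reading 《鲜活的数据:数据可视化指南》 (the Chinese edition of Visualize This) to learn some data visualization and data analysis techniques. This example is adapted from one in the first chapter of the book.
Goal: collect the daily maximum temperature for Buffalo, New York, USA, for March 2014 from www.wunderground.com, import the data into Excel or WPS Spreadsheets, and turn it into a line chart.
Prerequisites: a working Python 2.7 installation and the Beautiful Soup library (copy its Python file into Python's library path).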
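Before running anything, it can help to confirm that the imports the script relies on will succeed. The following is only a minimal sketch: `module_available` is a hypothetical helper name, and the module names `urllib2` and `BeautifulSoup` are taken from the import lines of the script in step 1 (on Python 3, `urllib2` will be reported as missing because it only exists in Python 2).

```python
import importlib


def module_available(name):
    """Return True if the named module can be imported, False otherwise."""
    try:
        importlib.import_module(name)
        return True
    except ImportError:
        return False


if __name__ == "__main__":
    # Module names as imported by the scraping script.
    for name in ("urllib2", "BeautifulSoup"):
        status = "ok" if module_available(name) else "missing"
        print("%s: %s" % (name, status))
```

If Beautiful Soup shows up as missing, its file is probably not in a directory that Python searches for modules.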
Step 1: Write the Python program. The code is as follows:
# -*- coding: cp936 -*-
import urllib2
from BeautifulSoup import BeautifulSoup

f = open('wunder-data.txt', 'w')  # open the output file
m = 3  # fetch weather data for March (3) 2014
for d in range(1, 32):  # loop from 2014-03-01 to 2014-03-31
    # zero-pad month and day so the date looks like 2014-03-01,
    # which is convenient for Excel or WPS to handle
    timestamp = '2014-%02d-%02d' % (m, d)
    print "Getting data for " + timestamp  # show progress in the shell
    url = "http://www.wunderground.com/history/airport/KBUF/2014/" + str(m) + "/" + str(d) + "/DailyHistory.html"
    page = urllib2.urlopen(url)  # download the web page
    soup = BeautifulSoup(page)  # parse the page with Beautiful Soup
    # the readings sit inside elements with class="nobr";
    # this script takes the one at index 4 as the day's maximum temperature
    dayTemp = soup.findAll(attrs={"class": "nobr"})[4].span.string
    f.write(timestamp + ',' + dayTemp + '\n')  # write one "date,temperature" line
f.close()  # close the file
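The key step in the script above is the `findAll(attrs = {"class":"nobr"})[4].span.string` lookup: find every element whose class is `nobr`, pick one by position, and read the text of the `span` inside it. To try that idea without network access or Beautiful Soup, here is a rough stand-in built on the standard library's HTML parser instead. `NobrSpanCollector`, `nobr_span_texts`, and `SAMPLE_HTML` are hypothetical names; the sample markup and its values are made up for illustration and are not a copy of the real page. The sketch also assumes well-formed markup in which every tag is explicitly closed.

```python
try:
    from html.parser import HTMLParser  # Python 3
except ImportError:
    from HTMLParser import HTMLParser  # Python 2.7


class NobrSpanCollector(HTMLParser):
    """Collect the text of the first <span> inside each class="nobr" element."""

    def __init__(self):
        HTMLParser.__init__(self)
        self.values = []         # one string per class="nobr" element
        self._depth = 0          # nesting depth inside the current nobr element
        self._span_depth = 0     # depth at which the captured <span> opened
        self._capturing = False  # True while inside that <span>
        self._captured = False   # True once its text has been read

    def handle_starttag(self, tag, attrs):
        if self._depth > 0:
            self._depth += 1
            if tag == "span" and not (self._capturing or self._captured):
                self._capturing = True
                self._span_depth = self._depth
        elif "nobr" in (dict(attrs).get("class") or "").split():
            self._depth = 1
            self._captured = False
            self.values.append("")

    def handle_endtag(self, tag):
        if self._depth == 0:
            return
        if self._capturing and self._depth == self._span_depth:
            self._capturing = False
            self._captured = True
        self._depth -= 1

    def handle_data(self, data):
        if self._capturing:
            self.values[-1] += data


def nobr_span_texts(html):
    """Return the inner span text of every class="nobr" element, in page order."""
    parser = NobrSpanCollector()
    parser.feed(html)
    parser.close()
    return parser.values


# Made-up markup for illustration only; the values are not real readings.
SAMPLE_HTML = (
    '<table>'
    '<tr><td>Mean Temperature</td>'
    '<td><span class="nobr"><span class="wx-value">30</span>&nbsp;&deg;F</span></td></tr>'
    '<tr><td>Max Temperature</td>'
    '<td><span class="nobr"><span class="wx-value">37</span>&nbsp;&deg;F</span></td></tr>'
    '</table>'
)

print(nobr_span_texts(SAMPLE_HTML)[1])  # prints 37
```

Selecting by position is fragile: if the page layout changes, the same index can silently point at a different reading, so it is worth spot-checking a few values against the site.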
Step 2: Run the program to produce the data file wunder-data.txt.
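Before importing the file, a quick sanity check can catch days that failed to download. The sketch below assumes each line has the form `YYYY-MM-DD,temperature` with a numeric temperature, as the script is meant to write; `read_temperatures` and `missing_days` are hypothetical helper names, and the sample lines are made up rather than real readings.

```python
import calendar
import csv


def read_temperatures(lines):
    """Parse 'YYYY-MM-DD,temperature' lines into (date, temperature) tuples."""
    rows = []
    for record in csv.reader(lines):
        if not record:
            continue  # skip blank lines
        date, temp = record
        rows.append((date, float(temp)))
    return rows


def missing_days(rows, year, month):
    """Return the dates of the given month that have no row."""
    days_in_month = calendar.monthrange(year, month)[1]
    expected = ["%04d-%02d-%02d" % (year, month, d)
                for d in range(1, days_in_month + 1)]
    seen = set(date for date, _ in rows)
    return [date for date in expected if date not in seen]


# Made-up sample lines in the format the script writes (not real readings).
sample = ["2014-03-01,20\n", "2014-03-02,31\n", "2014-03-04,25\n"]
rows = read_temperatures(sample)
print(missing_days(rows, 2014, 3)[:2])  # prints ['2014-03-03', '2014-03-05']
```

To check the real output, pass an open file instead of the sample list, for example `read_temperatures(open('wunder-data.txt'))`; an empty result from `missing_days(rows, 2014, 3)` means all 31 days are present.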
Step 3: Import the data into WPS or Excel. I used WPS Spreadsheets: Data -> Import Data -> ... (no screenshots here).
Step 4: Create the chart.
Result: