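Getting Started with matplotlib
First, a question: can matplotlib only draw line charts?
Not at all.
matplotlib can draw line charts, scatter plots, bar charts, histograms, box plots, pie charts and more.
But we need to understand what each kind of chart is good at showing, so that we can pick the one that presents our data most clearly.
Let's introduce matplotlib with a simple example.
Suppose you have the highest-grossing films in mainland China in 2019 (list a, 19 titles) and their box-office takings (list b). How can this data be presented more intuitively?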
a = ["哪吒之魔童降世","流浪地球","复仇者联盟4:终局之战","我和我的祖国","中国机长","疯狂的外星人","飞驰人生","烈火英雄","少年的你","速度与激情:特别行动","蜘蛛侠:英雄远征","扫毒2天地对决","误杀","叶问4","大黄蜂","攀登者","惊奇队长","比悲伤更悲伤的故事","哥斯拉2:怪兽之王"]
b = [49.34,46.18,42.05,31.46,28.84,21.83,17.03,16.76,15.32,14.18,14.01,12.85,11.97,11.72,11.38,10.88,10.25,9.46,9.27]  unit: 100 million CNY
Data source: http://58921.com/alltime/2019
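Drawing a bar chart
plt.bar(x, height, width)
Pairs each x value with its height (y value) and draws a bar of the given width centred on that x position.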
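To see how plt.bar pairs its arguments, here is a minimal sketch that uses the first three box-office values. It reads each bar (a Rectangle patch) back to show that bar i is centred on x[i], has height y[i] and has the requested width. The numeric positions 0, 1, 2 are an illustrative choice in place of the titles, so the geometry is easy to check.

```python
from matplotlib import pyplot as plt

plt.figure()                # start from a fresh figure
x = [0, 1, 2]               # bar centres (numeric so the geometry is easy to check)
y = [49.34, 46.18, 42.05]   # bar heights: the first three box-office values
bars = plt.bar(x, y, 0.8)   # the third positional argument is the bar width

for rect in bars:           # each bar is a matplotlib.patches.Rectangle
    # with the default align="center" the left edge sits at x - width/2,
    # so adding half the width recovers the centre
    centre = rect.get_x() + rect.get_width() / 2
    print(round(centre, 6), rect.get_height(), rect.get_width())
```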
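from matplotlib import pyplot as plt
# use a font that can render the Chinese titles (SimHei must be installed on the system)
plt.rcParams['font.sans-serif']=['SimHei']
# keep minus signs displaying correctly while a CJK font is active
plt.rcParams['axes.unicode_minus']=False
plt.figure(figsize=(20,10),dpi=80)
# movie titles; "\n" breaks the long ones onto two lines
x = ["哪吒之魔童降世","流浪地球","复仇者联盟4:\n终局之战","我和我的祖国","中国机长","疯狂的外星人","飞驰人生","烈火英雄","少年的你","速度与激情:\n特别行动","蜘蛛侠:\n英雄远征","扫毒2天地对决","误杀","叶问4","大黄蜂","攀登者","惊奇队长","比悲伤更悲伤的故事","哥斯拉2:\n怪兽之王",]
# box office, unit: 100 million CNY
y = [49.34,46.18,42.05,31.46,28.84,21.83,17.03,16.76,15.32,14.18,14.01,12.85,11.97,11.72,11.38,10.88,10.25,9.46,9.27]
# one bar per title, each 0.8 wide
plt.bar(x,y,0.8)
# rotate the tick labels so the long titles do not overlap
plt.xticks(x,fontsize=15,rotation=90)
plt.show()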
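It often helps to print each value on its bar. Here is a minimal sketch, assuming matplotlib 3.4 or later, which added plt.bar_label. It uses only the first three films to keep the example short.

```python
from matplotlib import pyplot as plt

plt.figure()                                   # start from a fresh figure
plt.rcParams['font.sans-serif'] = ['SimHei']   # CJK font, as in the example above
titles = ["哪吒之魔童降世", "流浪地球", "复仇者联盟4:\n终局之战"]
y = [49.34, 46.18, 42.05]                      # box office, unit: 100 million CNY

bars = plt.bar(titles, y, 0.8)
# bar_label writes one text label per bar, just above its top edge
labels = plt.bar_label(bars, fmt="%.2f")
```

This is an alternative to calling plt.text in a loop and computing each label position by hand.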
Next, let's turn the chart on its side and use horizontal bars (plt.barh):
from matplotlib import pyplot as plt
plt.rcParams['font.sans-serif']=['SimHei']
plt.rcParams['axes.unicode_minus']=False
plt.figure(figsize=(20,10),dpi=80)
x = ["哪吒之魔童降世","流浪地球","复仇者联盟4:\n终局之战","我和我的祖国","中国机长","疯狂的外星人","飞驰人生","烈火英雄","少年的你","速度与激情:\n特别行动","蜘蛛侠:\n英雄远征","扫毒2天地对决","误杀","叶问4","大黄蜂","攀登者","惊奇队长","比悲伤更悲伤的故事","哥斯拉2:\n怪兽之王",]
y = [49.34,46.18,42.05,31.46,28.84,21.83,17.03,16.76,15.32,14.18,14.01,12.85,11.97,11.72,11.38,10.88,10.25,9.46,9.27]
# barh(y, width, height): titles go on the y axis, box office sets the bar length, 0.4 is the bar thickness
plt.barh(x,y,height=0.4)
# light grid lines make the bar lengths easier to compare
plt.grid(alpha=0.3)
plt.show()
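plt.barh places the first category at y = 0, which is the bottom of the chart, so the top-grossing film ends up at the bottom. One way to put it on top is to invert the y axis. Here is a minimal sketch using the first three films and the object-oriented Axes interface. Reversing the lists before plotting would work too.

```python
from matplotlib import pyplot as plt

plt.rcParams['font.sans-serif'] = ['SimHei']   # CJK font for the Chinese titles
titles = ["哪吒之魔童降世", "流浪地球", "复仇者联盟4:\n终局之战"]
y = [49.34, 46.18, 42.05]

fig, ax = plt.subplots()
ax.barh(titles, y, height=0.4)   # the first category is placed at y=0 (the bottom)
ax.invert_yaxis()                # flip the axis so the first (largest) entry is drawn on top
ax.grid(alpha=0.3)
```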
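Next, something a bit more advanced.
Now suppose you have the box office of the films in list a for three days: 2020-09-11 (y_11), 2020-09-12 (y_12) and 2020-09-13 (y_13). To show both each film's own takings and how it compares with the others, how can this data be presented more intuitively?
a = ["花木兰","八佰","信条","我的女友是机器人"]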
y_11 = [5394,3736,1411,1088]
y_12 = [6212,5997,3158,912]
y_13 = [4079,4816,2466,718]
Data source: https://www.endata.com.cn/BoxOffice/BO/Day/index.html
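from matplotlib import pyplot as plt
plt.rcParams['font.sans-serif']=['SimHei']  # CJK font for the Chinese titles
a = ["花木兰","八佰","信条","我的女友是机器人"]
y_11 = [5394,3736,1411,1088]
y_12 = [6212,5997,3158,912]
y_13 = [4079,4816,2466,718]
plt.figure(figsize=(15,8),dpi=160)
bar_width = 0.2
# x_12 holds the group centres; the 09-11 and 09-13 bars are shifted one bar width left/right
x_12 = range(len(a))
x_11 = [i-bar_width for i in x_12]
x_13 = [i+bar_width for i in x_12]
# draw each day's data at its own offset so the three bars sit side by side
plt.bar(x_11,y_11,width=bar_width,label="2020-09-11")
plt.bar(x_12,y_12,width=bar_width,label="2020-09-12")
plt.bar(x_13,y_13,width=bar_width,label="2020-09-13")
# label each group centre with the movie title
plt.xticks(x_12,a)
plt.legend()
plt.show()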
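The hand-written x_11/x_12/x_13 offsets generalize to any number of series. Here is a sketch built on two hypothetical helpers, group_offsets and grouped_positions, which are not matplotlib functions. Series k is shifted by (k - (n - 1)/2) bar widths, which keeps the block of bars centred on each group.

```python
from matplotlib import pyplot as plt

def group_offsets(n_series, bar_width):
    """Offset of each series from its group centre when n_series bars sit side by side."""
    # series k is shifted by (k - (n_series - 1) / 2) bar widths, so the block is centred
    return [(k - (n_series - 1) / 2) * bar_width for k in range(n_series)]

def grouped_positions(n_groups, n_series, bar_width):
    """For each series, the x positions of its bars in groups 0 .. n_groups-1."""
    return [[g + off for g in range(n_groups)] for off in group_offsets(n_series, bar_width)]

plt.figure()                                   # start from a fresh figure
plt.rcParams['font.sans-serif'] = ['SimHei']   # CJK font for the Chinese titles
a = ["花木兰", "八佰", "信条", "我的女友是机器人"]
series = {
    "2020-09-11": [5394, 3736, 1411, 1088],
    "2020-09-12": [6212, 5997, 3158, 912],
    "2020-09-13": [4079, 4816, 2466, 718],
}
bar_width = 0.2
positions = grouped_positions(len(a), len(series), bar_width)
for (label, values), xs in zip(series.items(), positions):
    plt.bar(xs, values, width=bar_width, label=label)
plt.xticks(range(len(a)), a)   # one tick per group centre
plt.legend()
```

With three series and a width of 0.2, group_offsets returns -0.2, 0.0 and 0.2, which reproduces the manual offsets. Adding a fourth day would only mean adding one dictionary entry.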
Use cases for bar charts
Counting quantities
Frequency statistics (e.g., market saturation)
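For frequency statistics, the raw records are usually counted first and the counts (or shares) are then drawn as bars. Here is a minimal sketch using collections.Counter. The records list is hypothetical toy data, not taken from this article.

```python
from collections import Counter
from matplotlib import pyplot as plt

# hypothetical raw records, e.g. which brand each surveyed user owns
records = ["A", "B", "A", "C", "A", "B"]

counts = Counter(records)                        # category -> number of occurrences
labels, values = zip(*counts.most_common())      # sorted by count, largest first
freqs = [v / len(records) for v in values]       # relative frequency of each category

plt.figure()
bars = plt.bar(labels, freqs)
```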
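Raising the difficulty further
You have the running times of 250 movies (list a) and want to see how they are distributed: for example, how many movies run between 100 and 120 minutes, and how often that occurs. How should you present this data?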
a = [131, 98, 125, 131, 124, 139, 131, 117, 128, 108, 135, 138, 131, 102, 107, 114, 119, 128, 121, 142, 127, 130, 124, 101, 110, 116, 117, 110, 128, 128, 115, 99, 136, 126, 134, 95, 138, 117, 111,78, 132, 124, 113, 150, 110, 117, 86, 95, 144, 105, 126, 130,126, 130, 126, 116, 123, 106, 112, 138, 123, 86, 101, 99, 136,123, 117, 119, 105, 137, 123, 128, 125, 104, 109, 134, 125, 127,105, 120, 107, 129, 116, 108, 132, 103, 136, 118, 102, 120, 114,105, 115, 132, 145, 119, 121, 112, 139, 125, 138, 109, 132, 134,156, 106, 117, 127, 144, 139, 139, 119, 140, 83, 110, 102,123,107, 143, 115, 136, 118, 139, 123, 112, 118, 125, 109, 119, 133,112, 114, 122, 109, 106, 123, 116, 131, 127, 115, 118, 112, 135,115, 146, 137, 116, 103, 144, 83, 123, 111, 110, 111, 100, 154,136, 100, 118, 119, 133, 134, 106, 129, 126, 110, 111, 109, 141,120, 117, 106, 149, 122, 122, 110, 118, 127, 121, 114, 125, 126,114, 140, 103, 130, 141, 117, 106, 114, 121, 114, 133, 137, 92,121, 112, 146, 97, 137, 105, 98, 117, 112, 81, 97, 139, 113,134, 106, 144, 110, 137, 137, 111, 104, 117, 100, 111, 101, 110,105, 129, 137, 112, 120, 113, 133, 112, 83, 94, 146, 133, 101,131, 116, 111, 84, 137, 115, 122, 106, 144, 109, 123, 116, 111,111, 133, 150]
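from matplotlib import pyplot as plt
plt.figure(figsize=(20,10),dpi=160)
# 78 = max(a) - min(a) = 156 - 78, so each bin covers one minute
plt.hist(a,78)
# one tick per distinct running time (plain a would repeat duplicate values)
plt.xticks(sorted(set(a)),rotation=90)
plt.show()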
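Drawing a histogram
plt.hist(x, bins)
Divides the range of the data into `bins` equal-width intervals and counts how many values fall into each one.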
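plt.hist bins the data with numpy.histogram, so the binning can be checked on a toy example. Here is a minimal sketch. The data list is hypothetical and only chosen so the counts are easy to verify by hand.

```python
import numpy as np
from matplotlib import pyplot as plt

data = [1, 2, 2, 3, 3, 3, 4]                 # hypothetical toy data, not from the article
counts, edges = np.histogram(data, bins=3)   # 3 equal-width bins over [min, max]
print(edges)    # [1. 2. 3. 4.]
print(counts)   # [1 2 4]  (the last bin [3, 4] is closed on the right, so 4 is counted)

# plt.hist bins the data the same way and also returns the counts and edges
plt.figure()
n, bins, patches = plt.hist(data, bins=3)
```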
a = [131, 98, 125, 131, 124, 139, 131, 117, 128, 108, 135, 138, 131, 102, 107, 114, 119, 128, 121, 142, 127, 130, 124, 101, 110, 116, 117, 110, 128, 128, 115, 99, 136, 126, 134, 95, 138, 117, 111,78, 132, 124, 113, 150, 110, 117, 86, 95, 144, 105, 126, 130,126, 130, 126, 116, 123, 106, 112, 138, 123, 86, 101, 99, 136,123, 117, 119, 105, 137, 123, 128, 125, 104, 109, 134, 125, 127,105, 120, 107, 129, 116, 108, 132, 103, 136, 118, 102, 120, 114,105, 115, 132, 145, 119, 121, 112, 139, 125, 138, 109, 132, 134,156, 106, 117, 127, 144, 139, 139, 119, 140, 83, 110, 102,123,107, 143, 115, 136, 118, 139, 123, 112, 118, 125, 109, 119, 133,112, 114, 122, 109, 106, 123, 116, 131, 127, 115, 118, 112, 135,115, 146, 137, 116, 103, 144, 83, 123, 111, 110, 111, 100, 154,136, 100, 118, 119, 133, 134, 106, 129, 126, 110, 111, 109, 141,120, 117, 106, 149, 122, 122, 110, 118, 127, 121, 114, 125, 126,114, 140, 103, 130, 141, 117, 106, 114, 121, 114, 133, 137, 92,121, 112, 146, 97, 137, 105, 98, 117, 112, 81, 97, 139, 113,134, 106, 144, 110, 137, 137, 111, 104, 117, 100, 111, 101, 110,105, 129, 137, 112, 120, 113, 133, 112, 83, 94, 146, 133, 101,131, 116, 111, 84, 137, 115, 122, 106, 144, 109, 123, 116, 111,111, 133, 150]
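from matplotlib import pyplot as plt
# bin width: 3 minutes
d = 3
# number of bins = range / bin width = (156 - 78) / 3 = 26
num_bins = (max(a)-min(a))//d
plt.figure(figsize=(20,8),dpi=80)
plt.hist(a,num_bins)
# one tick on every bin edge: 78, 81, ..., 156
plt.xticks(range(min(a),max(a)+d,d))
plt.grid()
plt.show()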
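Choose the number of bins carefully: too few bins hide the shape and give a large statistical error, while too many make the pattern hard to see.
number of bins = range (max - min) / bin width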
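The formula can be sketched as two small hypothetical helpers (not part of matplotlib). The result is rounded up so that max(data) is still covered when the range is not an exact multiple of the bin width. For the running-time data (min 78, max 156) and 3-minute bins, it gives the 26 bins passed to plt.hist.

```python
import math

def num_bins(data, bin_width):
    """Number of equal-width bins: range / bin width, rounded up so max(data) is covered."""
    data_range = max(data) - min(data)                  # the range (max - min)
    return max(1, math.ceil(data_range / bin_width))    # at least one bin

def bin_edges(data, bin_width):
    """Explicit bin edges starting at min(data); usable as plt.hist(data, bins=...)."""
    start = min(data)
    return [start + i * bin_width for i in range(num_bins(data, bin_width) + 1)]

print(num_bins([78, 156], 3))       # 26
print(bin_edges([78, 156], 3)[:3])  # [78, 81, 84]
```

Passing explicit edges instead of a bin count guarantees that the bin boundaries line up with ticks placed at the same positions.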
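Here's a question.
Below are the estimated numbers of households (in millions) in each income bracket in China for 2020. income holds the lower bound of each bracket, width the width of each closed bracket, and quantity the number of households in each bracket; the last bracket (24000 and above) has no upper bound. Can this data be drawn as a histogram?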
income = [0,5200,8300,12500,24000]
width = [5200,3100,4200,11500]
quantity = [73.1,85.9,81.3,76.8,21.9]
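from matplotlib import pyplot as plt
plt.figure(figsize=(10,8),dpi=80)
# the data is already counted, so draw one bar per bracket with plt.bar instead of plt.hist;
# width=1 makes neighbouring bars touch like histogram bins (bracket widths are not to scale)
plt.bar(range(len(income)),quantity,width=1)
# put the bracket boundaries on the bar edges (i - 0.5) and label them with the income values
x=[i-0.5 for i in range(len(income)+1)]
x_label = income+["∞"]
plt.xticks(x,x_label)
# write each count just above its bar
for i,q in enumerate(quantity):
    plt.text(i-0.1,q+2,"%s"%q)  # "%s"%q formats the number as a string
plt.grid()
plt.show()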
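Already-counted data can also be fed to plt.hist through its weights parameter. Using the income values as bin edges then draws each bar exactly as wide as its bracket. Here is a sketch under two assumptions: each bracket is represented by its lower bound, and the open-ended top bracket is left out because it has no right edge.

```python
from matplotlib import pyplot as plt

income = [0, 5200, 8300, 12500, 24000]   # lower bound of each bracket
width = [5200, 3100, 4200, 11500]        # width of each closed bracket
quantity = [73.1, 85.9, 81.3, 76.8, 21.9]

plt.figure()
# one "sample" per closed bracket (its lower bound), weighted by its household count,
# so the bin heights reproduce the counts and each bar spans its bracket
n, bins, patches = plt.hist(income[:-1], bins=income, weights=quantity[:-1], edgecolor="black")
plt.xlabel("income")
plt.ylabel("households (millions)")
```

Here the height shows the count, so a wide bracket looks as important as a narrow one with the same count. With density=True, area rather than height becomes proportional to the number of households, which is the usual convention for histograms with unequal bins.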
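Use cases for histograms
plt.hist is meant for raw data that has not been counted yet, for example:
the age distribution of users
the distribution of how many times users click within a period of time
the distribution of the times at which users are active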
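Drawing a scatter plot
Suppose a web crawler gave you the daily maximum temperatures in Chongqing for every day of March and October 2019 (lists a and b respectively). How can you find a pattern in how the temperature changes over time, day by day?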
a = [15,11,18,12,14,13,13,12,15,20,22,17,15,13,15,18, 17,22,25,26,22,12,11,12,20,21,20,25,30,18,15]
b = [28,28,29,27,24,29,23,23,26,23,21,23,23,21,21,20, 22,24,25,21,19,18,23,20,15,14,16,20,23,22,23]
Data source: http://lishi.tianqi.com/chongqing/index.html
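from matplotlib import pyplot as plt
y_1 = [15,11,18,12,14,13,13,12,15,20,22,17,15,13,15,18,17,22,25,26,22,12,11,12,20,21,20,25,30,18,15]  # March, daily max (°C)
y_2 = [28,28,29,27,24,29,23,23,26,23,21,23,23,21,21,20,22,24,25,21,19,18,23,20,15,14,16,20,23,22,23]  # October, daily max (°C)
x_1 = range(1,32)  # day of the month, 1-31 (both months have 31 days)
plt.figure(figsize=(15,8),dpi=80)
plt.scatter(x_1,y_1,label="March")
plt.scatter(x_1,y_2,label="October")
plt.xlabel("day of month")
plt.ylabel("max temperature (°C)")
plt.legend()
plt.show()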
plt.scatter(x, y)
Pairs each x value with the corresponding y value to form one point.
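A scatter plot shows a relationship visually. The Pearson correlation coefficient (np.corrcoef) puts a number on it. The original article does not use this technique, so treat it as a supplementary check rather than part of the author's method. Here is a minimal sketch on the two temperature lists.

```python
import numpy as np

y_1 = [15,11,18,12,14,13,13,12,15,20,22,17,15,13,15,18,17,22,25,26,22,12,11,12,20,21,20,25,30,18,15]  # March
y_2 = [28,28,29,27,24,29,23,23,26,23,21,23,23,21,21,20,22,24,25,21,19,18,23,20,15,14,16,20,23,22,23]  # October
days = np.arange(1, 32)   # day of the month

# corrcoef returns a 2x2 matrix; the off-diagonal entry is r between day and temperature
r_march = np.corrcoef(days, y_1)[0, 1]
r_october = np.corrcoef(days, y_2)[0, 1]
print(f"March: r = {r_march:.2f}, October: r = {r_october:.2f}")
```

For this data the March coefficient comes out positive and the October one negative. This is a rough numeric hint that March temperatures tend to rise over the month and October temperatures tend to fall. r only measures linear association, and one month of data is a small sample.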
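How to present data
Line chart: shows increases and decreases of a quantity through the rise and fall of a line.
Characteristics: shows the trend of the data and how things change. (change)
Histogram: a series of vertical bars of varying height showing how data is distributed; usually the x axis shows the data range and the y axis the frequency.
Characteristics: used for continuous data; shows the distribution of one or more data sets. (statistics)
Pie charts share this characteristic.
Bar chart: each category (for example, a row or column of a table) is drawn as a bar whose length represents its value.
Characteristics: used for discrete data; the size of each value, and the differences between values, can be seen at a glance. (statistics)
Scatter plot: forms coordinate points from two data sets and examines how the points are distributed, to judge whether the two variables are related or to summarize the pattern of the points.
Characteristics: reveals whether there is a quantitative relationship or trend between variables, and shows outliers. (distribution pattern)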
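To put the four chart types side by side, here is a sketch that reuses data from earlier in this article on one 2×2 figure: the March temperatures and the 2020-09-12 box office. plt.subplots and the Axes methods are standard matplotlib. The pairing of each data set with a chart type is only illustrative.

```python
from matplotlib import pyplot as plt

march = [15,11,18,12,14,13,13,12,15,20,22,17,15,13,15,18,17,22,25,26,22,12,11,12,20,21,20,25,30,18,15]
days = range(1, 32)
titles = ["花木兰", "八佰", "信条", "我的女友是机器人"]
sept_12 = [6212, 5997, 3158, 912]

plt.rcParams['font.sans-serif'] = ['SimHei']   # CJK font for the Chinese titles
fig, axes = plt.subplots(2, 2, figsize=(12, 8))

axes[0, 0].plot(days, march)          # line chart: change over time
axes[0, 0].set_title("line: change")
axes[0, 1].hist(march, bins=5)        # histogram: distribution of raw values
axes[0, 1].set_title("histogram: distribution")
axes[1, 0].bar(titles, sept_12)       # bar chart: compare discrete categories
axes[1, 0].set_title("bar: comparison")
axes[1, 1].scatter(days, march)       # scatter plot: relationship between two variables
axes[1, 1].set_title("scatter: relationship")
fig.tight_layout()
```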
Summary of the matplotlib workflow
Define the question
Choose how to present it (the chart type)
Prepare the data
Plot and polish the figure
More chart types in matplotlib
matplotlib supports a great many chart types. If you need something else, browse the gallery: http://matplotlib.org/gallery/index.html