Import the module
import pandas as pd
1. Iterating over groupby groups
frame = pd.DataFrame({'color':['yellow','red','green','red','green'],
                      'object':['pen','pencil','pencil','ashtray','pen'],
                      'price1':[5.56,4.2,1.3,0.56,2.75],
                      'price2':[4.75,4.12,1.6,0.75,3.15]})
# Each iteration yields the group key and the sub-DataFrame for that key
for name,group in frame.groupby('color'):
    print(name)
    print(group)
'''
green
   color  object  price1  price2
2  green  pencil    1.30    1.60
4  green     pen    2.75    3.15
red
  color   object  price1  price2
1   red   pencil    4.20    4.12
3   red  ashtray    0.56    0.75
yellow
    color object  price1  price2
0  yellow    pen    5.56    4.75
'''
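The loop above works because a groupby object yields (group key, sub-DataFrame) pairs, with keys in sorted order by default. As a minimal sketch, assuming the same frame, the pairs can be collected into a dictionary of group sizes, and a single group can be fetched with get_group() instead of looping. The variable names sizes and red are only illustrative.

```python
import pandas as pd

frame = pd.DataFrame({'color': ['yellow', 'red', 'green', 'red', 'green'],
                      'object': ['pen', 'pencil', 'pencil', 'ashtray', 'pen'],
                      'price1': [5.56, 4.2, 1.3, 0.56, 2.75],
                      'price2': [4.75, 4.12, 1.6, 0.75, 3.15]})

# Each iteration yields a (key, sub-DataFrame) pair.
sizes = {name: len(group) for name, group in frame.groupby('color')}
print(sizes)  # {'green': 2, 'red': 2, 'yellow': 1}

# get_group() returns the rows of one group without iterating.
red = frame.groupby('color').get_group('red')
print(red.index.tolist())  # [1, 3]
```

Note that each sub-DataFrame keeps the original row labels, which is why the red group carries the index values 1 and 3.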
2. Group functions
frame = pd.DataFrame({'color':['yellow','red','green','red','green'],
                      'object':['pen','pencil','pencil','ashtray','pen'],
                      'price1':[5.56,4.2,1.3,0.56,2.75],
                      'price2':[4.75,4.12,1.6,0.75,3.15]})
group = frame.groupby('color')
group['price1'].quantile(0.6) # quantile() computes the given quantile (here the 0.6 quantile) within each group
'''
color
green 2.170
red 2.744
yellow 5.560
'''
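The value 2.744 for red can be checked by hand. By default, quantile() interpolates linearly between neighbouring values; with only two values in a group this reduces to min + q * (max - min). The sketch below, which assumes that default interpolation and uses the illustrative names prices and manual, reproduces the figure for the red group.

```python
import pandas as pd

# price1 values of the 'red' group from the frame above
prices = pd.Series([0.56, 4.2])

q = 0.6
lo, hi = prices.min(), prices.max()

# With two values, linear interpolation reduces to lo + q * (hi - lo).
manual = lo + q * (hi - lo)

print(round(manual, 3))              # 2.744
print(round(prices.quantile(q), 3))  # 2.744
```

The yellow group has a single row, so every quantile of that group equals its only value, 5.56.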
Custom aggregation functions
Define a function that reduces a Series to a single value, then pass it as an argument to agg().
# Note: this name shadows Python's built-in range() for the rest of the session
def range(series):
    return series.max()-series.min()
group['price1'].agg(range)
'''
color
green 1.45
red 3.64
yellow 0.00
'''
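Any callable that maps a Series to one value can be passed to agg(). The following sketch, which assumes the same frame, shows three equivalent ways to get the per-group price spread: a named function, a lambda, and a difference of built-in reductions. The name price_range is hypothetical and is chosen here only to avoid shadowing the built-in range().

```python
import pandas as pd

frame = pd.DataFrame({'color': ['yellow', 'red', 'green', 'red', 'green'],
                      'object': ['pen', 'pencil', 'pencil', 'ashtray', 'pen'],
                      'price1': [5.56, 4.2, 1.3, 0.56, 2.75],
                      'price2': [4.75, 4.12, 1.6, 0.75, 3.15]})
group = frame.groupby('color')

def price_range(series):
    """Spread of a Series: largest value minus smallest value."""
    return series.max() - series.min()

by_func = group['price1'].agg(price_range)
by_lambda = group['price1'].agg(lambda s: s.max() - s.min())
# The same result without a custom function, using built-in reductions.
by_builtin = group['price1'].max() - group['price1'].min()

print(by_func.round(2).to_dict())  # {'green': 1.45, 'red': 3.64, 'yellow': 0.0}
```

The built-in form is usually the fastest of the three, since max() and min() run as optimized group reductions, while a Python function is called once per group.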
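Applying agg() to the whole DataFrame object
def range(series):
    return series.max()-series.min()
# Select the numeric columns first: 'object' holds strings, and subtracting
# strings raises a TypeError in recent pandas versions
group[['price1','price2']].agg(range)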
Using several aggregation functions at once
group['price1'].agg([range,'mean','std'])
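When a list is passed, the result is a DataFrame with one column per function, labelled with the function's name. As a hedged sketch, keyword-based named aggregation gives explicit control over those labels; the column names spread, mean and std and the helper price_range are illustrative choices, not part of the original notes.

```python
import pandas as pd

frame = pd.DataFrame({'color': ['yellow', 'red', 'green', 'red', 'green'],
                      'object': ['pen', 'pencil', 'pencil', 'ashtray', 'pen'],
                      'price1': [5.56, 4.2, 1.3, 0.56, 2.75],
                      'price2': [4.75, 4.12, 1.6, 0.75, 3.15]})
group = frame.groupby('color')

def price_range(series):
    """Spread of a Series: largest value minus smallest value."""
    return series.max() - series.min()

# Each keyword becomes a column label; each value is the function to apply.
summary = group['price1'].agg(spread=price_range, mean='mean', std='std')
print(summary)
```

The std entry for yellow is NaN, because the sample standard deviation is undefined for a group with a single row.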
Reference:
Fabio Nelli. Python数据分析实战:第2版 (Python Data Analytics, 2nd ed., Chinese edition). Beijing: 人民邮电出版社 (Posts & Telecom Press), 2019.11.