import pandas as pd
a=pd.DataFrame({'name':['cindy','alice','ared'],'grade':[80,90,75],'age':[18,19,20]})
b=pd.DataFrame({'name':['cindy','alice','bob','ared'],'grade':[95,70,60,50]})
#表a的grade列计算极差
f=lambda x:x.max()-x.min()
print('a的grade列的极差:')
print(pd.DataFrame(a['grade']).apply(f,axis=0))
#默认按照索引相加,不匹配的索计算引结果为NaN
print('a+b:\n',a+b)
print('a与b的grade列相加:\n',pd.DataFrame(a['grade']+b['grade']))
print(a.loc[:,['grade']]+b.loc[:,['grade']])
#不同索引的相同字段值对应的指标相加:筛选并重置索引,然后计算
print('a的ared+b的ared:\n',pd.DataFrame(a.loc[a['name']=='ared',['grade']]).reset_index(drop=True)+pd.DataFrame(b.loc[b['name']=='ared',['grade']]).reset_index(drop=True))
#将计算结果的空值填充为0
print('将计算结果的空值填充为0:\n',(a+b).fillna(0))
#将不存在的索引对应的值填充为0,然后计算
print('将不存在的索引对应的值填充为0,然后计算:\n',pd.DataFrame(a['grade'].add(b['grade'],fill_value=0)))
runfile('C:/Users/xyy/.spyder-py3/temp.py', wdir='C:/Users/xyy/.spyder-py3')
a的grade列的极差:
grade 15
dtype: int64
a+b:
age grade name
0 NaN 175.0 cindycindy
1 NaN 160.0 alicealice
2 NaN 135.0 aredbob
3 NaN NaN NaN
a与b的grade列相加:
grade
0 175.0
1 160.0
2 135.0
3 NaN
grade
0 175.0
1 160.0
2 135.0
3 NaN
a的ared+b的ared:
grade
0 125
将计算结果的空值填充为0:
age grade name
0 0.0 175.0 cindycindy
1 0.0 160.0 alicealice
2 0.0 135.0 aredbob
3 0.0 0.0 0
将不存在的索引对应的值填充为0,然后计算:
grade
0 175.0
1 160.0
2 135.0
3 50.0
实例:计算两表中相同名称对应的指标和
import pandas as pd
a=pd.DataFrame({'name':['cindy','alice','ared'],'grade':[80,90,75],'age':[18,19,20]})
b=pd.DataFrame({'name':['cindy','alice','bob','ared'],'grade':[95,70,60,50]})
print('法1:按索引计算两表的指标列的和,然后合并两表,将合并结果与指标列的和再通过索引合并')
a_b=pd.merge(a,b,how='right',left_index=True,right_index=True,suffixes=('_a','_b')).fillna(0)
print('a右连接b:\n',a_b)
print('将不存在的索引对应的值填充为0,然后计算:\n',pd.merge(a_b,pd.DataFrame(a['grade'].add(b['grade'],fill_value=0)),left_index=True,right_index=True))
print('\n法2:直接合并表,并新增求和字段')
a_b['grade_sum']=a_b['grade_a']+a_b['grade_b']
print(a_b)
runfile('C:/Users/xyy/.spyder-py3/temp.py', wdir='C:/Users/xyy/.spyder-py3')
法1:按索引计算两表的指标列的和,然后合并两表,将合并结果与指标列的和再通过索引合并
a右连接b:
name_a grade_a age name_b grade_b
0 cindy 80.0 18.0 cindy 95
1 alice 90.0 19.0 alice 70
2 ared 75.0 20.0 bob 60
3 0 0.0 0.0 ared 50
将不存在的索引对应的值填充为0,然后计算:
name_a grade_a age name_b grade_b grade
0 cindy 80.0 18.0 cindy 95 175.0
1 alice 90.0 19.0 alice 70 160.0
2 ared 75.0 20.0 bob 60 135.0
3 0 0.0 0.0 ared 50 50.0
法2:直接合并表,并新增求和字段
name_a grade_a age name_b grade_b grade_sum
0 cindy 80.0 18.0 cindy 95 175.0
1 alice 90.0 19.0 alice 70 160.0
2 ared 75.0 20.0 bob 60 135.0
3 0 0.0 0.0 ared 50 50.0
实例:分类统计
import pandas as pd
import numpy as np
a=pd.DataFrame({'sex':['female','female','male','female'],'name':['cindy','alice','ared','cindy'],'grade':[80,90,75,70],'sale':[18,19,20,23]})
print('各人最大值:\n',a.groupby('name').max())
print('各性别各人最小值:\n',a.groupby(['sex','name']).min())
print('各性别各人销售总额:\n',a.groupby(['sex','name'])['sale'].sum())
print('各性别各人的指标个数、指标和、指标均值:\n',a.groupby(['sex','name'])['grade','sale'].agg([len,np.sum,np.mean]))
print('透视:\n',pd.pivot_table(a,index=['name'],columns=['sex'],values=['grade','sale'],aggfunc=[np.sum,np.mean],fill_value=0,margins=True))
print('透视总分、平均销售:\n',pd.pivot_table(a,index=['name'],columns=['sex'],values=['grade','sale'],aggfunc={'grade':np.sum,'sale':np.mean},fill_value=0,margins=True))
runfile('C:/Users/xyy/.spyder-py3/temp.py', wdir='C:/Users/xyy/.spyder-py3')
各人最大值:
sex grade sale
name
alice female 90 19
ared male 75 20
cindy female 80 23
各性别各人最小值:
grade sale
sex name
female alice 90 19
cindy 70 18
male ared 75 20
各性别各人销售总额:
sex name
female alice 19
cindy 41
male ared 20
Name: sale, dtype: int64
各性别各人的指标个数、指标和、指标均值:
grade sale
len sum mean len sum mean
sex name
female alice 1 90 90 1 19 19.0
cindy 2 150 75 2 41 20.5
male ared 1 75 75 1 20 20.0
透视:
sum mean
grade sale grade sale
sex female male All female male All female male All female male All
name
alice 90 0 90 19 0 19 90 0 90.0 19.0 0 19.0
ared 0 75 75 0 20 20 0 75 75.0 0.0 20 20.0
cindy 150 0 150 41 0 41 75 0 75.0 20.5 0 20.5
All 240 75 315 60 20 80 80 75 78.0 20.0 20 20.0
透视总分、平均分、平均销售:
grade sale
sex female male All female male All
name
alice 90 0 90 19.0 0 19.0
ared 0 75 75 0.0 20 20.0
cindy 150 0 150 20.5 0 20.5
All 240 75 315 20.0 20 20.0