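There seem to be similar questions, but I couldn't find a suitable answer. Let's say this is my DataFrame, with several observations for cars of different brands: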
import pandas as pd

df = pd.DataFrame({'Car': ['BMW_1', 'BMW_2', 'BMW_3', 'WW_1', 'WW_2', 'Fiat_1', 'Fiat_2'],
                   'distance': [10, 25, 22, 24, 37, 33, 49]})
For simplicity, let's assume I have a function that multiplies the first element by 2 and the second element by 3:
def my_func(x, y):
    # weight the first argument by 2 and the second by 3
    z = 2 * x + 3 * y
    return z
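I want to get the pairwise combinations of the distances covered by the cars and use them in my_func. There are two conditions: x and y must not be of the same brand, and combinations must not be repeated. The desired output is something like this: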
      Car  Distance  Combinations
0   BMW_1        10  (BMW_1,WW_1),(BMW_1,WW_2),(BMW_1,Fiat_1),(BMW_1,Fiat_2)
1   BMW_2        25  (BMW_2,WW_1),(BMW_2,WW_2),(BMW_2,Fiat_1),(BMW_2,Fiat_2)
2   BMW_3        22  (BMW_3,WW_1),(BMW_3,WW_2),(BMW_3,Fiat_1),(BMW_3,Fiat_2)
3    WW_1        24  (WW_1,Fiat_1),(WW_1,Fiat_2)
4    WW_2        37  (WW_2,Fiat_1),(WW_2,Fiat_2)
5  Fiat_1        33  None
6  Fiat_2        49  None
//Output
[120, 134, 156, 178]
[113, 145, 134, 132]
[114, 123, 145, 182]
[153, 123]
[120, 134]
None
None
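Note: I made up the output numbers.
As a next step, I want to get the maximum number from each brand's "Output" row arrays. The final data should look like this: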
    Car  Max_Distance
0   BMW           178
1    WW           153
2  Fiat          None
I would appreciate it if someone could help me.
Solution:
UPDATE:
In [49]: x = pd.DataFrame(np.triu(squareform(pdist(df[['distance']], my_func))),
...: columns=df.Car.str.split('_').str[0],
...: index=df.Car.str.split('_').str[0]).replace(0, np.nan)
...:
In [50]: x[x.apply(lambda col: col.index != col.name)].max(1).max(level=0)
Out[50]:
Car
BMW 197.0
Fiat NaN
WW 221.0
dtype: float64
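For readers who find the distance-matrix approach hard to follow, the same result can be sketched with itertools.combinations. This is only an illustrative alternative under the question's assumptions, not part of the original answer; the helper `brand` and the column names `Other` and `value` are made up for the example.

```python
from itertools import combinations

import pandas as pd

df = pd.DataFrame({
    'Car': ['BMW_1', 'BMW_2', 'BMW_3', 'WW_1', 'WW_2', 'Fiat_1', 'Fiat_2'],
    'distance': [10, 25, 22, 24, 37, 33, 49],
})

def my_func(x, y):
    return 2 * x + 3 * y

def brand(car):
    # hypothetical helper: 'BMW_1' -> 'BMW'
    return car.split('_')[0]

# Visit each unordered pair of rows once (first car appears earlier in df)
# and keep only the pairs whose brands differ.
rows = [
    (a.Car, b.Car, my_func(a.distance, b.distance))
    for a, b in combinations(df.itertuples(index=False), 2)
    if brand(a.Car) != brand(b.Car)
]
pairs = pd.DataFrame(rows, columns=['Car', 'Other', 'value'])

# Maximum per brand of the first car; brands with no remaining pair get NaN.
brands = df['Car'].map(brand).drop_duplicates()
max_by_brand = (
    pairs.groupby(pairs['Car'].map(brand))['value'].max()
    .reindex(brands)
)
print(max_by_brand)
```

The long-format `pairs` frame also lists which two cars produced each value, which is close to the "Combinations" column asked for in the question.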
Old answer:
IIUC, you can do the following:
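import numpy as np
import pandas as pd
from scipy.spatial.distance import pdist, squareform

def my_func(x, y):
    return 2*x + 3*y

# label rows and columns by brand (the part of 'Car' before the underscore)
x = pd.DataFrame(
    squareform(pdist(df[['distance']], my_func)),
    columns=df.Car.str.split('_').str[0],
    index=df.Car.str.split('_').str[0])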
This yields:
In [269]: x
Out[269]:
Car BMW BMW BMW WW WW Fiat Fiat
Car
BMW 0.0 95.0 86.0 92.0 131.0 119.0 167.0
BMW 95.0 0.0 116.0 122.0 161.0 149.0 197.0
BMW 86.0 116.0 0.0 116.0 155.0 143.0 191.0
WW 92.0 122.0 116.0 0.0 159.0 147.0 195.0
WW 131.0 161.0 155.0 159.0 0.0 173.0 221.0
Fiat 119.0 149.0 143.0 147.0 173.0 0.0 213.0
Fiat 167.0 197.0 191.0 195.0 221.0 213.0 0.0
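Note that the table is symmetric even though my_func is not: pdist only evaluates the function for pairs with i < j, and squareform mirrors those values. A minimal NumPy sketch of the same arithmetic, with illustrative variable names (`full`, `upper`, `sym`) that are not part of the answer:

```python
import numpy as np

distance = np.array([10, 25, 22, 24, 37, 33, 49])

# full[i, j] = 2 * distance[i] + 3 * distance[j] for every ordered pair
full = 2 * distance[:, None] + 3 * distance[None, :]

# Keep only the i < j entries and mirror them across the diagonal,
# mimicking what squareform(pdist(...)) returns.
upper = np.triu(full, k=1)
sym = upper + upper.T

print(sym[0, 1], sym[1, 0])  # both entries come from my_func(10, 25)
```

So every entry is my_func applied with the earlier row as x and the later row as y, regardless of which side of the diagonal it sits on.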
Excluding pairs of the same brand:
In [270]: x.apply(lambda col: col.index != col.name)
Out[270]:
Car BMW BMW BMW WW WW Fiat Fiat
Car
BMW False False False True True True True
BMW False False False True True True True
BMW False False False True True True True
WW True True True False False True True
WW True True True False False True True
Fiat True True True True True False False
Fiat True True True True True False False
In [273]: x[x.apply(lambda col: col.index != col.name)]
Out[273]:
Car BMW BMW BMW WW WW Fiat Fiat
Car
BMW NaN NaN NaN 92.0 131.0 119.0 167.0
BMW NaN NaN NaN 122.0 161.0 149.0 197.0
BMW NaN NaN NaN 116.0 155.0 143.0 191.0
WW 92.0 122.0 116.0 NaN NaN 147.0 195.0
WW 131.0 161.0 155.0 NaN NaN 173.0 221.0
Fiat 119.0 149.0 143.0 147.0 173.0 NaN NaN
Fiat 167.0 197.0 191.0 195.0 221.0 NaN NaN
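The same mask can be built directly with NumPy broadcasting, which sidesteps boolean indexing on a frame with duplicate labels. The following is a sketch under the question's data; the names `different`, `masked` and `table` are illustrative, and the result is indexed by the full car name rather than by brand so that labels stay unique.

```python
import numpy as np
import pandas as pd

cars = ['BMW_1', 'BMW_2', 'BMW_3', 'WW_1', 'WW_2', 'Fiat_1', 'Fiat_2']
distance = np.array([10, 25, 22, 24, 37, 33, 49])
brand = np.array([c.split('_')[0] for c in cars])

# Symmetric table of my_func values, as in the answer's matrix
upper = np.triu(2 * distance[:, None] + 3 * distance[None, :], k=1)
values = (upper + upper.T).astype(float)

# True where the row's brand differs from the column's brand
different = brand[:, None] != brand[None, :]

# Blank out same-brand cells
masked = np.where(different, values, np.nan)

table = pd.DataFrame(masked, index=cars, columns=cars)
print(table)
```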
Selecting the maximum of each row:
In [271]: x[x.apply(lambda col: col.index != col.name)].max(1)
Out[271]:
Car
BMW 167.0
BMW 197.0
BMW 191.0
WW 195.0
WW 221.0
Fiat 173.0
Fiat 221.0
dtype: float64
Maximum per brand:
In [276]: x[x.apply(lambda col: col.index != col.name)].max(1).max(level=0)
Out[276]:
Car
BMW 197.0
Fiat 221.0
WW 221.0
dtype: float64
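On recent pandas versions the `level` argument of `Series.max` is no longer available (it was deprecated and then removed in pandas 2.0), so the last step can be written with groupby instead. A small sketch using the per-row maxima shown above; the names `row_max` and `brand_max` are illustrative:

```python
import pandas as pd

# Per-row maxima copied from the output above, indexed by brand
row_max = pd.Series(
    [167.0, 197.0, 191.0, 195.0, 221.0, 173.0, 221.0],
    index=pd.Index(['BMW', 'BMW', 'BMW', 'WW', 'WW', 'Fiat', 'Fiat'], name='Car'),
)

# Same aggregation as .max(level=0), expressed through groupby
brand_max = row_max.groupby(level=0).max()
print(brand_max)
```

groupby sorts the group keys by default, so the brands appear in the same alphabetical order as in the output above.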