To simplify my question, I created a small DataFrame, shown below:
Type From To
A "H1" "U1"
A "H9" "I8"
A "H1" "IL"
B "P2" "P8"
B "P2" "P7"
C "P9" "O8"
C "P9" "I0"
C "P7" "O8"
After grouping the rows and concatenating the strings, we should get the following desired result:
Type From To
A "H1" "U1, IL"
A "H9" "I8"
B "P2" "P8, P7"
C "P9" "O8, I0"
C "P7" "O8"
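I did it using the split and aggregate functions. I would be very grateful for any ideas or suggestions on how to do this in Python!
Solution:
In R, we can group by 'Type' and 'From' and paste the 'To' values together. (Note that the question as first posted had an R tag; otherwise, we would not even have attempted this R solution.)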
library(tidyverse)
df1 %>%
  group_by(Type, From) %>%
  summarise(To = toString(To))
# A tibble: 5 x 3
# Groups: Type [?]
# Type From To
# <chr> <chr> <chr>
#1 A H1 U1, IL
#2 A H9 I8
#3 B P2 P8, P7
#4 C P7 O8
#5 C P9 O8, I0
Data
df1 <- structure(list(Type = c("A", "A", "A", "B", "B", "C", "C", "C"
), From = c("H1", "H9", "H1", "P2", "P2", "P9", "P9", "P7"),
To = c("U1", "I8", "IL", "P8", "P7", "O8", "I0", "O8")),
class = "data.frame", row.names = c(NA,
-8L))
In Python, we can do
# Join with ', ' (comma plus space) so the result matches the desired output
out = df2.groupby(['Type', 'From'])['To'].agg(', '.join).reset_index()
print(out)
# Type From To
#0 A H1 U1, IL
#1 A H9 I8
#2 B P2 P8, P7
#3 C P7 O8
#4 C P9 O8, I0
Data
import pandas as pd
df2 = pd.DataFrame({'Type': ['A', 'A', 'A', 'B', 'B', 'C', 'C', 'C'],
                    'From': ['H1', 'H9', 'H1', 'P2', 'P2', 'P9', 'P9', 'P7'],
                    'To': ['U1', 'I8', 'IL', 'P8', 'P7', 'O8', 'I0', 'O8']})
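Note that groupby sorts the group keys by default, so the row for C/P7 comes before C/P9, whereas the desired result in the question lists C/P9 first. If the original row order matters, one possible variation (a minimal sketch, not part of the original answer) is to pass sort=False so that groups appear in order of first occurrence. The variable name out_ordered is only an illustrative choice.

```python
import pandas as pd

# Same sample data as in the question
df2 = pd.DataFrame({'Type': ['A', 'A', 'A', 'B', 'B', 'C', 'C', 'C'],
                    'From': ['H1', 'H9', 'H1', 'P2', 'P2', 'P9', 'P9', 'P7'],
                    'To': ['U1', 'I8', 'IL', 'P8', 'P7', 'O8', 'I0', 'O8']})

# sort=False keeps groups in order of first appearance instead of sorting keys;
# ', '.join concatenates the 'To' strings within each (Type, From) group
out_ordered = (
    df2.groupby(['Type', 'From'], sort=False)['To']
       .agg(', '.join)
       .reset_index()
)
print(out_ordered)
```

With this data, the rows should come out in the same order as the desired result in the question (A/H1, A/H9, B/P2, C/P9, C/P7).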