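Data processing often calls for merging several data sets at once, either by row or by column. A plain merge() call joins only two data frames at a time, and rbind() needs every data frame typed out as a separate argument, so combining many of them by hand is tedious. A self-written function (built on Reduce()) or do.call() can join or stack a whole list of data sets in one step, as shown below: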
Merging by column
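library(foreign) # assumed source of read.xport(); the SASxport package also provides one
mypath <- "C:/Users/18896/Desktop/example1"
multmerge <- function(mypath) {
  # List every .XPT file; pattern is a regular expression, so the dot is escaped
  filenames <- list.files(path = mypath, pattern = "\\.XPT$", full.names = TRUE)
  # Read each file into a data frame
  datalist <- lapply(filenames, read.xport)
  # Fold the list pairwise with a full outer join on SEQN
  Reduce(function(x, y) merge(x, y, by = "SEQN", all = TRUE), datalist)
}
mergedata <- multmerge(mypath)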
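mypath is the local directory that holds all the files to be merged. The function multmerge first lists the names of the data files in that directory, reads them into a list, and then uses merge() to combine the data frames in the list on the shared key SEQN. The resulting mergedata is the merged data set.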
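The effect of all = TRUE in multmerge can be checked without any files on disk. The following is a minimal sketch that assumes three small in-memory data frames standing in for the .XPT files; the names demo, exam, and lab and their columns are hypothetical and not part of the original data.

```r
# Hypothetical stand-ins for the data frames that read.xport() would return
demo <- data.frame(SEQN = 1:3, age = c(34, 51, 27))
exam <- data.frame(SEQN = 2:4, bmi = c(22.5, 30.1, 25.0))
lab  <- data.frame(SEQN = c(1L, 4L), glucose = c(5.2, 6.1))

datalist <- list(demo, exam, lab)

# all = TRUE keeps every SEQN that appears in any table (full outer join);
# values missing from a table are filled with NA
merged <- Reduce(function(x, y) merge(x, y, by = "SEQN", all = TRUE), datalist)
print(merged)
```

With all = TRUE no SEQN is dropped, which is usually what is wanted when each file covers a different subset of participants. Leaving it out gives an inner join that keeps only the SEQN values present in every table.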
When the data are not read from a local folder
data1 <- data.frame(id = 1:6,                            # Create first example data frame
                    x1 = c(5, 1, 4, 9, 1, 2),
                    x2 = c("A", "Y", "G", "F", "G", "Y"))
data2 <- data.frame(id = 4:9,                            # Create second example data frame
                    y1 = c(3, 3, 4, 1, 2, 9),
                    y2 = c("a", "x", "a", "x", "a", "x"))
data3 <- data.frame(id = 5:6,                            # Create third example data frame
                    z1 = c(3, 2),
                    z2 = c("K", "b"))
data_list <- list(data1, data2, data3)
my_merge <- function(df1, df2) {                         # Create own merging function
  merge(df1, df2, by = "id")
}
Reduce(my_merge, data_list)
#   id x1 x2 y1 y2 z1 z2
# 1  5  1  G  3  x  3  K
# 2  6  2  Y  4  a  2  b
Alternatively, use the tidyverse package
install.packages("tidyverse") # Install tidyverse package
library("tidyverse")
data_list %>% reduce(inner_join, by = "id") # Apply reduce function of tidyverse
#   id x1 x2 y1 y2 z1 z2
# 1  5  1  G  3  x  3  K
# 2  6  2  Y  4  a  2  b
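Both approaches above fold the list from left to right: the first two data frames are merged, the result is merged with the third, and so on. The following base R sketch, which reuses the three example data frames, illustrates that Reduce() gives the same result as writing the nested merge() calls by hand. The variable names nested and folded are introduced here only for illustration.

```r
data1 <- data.frame(id = 1:6,
                    x1 = c(5, 1, 4, 9, 1, 2),
                    x2 = c("A", "Y", "G", "F", "G", "Y"))
data2 <- data.frame(id = 4:9,
                    y1 = c(3, 3, 4, 1, 2, 9),
                    y2 = c("a", "x", "a", "x", "a", "x"))
data3 <- data.frame(id = 5:6,
                    z1 = c(3, 2),
                    z2 = c("K", "b"))

# Nested calls written out by hand: merge the first two, then add the third
nested <- merge(merge(data1, data2, by = "id"), data3, by = "id")

# The same fold expressed with Reduce() over a list
folded <- Reduce(function(a, b) merge(a, b, by = "id"),
                 list(data1, data2, data3))

print(identical(nested, folded))
```

The hand-written version becomes unwieldy as the number of data frames grows, while the Reduce() version stays the same length no matter how long the list is.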
Merging by row
library(data.table)
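DT1 = data.table(A = 1:3, B = letters[1:3])
DT2 = data.table(B = letters[4:5], A = 4:5)
DT3 = data.table(A = 6:7, B = letters[6:7])
l = list(DT1, DT2, DT3)
# use.names = TRUE matches columns by name, so DT2's reversed column order is handled
rbindlist(l, use.names = TRUE)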
#    A B
# 1: 1 a
# 2: 2 b
# 3: 3 c
# 4: 4 d
# 5: 5 e
# 6: 6 f
# 7: 7 g
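When data.table is not available, a similar row-wise stack can be sketched in base R with do.call() and rbind(). The example below assumes plain data frames rather than data.tables (df1, df2, and df3 are illustrative names); rbind() on data frames also matches columns by name, so the reversed column order in df2 is not a problem.

```r
df1 <- data.frame(A = 1:3, B = letters[1:3])
df2 <- data.frame(B = letters[4:5], A = 4:5)   # columns in reversed order
df3 <- data.frame(A = 6:7, B = letters[6:7])

# do.call() passes every element of the list to rbind() as a separate argument
stacked <- do.call(rbind, list(df1, df2, df3))
print(stacked)
```

For long lists of large tables, rbindlist() is generally the faster choice; the base R form is mainly convenient when no extra package should be loaded.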
Stacking the same data frame several times
do.call("rbind", replicate(4, DT1, simplify = FALSE))
# A B
# 1: 1 a
# 2: 2 b
# 3: 3 c
# 4: 1 a
# 5: 2 b
# 6: 3 c
# 7: 1 a
# 8: 2 b
# 9: 3 c
#10: 1 a
#11: 2 b
#12: 3 c
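The same repetition can also be obtained by repeating row indices instead of building a list of copies. This is a minimal base R sketch on a plain data frame (df and repeated are illustrative names), offered as an alternative rather than as the method used above.

```r
df <- data.frame(A = 1:3, B = letters[1:3])

# Repeat the whole sequence of row indices four times, then subset by it
repeated <- df[rep(seq_len(nrow(df)), times = 4), ]
rownames(repeated) <- NULL   # reset the row names created by the duplicated indices
print(repeated)
```

Indexing avoids creating four intermediate copies before binding them, which may matter for large tables.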
Reference: R Merge Multiple Data Frames in List (2 Examples) | Base R vs. tidyverse (statisticsglobe.com)