在SAS中,使用 SET 语句进行数据集纵向合并,用 MERGE 语句进行横向合并:
DATA new_dataset; SET dataset_1 dataset_n;
DATA new_dataset; MERGE dataset_1 dataset_n; BY variable_list;
纵向合并后,new_dataset 的行数等于每个数据集行数的加总。If one of the data sets has a variable not contained in the other data sets, then the observations from the other data sets will have missing values for that variable.
横向合并中的 by variable list 是所有数据集共同的变量。
一、纵向合并
例一:合并两个数据集 southentrance 和 northentrance, 合并后数据集观测值的顺序维持各自不变
DATA both; SET southentrance northentrance; IF Age = . THEN AmountPaid = .; ELSE IF Age < 3 THEN AmountPaid = 0; ELSE IF Age < 65 THEN AmountPaid = 35; ELSE AmountPaid = 27; PROC PRINT DATA = both; TITLE 'Both Entrances'; RUN;
例二:合并后数据集观测值按照 PassNumer 排序
DATA interleave; SET northentrance southentrance; BY PassNumber; PROC PRINT DATA = interleave; TITLE 'Both Entrances, By Pass Number'; RUN;
二、横向合并
例一:合并两个数据集 salesdata 和 descriptions,合并后的数据集包含两个数据集的所有观测值,相当于 full join
/*Merge之前必须先对两个数据集按照 By variables 排序*/
DATA chocolates; MERGE sales descriptions; BY CodeNum; PROC PRINT DATA = chocolates; TITLE ”Today's Chocolate Sales”; RUN;
合并后的数据集 chocolates 包含两个数据集的所有 observations, 如果某条 observation 在另外一个数据集中没有,则对应的variable展示为缺失值。
例二:一对多数据集合并,相当于 left join
DATA prices; MERGE shoes discount; BY ExerciseType; NewPrice = ROUND(RegularPrice - (RegularPrice * Adjustment), .01); PROC PRINT DATA = prices; TITLE ’Price List for May’; RUN;