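Both the REPEATED and RANDOM statements appear in PROC GLM and in PROC MIXED. For background, see the post "Random Effects vs. Fixed Effects".
Any effect that is not placed in a REPEATED or RANDOM statement is a fixed effect.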
1. Repeated in GLM
The REPEATED statement enables you to test hypotheses about the measurement factors (often called within-subject factors) as well as the interactions of within-subject factors with independent variables in the MODEL statement (often called between-subject factors).
- In other words, the REPEATED statement tests the effects of the within-subject factors and of their interactions with the independent variables.
In PROC GLM, between-subject effects go on the MODEL statement and the main within-subject effect goes on the REPEATED statement.
- That is, the within-subject effect is placed in REPEATED, while the between-subject effects must be placed in MODEL.
The data come from the paper "Comparing the SAS GLM and MIXED Procedures for Repeated Measures".
/* forglm: wide form, one row per person (y1-y4).          */
/* formixed: long form, one row per person and age (y).    */
data forglm(keep=person gender y1-y4)
     formixed(keep=person gender age y);
  input person gender$ y1-y4;
  output forglm;
  y=y1; age=8;  output formixed;
  y=y2; age=10; output formixed;
  y=y3; age=12; output formixed;
  y=y4; age=14; output formixed;
  datalines;
1 F 21.0 20.0 21.5 23.0
2 F 21.0 21.5 24.0 25.5
3 F 20.5 24.0 24.5 26.0
4 F 23.5 24.5 25.0 26.5
5 F 21.5 23.0 22.5 23.5
6 F 20.0 21.0 21.0 22.5
7 F 21.5 22.5 23.0 25.0
8 F 23.0 23.0 23.5 24.0
9 F 20.0 21.0 22.0 21.5
10 F 16.5 19.0 19.0 19.5
11 F 24.5 25.0 28.0 28.0
12 M 26.0 25.0 29.0 31.0
13 M 21.5 22.5 23.0 26.5
14 M 23.0 22.5 24.0 27.5
15 M 25.5 27.5 26.5 27.0
16 M 20.0 23.5 22.5 26.0
17 M 24.5 25.5 27.0 28.5
18 M 22.0 22.0 24.5 26.5
19 M 24.0 21.5 24.5 25.5
20 M 23.0 20.5 31.0 26.0
21 M 27.5 28.0 31.0 31.5
22 M 23.0 23.0 23.5 25.0
23 M 21.5 23.5 24.0 28.0
24 M 17.0 24.5 26.0 29.5
25 M 22.5 25.5 25.5 26.0
26 M 23.0 24.5 26.0 30.0
27 M 22.0 21.5 23.5 25.0
;

ODS HTML;

/* Repeated measures analysis on the wide-form data. */
proc glm data=forglm;
  class gender;
  model y1-y4=gender / nouni;
  repeated age 4 (8 10 12 14) / printe;
run;
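- That is, the repeated factor goes in REPEATED and the fixed (between-subject) factors go in MODEL.
The NOUNI option suppresses the printing of one-way ANOVAs for each of the four variables. In other words, no univariate ANOVA is printed for each of y1-y4.
- PRINTE prints the error SSCP matrix (the E matrix) and the partial correlation matrix. On the REPEATED statement it also produces the sphericity tests.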
1.1
This part of the output lists the levels of each input variable.
1.2
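This part checks the covariance among the repeated measurements (the sphericity test). The null hypothesis is that the covariance matrix has Type H structure. The p-value is 0.1997, so the null hypothesis is not rejected and the data are consistent with a Type H structure.
- Type H (Huynh-Feldt) structure requires every pairwise difference between measurements to have the same variance. Compound symmetry, in which any two measurements have the same covariance, is its best-known special case.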
It is not necessary to make a multivariate assumption about these data. These tests are still valid but are less powerful than the univariate tests given the Type H assumption (Muller et al. 1992).
- When the Type H structure holds, the multivariate tests are less powerful than the univariate tests.
1.3
The output above gives the multivariate tests of the within-subject effects AGE and AGE*GENDER.
1.4
The output above gives the tests of the between-subject and within-subject effects.
2. Repeated in MIXED
In PROC MIXED, all fixed effects, both between- and within-subject, go on the MODEL statement.
- That is, put every fixed effect in MODEL.
The REPEATED statement is used to specify the R matrix in the mixed model.
2.1
The REPEATED statement specifies the structure of the R matrix. For what R is, see the post "Fixed Effects vs. Random Effects".
REPEATED <repeated-effect> </ options>;
For many repeated measures models, no repeated effect is required in the REPEATED statement. Simply use the SUBJECT= option to define the blocks of R and the TYPE= option to define their covariance structure.
- In most cases there is no need to specify a repeated-effect.
For whether to specify a repeated-effect before the slash, see Usage Note 23757: When can I omit the repeated effect (preceding the slash) in the REPEATED statement in PROC MIXED?
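Suppose a measurement is scheduled at each of VISIT 1 through VISIT 9, and one subject misses VISIT 3. If a repeated-effect is specified in REPEATED, the results are the same whether or not a VISIT 3 record for that subject is stored in the data set. If it is not specified, the parameter estimates differ very slightly between the two cases.
The reason is that, without a repeated-effect, PROC MIXED assigns each observation to a row of R by its order within the subject. When the VISIT 3 value is missing, or there is no VISIT 3 record at all, the VISIT 4 record is treated as the third measurement. With a repeated-effect, each observation is matched to its proper position by the level of that effect.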
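Summary:
- Use a repeated-effect when there are missing data, whether the missing values occur in the explanatory variables or in the dependent variable.
- "Missing" covers a missing measurement value, a visit that was skipped, and a visit that was recorded without a value.
- With TYPE=CS the repeated-effect is not needed, because CS assumes the same covariance between any two measurements on the same patient, whether or not values are missing.
- Some TYPE= structures require two effects in the REPEATED statement. See the SAS help for details.
- Importantly, sort the data by subject and visit before using REPEATED.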
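The sorting and repeated-effect advice above can be sketched as follows. This is only an illustration against the formixed data set built earlier: the choice of TYPE=UN is an assumption made here so that the position of each measurement matters, not something taken from the usage note.

```sas
/* Sort by subject and then by the repeated factor. */
proc sort data=formixed;
  by person age;
run;

/* Naming AGE before the slash lets PROC MIXED place each record */
/* in its proper row of R even if some ages are missing.         */
/* TYPE=UN is an illustrative assumption.                        */
proc mixed data=formixed;
  class gender age person;
  model y = gender|age;
  repeated age / type=un subject=person;
run;
```

Because AGE appears in the REPEATED statement, it also has to appear in the CLASS statement.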
proc mixed data=formixed;
  class gender age person;
  model y = gender|age;
  repeated / type=cs sub=person;
run;
The SUB= option specifies PERSON to be the subject effect, which instructs PROC MIXED to make the 108 x 108 variance-covariance matrix of the entire data vector block diagonal with 27 blocks of size 4 x 4.
Each of these blocks has the covariance structure given by the TYPE= option.
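- SUB= identifies the unit on which the repeated measurements are taken.
The data above have 27 people with 4 measurements each, so 27 x 4 = 108 observations. The 108 x 108 variance-covariance matrix therefore consists of 27 blocks of size 4 x 4 along the diagonal.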
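To look at one of those 4 x 4 blocks directly, the R and RCORR options of the REPEATED statement can be added to the same model. A minimal sketch, assuming the formixed data set from section 1:

```sas
/* R prints the estimated 4 x 4 covariance block for a subject,  */
/* RCORR prints the corresponding correlation matrix.            */
proc mixed data=formixed;
  class gender age person;
  model y = gender|age;
  repeated / type=cs sub=person r rcorr;
run;
```

With TYPE=CS the printed block should show one common value on the diagonal and one common value everywhere off the diagonal.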
For a related introduction, see the post "Repeated Measures - MIXED Models".
3. Compare REPEATED in GLM and MIXED
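3.1 Handling of missing values
PROC GLM deletes the whole observation: if we randomly set four values to missing in the example above, the observations (subjects) that contain those values are dropped entirely.
PROC MIXED needs long-form data, that is, 27 x 4 = 108 records. If four of them have missing values, only the remaining 104 records are used.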
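The comparison can be tried with a sketch like the one below. The four cells that are blanked out are a hypothetical choice made for illustration (the post only says they were picked at random), and the data set names forglm_miss and formixed_miss are likewise made up here.

```sas
/* Wide form: blank one value for each of four subjects (hypothetical choice). */
data forglm_miss;
  set forglm;
  if person = 3  then y2 = .;
  if person = 8  then y1 = .;
  if person = 15 then y4 = .;
  if person = 22 then y3 = .;
run;

/* Long form: blank the same four person/age cells. */
data formixed_miss;
  set formixed;
  if (person = 3  and age = 10) or
     (person = 8  and age = 8)  or
     (person = 15 and age = 14) or
     (person = 22 and age = 12) then y = .;
run;

/* GLM: a subject with any missing y1-y4 is excluded entirely. */
proc glm data=forglm_miss;
  class gender;
  model y1-y4 = gender / nouni;
  repeated age 4 (8 10 12 14);
run;
quit;

/* MIXED: only the records with a missing y are excluded. */
proc mixed data=formixed_miss;
  class gender age person;
  model y = gender|age;
  repeated age / type=cs sub=person;
run;
```

Under these assumptions the GLM analysis should be based on 23 subjects, while MIXED should report 104 observations used out of 108.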
4. RANDOM in GLM
In PROC GLM, the random-effect information is used only in the extra F tests produced by the RANDOM statement. Other features in the GLM procedure, including the results of the LSMEANS and ESTIMATE statements, assume that all effects are fixed, so that all tests and estimability checks for these statements are based on a fixed-effects model, even when you use a RANDOM statement.
- In GLM, RANDOM only produces extra F tests.
- Statements such as LSMEANS and ESTIMATE still compute their results from a fixed-effects model.
- For a genuine random-effects analysis, use PROC MIXED.
4.1
random pat(vacgrp)/test ;
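This means that the F test for vacgrp uses pat(vacgrp), the patient-within-group mean square, as the denominator (the error term), while the numerator is still the vacgrp mean square. All other F tests use the model Error as the denominator.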
test h=vacgrp e=pat(vacgrp);
These two statements do the same thing: both test vacgrp against pat(vacgrp).
4.2
lsmeans vacgrp / cl STDERR;
estimate 'LS-mean of ACT via ESTIMATE' intercept 1 vacgrp 1 0;
For computing the mean, these two statements also give the same result. For details on ESTIMATE, see the post "SAS Estimate or Contrast".
/* Wide form: one row per patient, scores at months 1-3. */
data arthr;
  input vacgrp $ pat mo1 mo2 mo3;
  datalines;
ACT 101 6 3 0
ACT 103 7 3 1
ACT 104 4 1 2
ACT 107 8 4 3
PBO 102 6 5 5
PBO 105 9 4 6
PBO 106 5 3 4
PBO 108 6 2 3
;

/* Long form: one row per patient and visit. */
data discom;
  set arthr;
  /* keep vacgrp pat visit score; */
  score = mo1; visit = 1; output;
  score = mo2; visit = 2; output;
  score = mo3; visit = 3; output;
run;

proc glm data = discom;
  class vacgrp pat visit;
  model score = vacgrp pat(vacgrp) visit vacgrp*visit / ss3;
  /* Test vacgrp against patients within group. */
  test h=vacgrp e=pat(vacgrp);
  lsmeans vacgrp / cl STDERR;
run;
quit;
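For comparison, the RANDOM form from section 4.1 can be run against the same discom data set. This is a minimal sketch that only swaps the TEST statement for RANDOM with the TEST option. The rest of the model is unchanged.

```sas
proc glm data=discom;
  class vacgrp pat visit;
  model score = vacgrp pat(vacgrp) visit vacgrp*visit / ss3;
  /* Declare patients within group as random and request the */
  /* extra F tests built from the expected mean squares.     */
  random pat(vacgrp) / test;
run;
quit;
```

The F test for vacgrp in the RANDOM output can then be checked against the one produced by the TEST statement above.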
5. RANDOM in MIXED
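TYPE= in the REPEATED statement specifies the structure of the R matrix, while TYPE= in the RANDOM statement specifies the structure of the G matrix.
For the structure of the G matrix, see the post "Random Effects vs. Fixed Effects".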
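As an illustration of TYPE= on the RANDOM statement, the arthritis data from section 4 could be fitted with a random patient intercept. This is a sketch under assumptions: the discom data set is reused, and the random-intercept model with TYPE=VC is chosen here for illustration rather than taken from the original analysis.

```sas
proc mixed data=discom;
  class vacgrp pat visit;
  model score = vacgrp visit vacgrp*visit;
  /* One random intercept per patient within group.          */
  /* TYPE= sets the G structure; the G option prints the      */
  /* estimated G matrix.                                      */
  random intercept / subject=pat(vacgrp) type=vc g;
run;
```

Unlike PROC GLM, the tests and estimates from this model take the random patient effect into account.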
6. Other REPEATED in GLM
REPEATED in GLM handles repeated measures designs with one repeated response variable.
Use it when there are several response variables, or when one response variable is measured several times.
6.1
model Y1-Y12=group / nouni;
REPEATED TRIAL 3 (A B C), TIME 4 (T1 T2 T3 T4);
three treatments are administered at each of four times, for a total of twelve dependent variables on each experimental unit.
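Suppose three treatments are each given at four different times. Every subject then has twelve scores.
The values in parentheses label the levels. For example, the repeated factor TRIAL has three levels: A, B and C.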
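How the twelve variables map onto the two factors can be sketched as below. The data set name trials is hypothetical, and the level labels are left out so that only the ordering rule is shown: the first factor listed in REPEATED changes slowest across Y1-Y12 and the last one changes fastest.

```sas
/* Hypothetical data set "trials" with variables group and Y1-Y12. */
proc glm data=trials;
  class group;
  model Y1-Y12 = group / nouni;
  /* Y1-Y4  = trial 1 at times 1-4 */
  /* Y5-Y8  = trial 2 at times 1-4 */
  /* Y9-Y12 = trial 3 at times 1-4 */
  repeated trial 3, time 4;
run;
quit;
```

If the variables are stored in a different order, they need to be listed in the MODEL statement in the order that matches this nesting.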
6.2