6.0 Introduction
This chapter will use the elemapi2 data that you have seen in the prior chapters. We assume that you have put the data files in the "c:\sasreg" directory.
data elemapi2; set 'c:\sasreg\elemapi2'; run;
We will focus on the variables mealcat and collcat as they relate to the outcome variable api00 (performance on the API in the year 2000). The variable mealcat is the variable meals broken up into three categories, and the variable collcat is the variable some_col broken up into three categories. We could think of mealcat as the number of students receiving free meals, broken up into low, middle and high. Likewise, collcat can be thought of as the number of parents with some college education, broken up into low, medium and high.
For our analysis, we think that both mealcat and collcat may be related to api00, but it is also possible that the impact of mealcat depends on the level of collcat. In other words, we think that there might be an interaction of these two categorical variables. In this chapter we will look at how these two categorical variables are related to API performance in the school, and at their interaction. We will see that there is an interaction, and will focus on different ways of exploring it further. Let's have a quick look at these variables.
proc tabulate data=elemapi2;
  class collcat mealcat;
  var api00;
  table mealcat='mealcat', mean=' '*api00='API Index for 2000'*collcat='collcat'*F=10.2 / RTS=13.;
run;

----------------------------------------------
|           |       API Index for 2000       |
|           |--------------------------------|
|           |            collcat             |
|           |--------------------------------|
|           |    1     |    2     |    3     |
|-----------+----------+----------+----------|
|mealcat    |          |          |          |
|-----------|          |          |          |
|1          |    816.91|    825.65|    782.15|
|-----------+----------+----------+----------|
|2          |    589.35|    636.60|    655.64|
|-----------+----------+----------+----------|
|3          |    493.92|    508.83|    541.73|
----------------------------------------------
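Before modeling, it can help to confirm how many schools fall in each cell of the 3 by 3 design, since the cells are not equally sized. The following is a minimal sketch, assuming elemapi2 has already been read into the WORK library by the data step in this section; the table options only suppress the percentages.

```sas
/* Cell counts for the mealcat by collcat design.               */
/* Assumes work.elemapi2 exists (created by the data step above). */
proc freq data = elemapi2;
  tables mealcat*collcat / norow nocol nopercent;
run;
```

Unequal cell counts matter later in the chapter, where the coefficients are interpreted in terms of unweighted means of the cell means.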
6.1. Analysis with two categorical variables
One traditional way to analyze this would be to perform a 3 by 3 factorial analysis of variance using proc glm, as shown below. The results show a main effect of collcat (F=4.50, p=0.0117), a main effect of mealcat (F=509.04, p<0.0001) and an interaction of collcat by mealcat (F=6.63, p<0.0001). We also use the lsmeans and output statements to output the predicted means for each group, which gets us ready to graph the cell means.
proc glm data = elemapi2; class collcat mealcat; model api00 = collcat | mealcat /ss3; lsmeans collcat*mealcat; output out = pred p = pred; run; quit;
The GLM Procedure

Class Level Information
Class      Levels   Values
collcat         3   1 2 3
mealcat         3   1 2 3

Number of observations   400

Dependent Variable: api00   api 2000

                             Sum of
Source             DF       Squares    Mean Square   F Value   Pr > F
Model               8   6243714.810     780464.351    166.76   <.0001
Error             391   1829957.187       4680.197
Corrected Total   399   8073671.998

R-Square   Coeff Var   Root MSE   api00 Mean
0.773343    10.56356   68.41197     647.6225

Source             DF   Type III SS    Mean Square   F Value   Pr > F
collcat             2     42140.566      21070.283      4.50   0.0117
mealcat             2   4764843.563    2382421.781    509.04   <.0001
collcat*mealcat     4    124167.809      31041.952      6.63   <.0001

Least Squares Means
collcat   mealcat   api00 LSMEAN
1         1           816.914286
1         2           589.350000
1         3           493.918919
2         1           825.651163
2         2           636.604651
2         3           508.833333
3         1           782.150943
3         2           655.637681
3         3           541.733333
We can now create the graph of cell means of api00 using the dataset pred.
proc sort data = pred;
  by mealcat;
run;
symbol1 v=circle i=join ci=blue h=2;
symbol2 v=triangle i=join ci=red h=2;
symbol3 v=square i=join ci=black h=2;
proc gplot data = pred;
  plot pred*mealcat=collcat;
run;
quit;
We can do the same analysis using the regression approach via proc reg. We use simple regression coding for both collcat and mealcat, and we create the interaction terms as their products. The first test statement tests the main effect of collcat, the second tests the main effect of mealcat, and the last tests the overall interaction.
data reg1; set elemapi2; s2 = -1/3; s3=-1/3; if collcat = 2 then s2 = 2/3; if collcat = 3 then s3 = 2/3; m2 = -1/3; m3 = -1/3; if mealcat = 2 then m2 = 2/3; if mealcat = 3 then m3 = 2/3; sm22 = s2*m2; sm23 = s2*m3; sm32 = s3*m2; sm33 = s3*m3; run; proc reg data = reg1; model api00 = s2 s3 m2 m3 sm22 sm23 sm32 sm33; Collcat: test s2=s3=0; Mealcat: test m2=m3=0; Interaction: test sm22=sm23=sm32=sm33=0; output out = pred2 p = pred; run; quit;
The REG Procedure
Model: MODEL1
Dependent Variable: api00 api 2000

Analysis of Variance
                           Sum of         Mean
Source             DF     Squares       Square   F Value   Pr > F
Model               8     6243715       780464    166.76   <.0001
Error             391     1829957   4680.19741
Corrected Total   399     8073672

Root MSE          68.41197   R-Square   0.7733
Dependent Mean   647.62250   Adj R-Sq   0.7687
Coeff Var         10.56356

Parameter Estimates
                             Parameter    Standard
Variable    Label       DF    Estimate       Error   t Value   Pr > |t|
Intercept   Intercept    1   650.08826     3.87189    167.90     <.0001
s2                       1    23.63531     9.10533      2.60     0.0098
s3                       1    26.44625     9.99513      2.65     0.0085
m2                       1  -181.04135     9.07713    -19.94     <.0001
m3                       1  -293.41027     9.44946    -31.05     <.0001
sm22                     1    38.51777    24.19532      1.59     0.1122
sm23                     1     6.17754    20.08262      0.31     0.7585
sm32                     1   101.05102    22.88808      4.42     <.0001
sm33                     1    82.57776    24.43941      3.38     0.0008

Test Collcat Results for Dependent Variable API00
                          Mean
Source          DF      Square   F Value   Pr > F
Numerator        2       21070      4.50   0.0117
Denominator    391  4680.19741

Test Mealcat Results for Dependent Variable API00
                          Mean
Source          DF      Square   F Value   Pr > F
Numerator        2     2382422    509.04   <.0001
Denominator    391  4680.19741

Test Interaction Results for Dependent Variable API00
                          Mean
Source          DF      Square   F Value   Pr > F
Numerator        4       31042      6.63   <.0001
Denominator    391  4680.19741
First, note that the results of the test statements correspond to those from proc glm. This is because collcat and mealcat were coded using simple coding, a coding scheme in which the contrast coefficients sum to 0. If the variables had been coded using dummy coding, then the results of the test statements for mealcat and collcat from proc reg would not have corresponded to the proc glm results. In addition to simple coding, we could have used deviation or Helmert coding schemes, and the results of the test statements would still have matched the results from proc glm, although the meaning of the individual coefficient tests would have been different. This point will be explored in more detail later in this chapter.
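To see the contrast with dummy coding, one could fit the same model with 0/1 indicators. The sketch below is illustrative only: the dataset name regdum and the variables cd2, cd3, md2, md3 and their products are hypothetical names, not part of the original analysis. The overall model fit and the interaction test are unchanged, but the two main-effect test statements now test the effect of each variable at the reference level of the other, so they should not be expected to reproduce the proc glm Type III results.

```sas
/* Hypothetical dummy-coded version of the model (names are illustrative). */
data regdum;
  set elemapi2;
  /* 0/1 indicators; group 1 is the omitted (reference) category */
  cd2 = (collcat = 2);
  cd3 = (collcat = 3);
  md2 = (mealcat = 2);
  md3 = (mealcat = 3);
  /* interaction terms as products of the indicators */
  cm22 = cd2*md2;
  cm23 = cd2*md3;
  cm32 = cd3*md2;
  cm33 = cd3*md3;
run;

proc reg data = regdum;
  model api00 = cd2 cd3 md2 md3 cm22 cm23 cm32 cm33;
  Collcat:     test cd2 = cd3 = 0;   /* collcat effect at mealcat = 1 only */
  Mealcat:     test md2 = md3 = 0;   /* mealcat effect at collcat = 1 only */
  Interaction: test cm22 = cm23 = cm32 = cm33 = 0;
run;
quit;
```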
The graph of the cell means we obtained before illustrates the interaction between collcat and mealcat. The graph shows the 3 levels of collcat as 3 different lines, and the 3 levels of mealcat as the 3 values on the x axis of the graph. We can see that the effect of collcat differs based on the level of mealcat. For example, when mealcat is low, schools where collcat is 3 have the lowest api00 scores, as compared to schools that are medium or high on mealcat, where schools with collcat of 3 have the highest api00 scores.
Let's investigate this interaction further by looking at the simple effects of collcat at each level of mealcat.
6.2. Simple effects
6.2.1 Analyzing simple effects using PROC GLM
This analysis looks at the simple effects of collcat at the different levels of mealcat using proc glm. The lsmeans statement with option slice = mealcat gives the test of effects of collcat at each level of mealcat.
proc glm data= elemapi2; class collcat mealcat; model api00 = mealcat|collcat ; lsmeans mealcat*collcat / slice = mealcat ; run; quit;
The GLM Procedure

                             Sum of
Source             DF       Squares    Mean Square   F Value   Pr > F
Model               8   6243714.810     780464.351    166.76   <.0001
Error             391   1829957.187       4680.197
Corrected Total   399   8073671.998

R-Square   Coeff Var   Root MSE   API00 Mean
0.773343    10.56356   68.41197     647.6225

Source             DF   Type III SS    Mean Square   F Value   Pr > F
MEALCAT             2   4764843.563    2382421.781    509.04   <.0001
COLLCAT             2     42140.566      21070.283      4.50   0.0117
COLLCAT*MEALCAT     4    124167.809      31041.952      6.63   <.0001

COLLCAT   MEALCAT   API00 LSMEAN
1         1           816.914286
1         2           589.350000
1         3           493.918919
2         1           825.651163
2         2           636.604651
2         3           508.833333
3         1           782.150943
3         2           655.637681
3         3           541.733333

COLLCAT*MEALCAT Effect Sliced by MEALCAT for API00
                     Sum of
MEALCAT    DF       Squares   Mean Square   F Value   Pr > F
1           2         50909         25455      5.44   0.0047
2           2         68629         34314      7.33   0.0007
3           2         29979         14990      3.20   0.0417
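The same slice option can be pointed at the other factor. As a sketch, assuming one also wanted the simple effects of mealcat at each level of collcat (an analysis the chapter does not pursue), only the slice variable changes:

```sas
proc glm data = elemapi2;
  class collcat mealcat;
  model api00 = mealcat|collcat;
  /* slice = collcat tests mealcat separately within each collcat level */
  lsmeans mealcat*collcat / slice = collcat;
run;
quit;
```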
6.2.2 Analyzing Simple Effects Using PROC REG
In the previous section we tested the simple effect of collcat at each level of mealcat using PROC GLM, that is, through the ANOVA approach. We can obtain the same analysis through the regression approach; after all, ANOVA is regression. In the regression approach, we create the coding for the variables collcat and mealcat and for their interaction, and the coding scheme is specific to the effect we want to see. In this section we do an analysis parallel to the previous one: we want the simple effect of collcat at each level of mealcat. We will use simple coding for mealcat, although in our case the type of coding for mealcat does not really matter. The scheme for simple coding is shown in Chapter 5. The reference group for mealcat is group 1. We use Helmert coding for collcat, stored in the variables ccat1 and ccat2. Note that ccat1 and ccat2 are not themselves entered in the regression; they are used only to create the terms for the simple effects of collcat at each level of mealcat.
data reg2;
  set elemapi2;
  mcat1 = 1/3;
  mcat2 = 1/3;
  if mealcat = 3 then mcat1 = -2/3;
  if mealcat = 2 then mcat2 = -2/3;
  ccat1 = -1/3;
  if collcat = 1 then do; ccat1 = 2/3; ccat2 = 0; end;
  if collcat = 2 then ccat2 = .5;
  if collcat = 3 then ccat2 = -.5;
  c1m1 = 0; c2m1 = 0; c1m2 = 0; c2m2 = 0; c1m3 = 0; c2m3 = 0;
  if (mealcat = 1) then do; c1m1 = ccat1; c2m1 = ccat2; end;
  if (mealcat = 2) then do; c1m2 = ccat1; c2m2 = ccat2; end;
  if (mealcat = 3) then do; c1m3 = ccat1; c2m3 = ccat2; end;
run;
Now that we have seen the Helmert coding for collcat, we can see how it is used to create the simple effects of collcat at each level of mealcat. First, we look at the two comparisons of collcat at mealcat of 1. Note that the coding is the same as we saw above, but only when mealcat is 1; otherwise these variables are coded 0. Likewise, the terms that form the effects of collcat when mealcat is 2 are coded the same way when mealcat is 2, and otherwise 0. The same is true for the case when mealcat is 3. The following matrix shows the coding we just used for all the interaction terms.
collcat | mealcat | c1m1 | c2m1 | c1m2 | c2m2 | c1m3 | c2m3 |
1 | 1 | 2/3 | 0 | 0 | 0 | 0 | 0 |
2 | 1 | -1/3 | 1/2 | 0 | 0 | 0 | 0 |
3 | 1 | -1/3 | -1/2 | 0 | 0 | 0 | 0 |
1 | 2 | 0 | 0 | 2/3 | 0 | 0 | 0 |
2 | 2 | 0 | 0 | -1/3 | 1/2 | 0 | 0 |
3 | 2 | 0 | 0 | -1/3 | -1/2 | 0 | 0 |
1 | 3 | 0 | 0 | 0 | 0 | 2/3 | 0 |
2 | 3 | 0 | 0 | 0 | 0 | -1/3 | 1/2 |
3 | 3 | 0 | 0 | 0 | 0 | -1/3 | -1/2 |
Now we are ready for our regression analysis. The test statements used below are for testing the simple effect of collcat at each level of mealcat.
proc reg data = reg2; model api00 = mcat1 mcat2 c1m1 c2m1 c1m2 c2m2 c1m3 c2m3; mealcat1: test c1m1 = c2m1 = 0; mealcat2: test c1m2 = c2m2 = 0; mealcat3: test c1m3 = c2m3 = 0; run; quit;
The REG Procedure
Model: MODEL1
Dependent Variable: API00 api 2000

Analysis of Variance
                           Sum of         Mean
Source             DF     Squares       Square   F Value   Pr > F
Model               8     6243715       780464    166.76   <.0001
Error             391     1829957   4680.19741
Corrected Total   399     8073672

Root MSE          68.41197   R-Square   0.7733
Dependent Mean   647.62250   Adj R-Sq   0.7687
Coeff Var         10.56356

Parameter Estimates
                             Parameter    Standard
Variable    Label       DF    Estimate       Error   t Value   Pr > |t|
Intercept   Intercept    1   650.08826     3.87189    167.90     <.0001
MCAT1                    1   293.41027     9.44946     31.05     <.0001
MCAT2                    1   181.04135     9.07713     19.94     <.0001
C1M1                     1    13.01323    13.52800      0.96     0.3367
C2M1                     1    43.50022    14.04092      3.10     0.0021
C1M2                     1   -56.77117    16.67866     -3.40     0.0007
C2M2                     1   -19.03303    13.29175     -1.43     0.1530
C1M3                     1   -31.36441    12.86955     -2.44     0.0153
C2M3                     1   -32.90000    20.23653     -1.63     0.1048

Test mealcat1 Results for Dependent Variable API00
                          Mean
Source          DF      Square   F Value   Pr > F
Numerator        2       25455      5.44   0.0047
Denominator    391  4680.19741

Test mealcat2 Results for Dependent Variable API00
                          Mean
Source          DF      Square   F Value   Pr > F
Numerator        2       34314      7.33   0.0007
Denominator    391  4680.19741

Test mealcat3 Results for Dependent Variable API00
                          Mean
Source          DF      Square   F Value   Pr > F
Numerator        2       14990      3.20   0.0417
Denominator    391  4680.19741
6.3 Simple Comparisons
In the analyses above we looked at the simple effect of collcat at each level of mealcat. For example, we looked at the overall effect of collcat when mealcat was 1. This is the simple effect of collcat at mealcat=1. Because collcat has more than 2 levels, we may wish to make further comparisons among the 3 levels of collcat within mealcat=1. Simple comparisons allow us to make such comparisons.
6.3.1 Analyzing Simple Comparisons Using PROC REG
In the previous regression analysis, we used Helmert coding for collcat. We chose this coding scheme so that we could compare group 1 with groups 2 and 3, and then compare groups 2 and 3, within each level of mealcat. For example, to compare collcat 1 vs. 2 and 3 when mealcat is 1 we look at the effect c1m1, and to compare collcat groups 2 and 3 when mealcat is 1 we look at the effect c2m1. Here c1m1 is not significant (t = 0.96, p = 0.3367); that is, within mealcat = 1 the difference between collcat group 1 and groups 2 and 3 combined is not significant. By contrast, c2m1 is significant (t = 3.10, p = 0.0021), so collcat groups 2 and 3 do differ within mealcat = 1.
6.3.2 Analyzing Simple Comparisons Using PROC GLM
We can also look at the simple comparisons using PROC GLM. For example, for the comparison of group 1 vs. 2+ of collcat within mealcat = 1, we can do the following. The estimate statement below indicates that the comparison on collcat is between group 1 and all the upper groups, and that the comparison is restricted to mealcat = 1.
proc glm data = elemapi2; class collcat mealcat; model api00 = collcat mealcat collcat*mealcat/ss3; estimate 'collcat 1 vs 2+ within mealcat = 1' collcat 1 -.5 -.5 collcat*mealcat 1 0 0 -.5 0 0 -.5 0 0; run; quit;
The GLM Procedure
Dependent Variable: API00 api 2000

                             Sum of
Source             DF       Squares    Mean Square   F Value   Pr > F
Model               8   6243714.810     780464.351    166.76   <.0001
Error             391   1829957.187       4680.197
Corrected Total   399   8073671.998

R-Square   Coeff Var   Root MSE   API00 Mean
0.773343    10.56356   68.41197     647.6225

Source             DF   Type III SS    Mean Square   F Value   Pr > F
COLLCAT             2     42140.566      21070.283      4.50   0.0117
MEALCAT             2   4764843.563    2382421.781    509.04   <.0001
COLLCAT*MEALCAT     4    124167.809      31041.952      6.63   <.0001

                                                     Standard
Parameter                               Estimate        Error   t Value   Pr > |t|
collcat 1 vs 2+ within mealcat = 1    13.0132326   13.5279998      0.96     0.3367
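The second Helmert comparison can be written the same way. The sketch below, which is an illustration rather than part of the original analysis, compares collcat groups 2 and 3 within mealcat = 1; the coefficients follow the same ordering of the collcat*mealcat cells used in the estimate statement above (mealcat varying fastest within collcat).

```sas
proc glm data = elemapi2;
  class collcat mealcat;
  model api00 = collcat mealcat collcat*mealcat / ss3;
  /* cell(2,1) - cell(3,1): collcat 2 vs 3, restricted to mealcat = 1 */
  estimate 'collcat 2 vs 3 within mealcat = 1'
           collcat 0 1 -1
           collcat*mealcat 0 0 0  1 0 0  -1 0 0;
run;
quit;
```

If the coding is right, the estimate should correspond to the c2m1 coefficient (about 43.50) from the regression in section 6.2.2, since that is the difference between the two cell means 825.65 and 782.15.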
6.4 Partial Interaction
A partial interaction allows you to apply contrasts to one of the effects in an interaction term. For example, we can lay out the interaction of collcat by mealcat as the table below.
             | Collcat Low | Collcat Med | Collcat High |
Mealcat Low  |             |             |              |
Mealcat Med  |             |             |              |
Mealcat High |             |             |              |
Say that we wanted to compare, in the context of this interaction, group 1 for collcat vs. groups 2 and 3. The table of this partial interaction would look like this. The contrast coefficients of -2 1 1 applied to collcat indicate the comparison of group 1 for collcat vs. groups 2 and 3.
             |     -2      |      1      |      1       |
             | Collcat Low | Collcat Med | Collcat High |
Mealcat Low  |             |             |              |
Mealcat Med  |             |             |              |
Mealcat High |             |             |              |
Likewise, we also might want to compare groups 2 and 3 of collcat by mealcat, and the table of this interaction would look like this.
             |      0      |     -1      |      1       |
             | Collcat Low | Collcat Med | Collcat High |
Mealcat Low  |             |             |              |
Mealcat Med  |             |             |              |
Mealcat High |             |             |              |
These are called partial interactions because contrast coefficients are applied to one of the terms involved in the interaction.
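Stated in terms of cell means, and writing \mu_{ij} for the population mean of api00 at collcat = i and mealcat = j (this notation is introduced here only for illustration), the first partial interaction asks whether the comparison of collcat group 1 with groups 2 and 3 is the same at every level of mealcat. Up to sign and scale this is the -2 1 1 contrast from the table above:

```latex
\psi_j = \mu_{1j} - \tfrac{1}{2}\left(\mu_{2j} + \mu_{3j}\right),
\qquad
H_0:\ \psi_1 = \psi_2 = \psi_3 .
```

The null hypothesis imposes two independent constraints (for example \psi_1 = \psi_2 and \psi_2 = \psi_3), so a test of it has 2 degrees of freedom.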
6.4.1 Analyzing partial interactions using PROC GLM
We wish to compare group 1 versus groups 2 and 3 on collcat, and we can similarly compare groups 2 and 3 on collcat. For example, to test the partial interaction of collcat (group 1 vs. groups 2 and 3) by mealcat, we can use the first contrast statement below. Because mealcat has 2 degrees of freedom, the test of the partial interaction also has 2 degrees of freedom: the 2 degrees of freedom of mealcat can be broken down into 2 comparisons, each of which yields one interaction contrast. The two interaction contrasts are separated by a comma, which tells SAS to join them into a single test with 2 degrees of freedom.
proc glm data = elemapi2; class collcat mealcat; model api00 = collcat mealcat collcat*mealcat; contrast 'test of sm11 and sm12' collcat*mealcat 1 -1 0 -.5 .5 0 -.5 .5 0, collcat*mealcat 0 1 -1 0 -.5 .5 0 -.5 .5; contrast 'test of sm21 and sm22' collcat*mealcat 0 0 0 1 -1 0 -1 1 0, collcat*mealcat 0 0 0 0 1 -1 0 -1 1; run; quit;
The GLM Procedure
<output omitted>
Contrast                  DF   Contrast SS   Mean Square   F Value   Pr > F
test of sm11 and sm12      2   54141.40962   27070.70481      5.78   0.0033
test of sm21 and sm22      2   66511.60133   33255.80067      7.11   0.0009
6.4.2 Analyzing partial interactions Using PROC REG
With regression analysis, we can also compare groups 1 vs. 2 and 3 on collcat, or compare groups 2 and 3 on collcat. This implies Helmert coding on collcat, as we did before.
data reg3; set elemapi2; if mealcat = 1 then m1 = 2/3; if mealcat = 2 then m1 = -1/3; if mealcat = 3 then m1 = -1/3; if mealcat = 1 then m2 = 1/3; if mealcat = 2 then m2 = 1/3; if mealcat = 3 then m2 = -2/3; if collcat = 1 then s1 = 2/3; if collcat = 2 then s1 = -1/3; if collcat = 3 then s1 = -1/3; if collcat = 1 then s2 = 0; if collcat = 2 then s2 = 1/2; if collcat = 3 then s2 = -1/2; sm11 = s1*m1; sm12 = s1*m2; sm21 = s2*m1; sm22 = s2*m2; run; proc reg data = reg3; model api00 = s1 s2 m1 m2 sm11 sm12 sm21 sm22; test sm11 = sm12 = 0; test sm21 = sm22 = 0; run; quit;
The REG Procedure
Model: MODEL1
Dependent Variable: api00 api 2000

Analysis of Variance
                           Sum of         Mean
Source             DF     Squares       Square   F Value   Pr > F
Model               8     6243715       780464    166.76   <.0001
Error             391     1829957   4680.19741
Corrected Total   399     8073672

Root MSE          68.41197   R-Square   0.7733
Dependent Mean   647.62250   Adj R-Sq   0.7687
Coeff Var         10.56356

Parameter Estimates
                             Parameter    Standard
Variable    Label       DF    Estimate       Error   t Value   Pr > |t|
Intercept   Intercept    1   650.08826     3.87189    167.90     <.0001
s1                       1   -25.04078     8.34539     -3.00     0.0029
s2                       1    -2.81094     9.32938     -0.30     0.7633
m1                       1   181.04135     9.07713     19.94     <.0001
m2                       1   112.36892     9.90759     11.34     <.0001
sm11                     1    69.78440    21.47520      3.25     0.0013
sm12                     1   -25.40675    21.06663     -1.21     0.2285
sm21                     1    62.53325    19.33438      3.23     0.0013
sm22                     1    13.86697    24.21132      0.57     0.5671

Test 1 Results for Dependent Variable api00
                          Mean
Source          DF      Square   F Value   Pr > F
Numerator        2       27071      5.78   0.0033
Denominator    391  4680.19741

Test 2 Results for Dependent Variable api00
                          Mean
Source          DF      Square   F Value   Pr > F
Numerator        2       33256      7.11   0.0009
Denominator    391  4680.19741
6.5. Interaction Contrasts
Above we saw that a partial interaction allows you to apply contrast coefficients to one of the terms in a 2 way interaction. An interaction contrast allows you to apply contrast coefficients to both of the terms in a two way interaction.
For example, with respect to collcat, let's say that we wish to compare groups 2 and 3, and with respect to mealcat we wish to compare groups 1 and 2. The table of this looks like this below.
     |              |      0      |     -1      |      1       |
     |              | Collcat Low | Collcat Med | Collcat High |
  -1 | Mealcat Low  |             |             |              |
   1 | Mealcat Med  |             |             |              |
   0 | Mealcat High |             |             |              |
We also would like to form a second interaction contrast that also compares groups 2 and 3 with respect to collcat, and compares groups 2 and 3 on mealcat. A table of this comparison is shown below.
     |              |      0      |     -1      |      1       |
     |              | Collcat Low | Collcat Med | Collcat High |
   0 | Mealcat Low  |             |             |              |
  -1 | Mealcat Med  |             |             |              |
   1 | Mealcat High |             |             |              |
If we look at the graph of the predicted values that we constructed before, these contrasts compare lines 2 and 3 (collcat 2 vs. 3) across mealcat 1 vs. 2, and then again across mealcat 2 vs. 3.
6.5.1 Analyzing Interaction Contrasts Using PROC GLM
proc glm data = elemapi2;
  class collcat mealcat;
  model api00 = collcat mealcat collcat*mealcat;
  contrast 'collcat 2v3 with mealcat 1v2' collcat*mealcat 0 0 0 1 -1 0 -1 1 0;
  contrast 'collcat 2v3 with mealcat 2v3' collcat*mealcat 0 0 0 0 1 -1 0 -1 1;
run;
quit;
The GLM Procedure
<output omitted>
Contrast                        DF   Contrast SS   Mean Square   F Value   Pr > F
collcat 2v3 with mealcat 1v2     1   48958.23687   48958.23687     10.46   0.0013
collcat 2v3 with mealcat 2v3     1    1535.28987    1535.28987      0.33   0.5671
6.5.2 Analyzing interaction contrasts using PROC REG
In regression analysis, we have seen that different coding schemes for the variables give us different contrasts and comparisons. Because we would like to compare groups 1 vs. 2, and then groups 2 vs. 3, on mealcat, we will use forward difference coding for mealcat (which compares 1 vs. 2, then 2 vs. 3). For collcat we keep the Helmert coding used in the previous section, whose second term compares groups 2 and 3.
data reg4; set elemapi2; if mealcat = 1 then m1 = 2/3; if mealcat = 2 then m1 = -1/3; if mealcat = 3 then m1 = -1/3; if mealcat = 1 then m2 = 1/3; if mealcat = 2 then m2 = 1/3; if mealcat = 3 then m2 = -2/3; if collcat = 1 then s1 = 2/3; if collcat = 2 then s1 = -1/3; if collcat = 3 then s1 = -1/3; if collcat = 1 then s2 = 0; if collcat = 2 then s2 = 1/2; if collcat = 3 then s2 = -1/2; sm11 = s1*m1; sm12 = s1*m2; sm21 = s2*m1; sm22 = s2*m2; run; proc reg data = reg4; model api00 = s1 s2 m1 m2 sm11 sm12 sm21 sm22; run; quit;
The REG Procedure
Model: MODEL1
Dependent Variable: api00 api 2000

Analysis of Variance
                           Sum of         Mean
Source             DF     Squares       Square   F Value   Pr > F
Model               8     6243715       780464    166.76   <.0001
Error             391     1829957   4680.19741
Corrected Total   399     8073672

Root MSE          68.41197   R-Square   0.7733
Dependent Mean   647.62250   Adj R-Sq   0.7687
Coeff Var         10.56356

Parameter Estimates
                             Parameter    Standard
Variable    Label       DF    Estimate       Error   t Value   Pr > |t|
Intercept   Intercept    1   650.08826     3.87189    167.90     <.0001
s1                       1   -25.04078     8.34539     -3.00     0.0029
s2                       1    -2.81094     9.32938     -0.30     0.7633
m1                       1   181.04135     9.07713     19.94     <.0001
m2                       1   112.36892     9.90759     11.34     <.0001
sm11                     1    69.78440    21.47520      3.25     0.0013
sm12                     1   -25.40675    21.06663     -1.21     0.2285
sm21                     1    62.53325    19.33438      3.23     0.0013
sm22                     1    13.86697    24.21132      0.57     0.5671
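Each interaction contrast from section 6.5.1 corresponds to a single coefficient in this regression: sm21 for collcat 2 vs. 3 by mealcat 1 vs. 2, and sm22 for collcat 2 vs. 3 by mealcat 2 vs. 3. As a sketch, one could request the matching single degree of freedom F tests explicitly; the test labels below are hypothetical names chosen for this example.

```sas
proc reg data = reg4;
  model api00 = s1 s2 m1 m2 sm11 sm12 sm21 sm22;
  /* single-df tests; each F is the square of that coefficient's t value */
  c2v3_m1v2: test sm21 = 0;
  c2v3_m2v3: test sm22 = 0;
run;
quit;
```

The p-values for sm21 (0.0013) and sm22 (0.5671) in the parameter estimates already match the proc glm contrasts, and squaring the t values (3.23 and 0.57) gives approximately the F values 10.46 and 0.33 reported there.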
6.6 Computing Adjusted Means
Our model will be almost the same as before, except that we include an additional covariate, emer. We want to obtain the means of api00 adjusted for emer. These adjusted means are the means that would be expected if every school in the sample were at the mean of emer.
6.6.1 Computing Adjusted Means via PROC GLM
The syntax to get the adjusted means using proc glm is as follows. The default is to adjust at the means of the covariates; this can be changed with the at variable = value option on the lsmeans statement.
proc glm data = elemapi2; class collcat mealcat; model api00 = collcat mealcat collcat*mealcat emer /ss3; lsmeans collcat*mealcat; run; quit;
The GLM Procedure

                             Sum of
Source             DF       Squares    Mean Square   F Value   Pr > F
Model               9   6402428.265     711380.918    166.01   <.0001
Error             390   1671243.733       4285.240
Corrected Total   399   8073671.998

R-Square   Coeff Var   Root MSE   api00 Mean
0.793001    10.10801   65.46175     647.6225

Source             DF   Type III SS    Mean Square   F Value   Pr > F
collcat             2     34730.090      17365.045      4.05   0.0181
mealcat             2   3017331.845    1508665.923    352.06   <.0001
collcat*mealcat     4     96789.116      24197.279      5.65   0.0002
emer                1    158713.455     158713.455     37.04   <.0001

collcat   mealcat   api00 LSMEAN
1         1           797.560428
1         2           596.972811
1         3           509.872241
2         1           812.550248
2         2           636.404940
2         3           523.884659
3         1           767.935241
3         2           652.976146
3         3           550.461628
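As an illustration of the at option, the sketch below requests the adjusted means twice: once at the default (the sample mean of emer) and once with emer held at a fixed value. The value 10 is an arbitrary choice made for this example, not a value used in the chapter.

```sas
proc glm data = elemapi2;
  class collcat mealcat;
  model api00 = collcat mealcat collcat*mealcat emer / ss3;
  /* default: covariate held at its sample mean */
  lsmeans collcat*mealcat;
  /* illustrative alternative: hold emer at 10 instead */
  lsmeans collcat*mealcat / at emer = 10;
run;
quit;
```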
6.6.2 Computing Adjusted Means via REGRESSION
Now we illustrate how to get the same adjusted means if you were to do the analysis via proc reg. First, we need to create all the necessary coded variables for the categorical variables. The choice of coding scheme does not matter for the purpose of obtaining the adjusted means; below we use simple coding for both mealcat and collcat. After coding our variables, we run proc reg to generate the regression equation, which is used later by proc score to generate predicted values based on the equation. The proc sql step below simply generates a new variable, meanemer, holding the mean of emer.
data reg6; set elemapi2; if collcat = 1 then s2 = 2/3; if collcat = 2 then s2 = -1/3; if collcat = 3 then s2 = -1/3; if collcat = 1 then s3 = -1/3; if collcat = 2 then s3 = 2/3; if collcat = 3 then s3 = -1/3; if mealcat = 1 then m2 = 2/3; if mealcat = 2 then m2 = -1/3; if mealcat = 3 then m2 = -1/3; if mealcat = 1 then m3 = -1/3; if mealcat = 2 then m3 = 2/3; if mealcat = 3 then m3 = -1/3; sm22 = s2*m2; sm23 = s2*m3; sm32 = s3*m2; sm33 = s3*m3; run; proc reg data = reg6 outest = pred6 noprint; yhat: model api00 = s2 s3 m2 m3 sm22 sm23 sm32 sm33 emer; run; quit; proc sql; create table xy as select *, mean(emer) as meanemer from reg6; quit;
NOTE: The variable meanemer must be renamed to emer, or proc score will not work. The variables listed on the var statement of proc score must have the same names as the independent variables in the regression. If they do not, you get a cryptic message about a variable not being found, even though you can see the variable in the data set.
data xyz; set xy; emer = meanemer; run; proc score data = xyz score = pred6 out = ep type = parms; var s2 s3 m2 m3 sm22 sm23 sm32 sm33 emer; run; proc means data = ep mean; class collcat mealcat; var yhat; run;
The MEANS Procedure

Analysis Variable : yhat

            Percentage
            free meals in 3     N
collcat     categories        Obs           Mean
-------------------------------------------------
1           1                  35    797.5629402
            2                  20    596.9753239
            3                  74    509.8747538
2           1                  43    812.5527606
            2                  43    636.4074521
            3                  48    523.8871715
3           1                  53    767.9377531
            2                  69    652.9786583
            3                  15    550.4641407
-------------------------------------------------
6.7 More Details on Meaning of the Coefficients
So far we have discussed a variety of techniques that you can use to help interpret interactions of categorical variables in regression, but we have not gone into a great detail about the meaning of the coefficients in these analyses. Let's consider this further. Consider the analysis below using collcat and mealcat, using simple contrasts on both of these variables. The reference group for both variables will be group 1.
data reg7; set elemapi2; if collcat = 1 then s1 = -1/3; if collcat = 2 then s1 = 2/3; if collcat = 3 then s1 = -1/3; if collcat = 1 then s2 = -1/3; if collcat = 2 then s2 = -1/3; if collcat = 3 then s2 = 2/3; if mealcat = 1 then m1 = -1/3; if mealcat = 2 then m1 = 2/3; if mealcat = 3 then m1 = -1/3; if mealcat = 1 then m2 = -1/3; if mealcat = 2 then m2 = -1/3; if mealcat = 3 then m2 = 2/3; sm11 = s1*m1; sm12 = s1*m2; sm21 = s2*m1; sm22 = s2*m2; run; proc reg data = reg7; model api00 = s1 s2 m1 m2 sm11 sm12 sm21 sm22; output out = predreg7 p = yhat; run; quit;
The REG Procedure
Model: MODEL1
Dependent Variable: api00 api 2000

Analysis of Variance
                           Sum of         Mean
Source             DF     Squares       Square   F Value   Pr > F
Model               8     6243715       780464    166.76   <.0001
Error             391     1829957   4680.19741
Corrected Total   399     8073672

Root MSE          68.41197   R-Square   0.7733
Dependent Mean   647.62250   Adj R-Sq   0.7687
Coeff Var         10.56356

Parameter Estimates
                             Parameter    Standard
Variable    Label       DF    Estimate       Error   t Value   Pr > |t|
Intercept   Intercept    1   650.08826     3.87189    167.90     <.0001
s1                       1    23.63531     9.10533      2.60     0.0098
s2                       1    26.44625     9.99513      2.65     0.0085
m1                       1  -181.04135     9.07713    -19.94     <.0001
m2                       1  -293.41027     9.44946    -31.05     <.0001
sm11                     1    38.51777    24.19532      1.59     0.1122
sm12                     1     6.17754    20.08262      0.31     0.7585
sm21                     1   101.05102    22.88808      4.42     <.0001
sm22                     1    82.57776    24.43941      3.38     0.0008
We can produce the predicted cell means as shown below. These will be useful for interpreting the meaning of the coefficients.
proc means data = predreg7 mean; class collcat mealcat; var yhat; run;
The MEANS Procedure

Analysis Variable : yhat Predicted Value of api00

            Percentage
            free meals in 3     N
collcat     categories        Obs           Mean
-------------------------------------------------
1           1                  35    816.9142857
            2                  20    589.3500000
            3                  74    493.9189189
2           1                  43    825.6511628
            2                  43    636.6046512
            3                  48    508.8333333
3           1                  53    782.1509434
            2                  69    655.6376812
            3                  15    541.7333333
-------------------------------------------------
Let's consider the meaning of the coefficient for s1. The coding for this variable compares group 2 vs. group 1, hence this coefficient corresponds to mean(collcat = 2) - mean(collcat = 1). Note that these are the unweighted means, so we compute the mean for collcat = 2 as the mean of the 3 cells corresponding to collcat = 2, i.e. (825.651+636.605+508.833)/3 . If we compare the result below to the coefficient for s1 we see that they are the same,
(825.651+636.605+508.833)/3 - (816.914+589.35+493.919)/3 = 23.635333.
Likewise, the coefficient for s2 is mean(collcat = 3) - mean(collcat = 1), computed below. The value below corresponds to the coefficient for s2.
(782.151+655.638+541.733)/3 - (816.914+589.35+493.919)/3 = 26.446333
Likewise, the coefficient for m1 works out to be mean(mealcat = 2) - mean(mealcat = 1), computed below.
(589.35+636.605+655.638)/3 - (816.914+825.651+782.151)/3 = -181.041.
And the coefficient for m2 is mean(mealcat = 3) - mean(mealcat = 1), computed below.
(493.919+508.833+541.733)/3 - (816.914+825.651+782.151)/3 = -293.41033
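The four calculations above can be checked in one data step. This is a small sketch for verification only: the variable names c1m1 through c3m3 are hypothetical names for the nine cell means (collcat first, mealcat second), entered by hand from the proc means output.

```sas
data _null_;
  /* cell means of api00, named c{collcat}m{mealcat} (illustrative names) */
  c1m1 = 816.914; c1m2 = 589.350; c1m3 = 493.919;
  c2m1 = 825.651; c2m2 = 636.605; c2m3 = 508.833;
  c3m1 = 782.151; c3m2 = 655.638; c3m3 = 541.733;

  /* main-effect coefficients as differences of unweighted marginal means */
  s1 = mean(c2m1, c2m2, c2m3) - mean(c1m1, c1m2, c1m3);  /* collcat 2 vs 1 */
  s2 = mean(c3m1, c3m2, c3m3) - mean(c1m1, c1m2, c1m3);  /* collcat 3 vs 1 */
  m1 = mean(c1m2, c2m2, c3m2) - mean(c1m1, c2m1, c3m1);  /* mealcat 2 vs 1 */
  m2 = mean(c1m3, c2m3, c3m3) - mean(c1m1, c2m1, c3m1);  /* mealcat 3 vs 1 */

  /* write the four values to the SAS log */
  put s1= s2= m1= m2=;
run;
```

Because each marginal mean is a simple average of three cell means, the unequal cell sizes play no role in these coefficients.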
To get the meaning of the coefficients for the interaction terms, let's write out the regression equation and take a closer look at the coefficients. From the parameter estimates, we have the following linear equation for predicted values:
yhat = 650.088 + 23.635*s1 + 26.446*s2 - 181.041*m1 - 293.410*m2 + 38.518*s1*m1 + 6.178*s1*m2 + 101.051*s2*m1 + 82.578*s2*m2.
Because of the simple coding scheme we use for both variables, we have from the above equation,
yhat(collcat = 2) - yhat(collcat = 1) = 23.635 + 38.518*m1 + 6.178*m2.
One way to think about this equation is that for any level of mealcat comparing group 2 vs. group 1 on collcat only involves s1. It then follows that the coefficient for sm11 is to compare the difference of group 2 vs. 1 on collcat when mealcat is 2 with the difference of group 2 vs. 1 on collcat when mealcat is 1. In other words, sm11 is
[cell(2,2)-cell(1,2)] - [cell(2,1)-cell(1,1)].
Plugging all the corresponding cell means to the above formula, we get
(636.6047 - 589.3500) - (825.6512 - 816.9143) = 38.5178,
which is the coefficient for sm11. Using the same argument, we can have the following
sm11 : [cell(2,2)-cell(1,2)] - [cell(2,1)-cell(1,1)],
sm12 : [cell(2,3)-cell(1,3)] - [cell(2,1)-cell(1,1)],
sm21 : [cell(3,2)-cell(1,2)] - [cell(3,1)-cell(1,1)],
sm22 : [cell(3,3)-cell(1,3)] - [cell(3,1)-cell(1,1)].
We can go through the same process to verify the meaning of the coefficients for the other 3 interaction terms. We verify that sm12 is 6.1775.
(508.8333 - 493.9189) - (825.6512 - 816.9143) = 6.1775.
We also verify that sm21 is 101.051.
(655.6377 - 589.3500) - (782.1509 - 816.9143) = 101.0511.
Last we verify that sm22 is 82.5778.
( 541.7333 - 493.9189) - ( 782.1509 - 816.9143) = 82.5778.
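All four interaction coefficients are differences of differences of cell means, so they can be verified together. The data step below is a sketch for checking the arithmetic; as before, c1m1 through c3m3 are hypothetical variable names for the cell means, indexed by collcat and then mealcat.

```sas
data _null_;
  /* cell means of api00, named c{collcat}m{mealcat} (illustrative names) */
  c1m1 = 816.9143; c1m2 = 589.3500; c1m3 = 493.9189;
  c2m1 = 825.6512; c2m2 = 636.6047; c2m3 = 508.8333;
  c3m1 = 782.1509; c3m2 = 655.6377; c3m3 = 541.7333;

  /* collcat difference at a given mealcat minus the same difference at mealcat 1 */
  sm11 = (c2m2 - c1m2) - (c2m1 - c1m1);
  sm12 = (c2m3 - c1m3) - (c2m1 - c1m1);
  sm21 = (c3m2 - c1m2) - (c3m1 - c1m1);
  sm22 = (c3m3 - c1m3) - (c3m1 - c1m1);

  /* write the four values to the SAS log */
  put sm11= sm12= sm21= sm22=;
run;
```

The values written to the log should agree with the sm11, sm12, sm21 and sm22 estimates from proc reg to about three decimal places, the precision of the cell means entered here.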
6.8 Simple Effects via Dummy Coding vs. Effect Coding
We have used several different coding schemes in this chapter. You may wonder why we have gone to the effort of creating and testing these effects instead of just using dummy coding, how the coding schemes differ, and how to choose among them. In this section, we compare how to get simple effects using effect coding with how we would get them using dummy coding. We hope to show that effect coding is easier to use because the interpretation of the coefficients is much more intuitive.
6.8.1 Example 1. Simple effects of yr_rnd at levels of mealcat
Let's use an example from Chapter 3 (section 3.5). In that example we looked at an analysis using mealcat and yr_rnd and the interaction of these two variables. First, we look at how to do a simple effects analysis of yr_rnd at each level of mealcat using effect coding. To make our results correspond to those from Chapter 3, we will make category 3 of mealcat the reference category.
data reg8; set elemapi2; if mealcat = 1 then do; ms1 =2/3; ms2 = -1/3; end; if mealcat = 2 then do; ms1 =-1/3; ms2= 2/3; end; if mealcat = 3 then do; ms1 =-1/3; ms2 = -1/3; end; if yr_rnd = 0 then yr1 = -1/2; else yr1 = 1/2; ym1 = 0; ym2 = 0; ym3 = 0; if mealcat = 1 then ym1 = yr1; if mealcat = 2 then ym2 = yr1; if mealcat = 3 then ym3 = yr1; run; proc reg data = reg8; model api00 = ms1 ms2 ym1 ym2 ym3; run; quit;
The REG Procedure
Model: MODEL1
Dependent Variable: API00 api 2000

                        Analysis of Variance
                              Sum of          Mean
Source             DF        Squares        Square   F Value   Pr > F
Model               5        6204728       1240946    261.61   <.0001
Error             394        1868944    4743.51314
Corrected Total   399        8073672

Root MSE           68.87317    R-Square    0.7685
Dependent Mean    647.62250    Adj R-Sq    0.7656
Coeff Var          10.63477

                        Parameter Estimates
                            Parameter     Standard
Variable    Label      DF    Estimate        Error   t Value   Pr > |t|
Intercept   Intercept   1   632.23557      5.80048    109.00     <.0001
MS1                     1   267.81076     14.61559     18.32     <.0001
MS2                     1   114.65715     11.12812     10.30     <.0001
ym1                     1   -74.25691     26.75629     -2.78     0.0058
ym2                     1   -51.74017     18.88854     -2.74     0.0064
ym3                     1   -33.49254     11.77129     -2.85     0.0047

Now we can obtain the simple effect of yr_rnd at mealcat = 1 by inspecting the coefficient for ym1, the simple effect of yr_rnd at mealcat = 2 by inspecting the coefficient for ym2, and the simple effect of yr_rnd at mealcat = 3 by inspecting the coefficient for ym3.
Now let's perform the same analysis using dummy coding. Again, we will explicitly make the 3rd category of mealcat the omitted category.
data reg9;
  set elemapi2;
  if mealcat = 1 then do; md1 = 1; md2 = 0; end;
  if mealcat = 2 then do; md1 = 0; md2 = 1; end;
  if mealcat = 3 then do; md1 = 0; md2 = 0; end;
  ymd1 = yr_rnd*md1;
  ymd2 = yr_rnd*md2;
run;
proc reg data = reg9;
  model api00 = yr_rnd md1 md2 ymd1 ymd2;
run;
The REG Procedure
Model: MODEL1
Dependent Variable: API00

                  Parameter Estimates
                  Parameter     Standard
Variable    DF     Estimate        Error   t Value   Pr > |t|
Intercept    1    521.49254      8.41420     61.98     <.0001
YR_RND       1    -33.49254     11.77129     -2.85     0.0047
MD1          1    288.19295     10.44284     27.60     <.0001
MD2          1    123.78097     10.55185     11.73     <.0001
ymd1         1    -40.76438     29.23118     -1.39     0.1639
ymd2         1    -18.24763     22.25624     -0.82     0.4128
In order to form a test of simple main effects we need to make a table like the one shown below that relates the cell means to the coefficients in the regression. Please see Chapter 3, section 3.5 for information on how this table was constructed.
            mealcat=1       mealcat=2       mealcat=3
-------------------------------------------------
yr_rnd=0    const           const           const
            + md1           + md2
-------------------------------------------------
yr_rnd=1    const           const           const
            + yr_rnd        + yr_rnd        + yr_rnd
            + md1           + md2
            + ymd1          + ymd2
Let's start by looking at how to get the simple effect of yr_rnd when mealcat is 3. Looking at the table above, we can see that we would want to compare const with const + yr_rnd, which is the same as testing whether the coefficient for yr_rnd is zero. This is a single parameter test and is shown in the output above. The t-value is -2.85 and the p-value is .0047.
Note that the coefficient for yr_rnd corresponds to the test of the effect of yr_rnd when all other variables are set to 0 (the reference category), i.e. when mealcat is set to the reference category. You may be tempted to interpret the coefficient for yr_rnd as the overall difference between year round schools and non-year round schools, but in this example we see that it really corresponds to the simple effect of yr_rnd. When using dummy coding people commonly misinterpret the lower order effects to refer to overall effects rather than simple effects.
Now let's look at the simple effect of yr_rnd when mealcat=1. Looking at the table above, we see that this involves comparing the cell means for yr_rnd=1 vs. yr_rnd=0 when mealcat=1, i.e. comparing const + yr_rnd + md1 + ymd1 vs. const + md1. Removing the terms that drop out, we see that testing the simple effect of yr_rnd when mealcat = 1 is the same as testing yr_rnd + ymd1 = 0. To do this, we add a test statement following the previous proc reg.
test yr_rnd + ymd1 = 0;
run;
quit;

Test 1 Results for Dependent Variable API00
                            Mean
Source          DF        Square   F Value   Pr > F
Numerator        1         36536      7.70   0.0058
Denominator    394    4743.51314

These examples illustrate that it is more complicated to form simple effects when using dummy coding, and also that the interpretation of lower order effects when using dummy coding may not have the meaning that you would expect.
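The same reasoning applies to the simple effect of yr_rnd when mealcat = 2: from the table of cell means, it corresponds to yr_rnd + ymd2 = 0. The sketch below repeats the dummy-coded data step so that it can be run on its own, and it assumes the elemapi2 data set has already been read in as in the introduction. The labels mealcat1 and mealcat2 on the test statements are names chosen here for illustration.

```sas
/* rebuild the dummy-coded data set used above */
data reg9;
  set elemapi2;
  if mealcat = 1 then do; md1 = 1; md2 = 0; end;
  if mealcat = 2 then do; md1 = 0; md2 = 1; end;
  if mealcat = 3 then do; md1 = 0; md2 = 0; end;
  ymd1 = yr_rnd*md1;   /* yr_rnd by mealcat=1 interaction */
  ymd2 = yr_rnd*md2;   /* yr_rnd by mealcat=2 interaction */
run;

proc reg data = reg9;
  model api00 = yr_rnd md1 md2 ymd1 ymd2;
  /* simple effect of yr_rnd at mealcat = 1 */
  mealcat1: test yr_rnd + ymd1 = 0;
  /* simple effect of yr_rnd at mealcat = 2 */
  mealcat2: test yr_rnd + ymd2 = 0;
run;
quit;
```

Because this model and the effect-coded model describe the same six cell means, the second test should agree with the t-test for ym2 in the effect-coded output (t = -2.74, p = 0.0064), with an F value of roughly the square of that t value.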
6.8.2 Example 2. Simple effects of mealcat at levels of yr_rnd
Example 1 looked at simple effects for yr_rnd, a variable with only 2 levels, and showed how to use the test statement in SAS to test them. In this example, let's consider the simple effects of mealcat at each level of yr_rnd. Because mealcat has more than 2 levels, we will see what is required for doing tests of simple effects for variables with more than 2 levels. We will show both the proc glm and the proc reg approaches here.
proc glm data = elemapi2;
  class yr_rnd mealcat;
  model api00 = yr_rnd mealcat yr_rnd*mealcat;
  contrast '1' mealcat 1 0 -1 yr_rnd*mealcat 1 0 -1 0 0 0,
               mealcat 0 1 -1 yr_rnd*mealcat 0 1 -1 0 0 0;
  contrast '2' mealcat 1 0 -1 yr_rnd*mealcat 0 0 0 1 0 -1,
               mealcat 0 1 -1 yr_rnd*mealcat 0 0 0 0 1 -1;
run;
quit;
The GLM Procedure
<output omitted>

Contrast    DF    Contrast SS    Mean Square   F Value   Pr > F
1            2    3903569.804    1951784.902    411.46   <.0001
2            2     476157.455     238078.727     50.19   <.0001
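As a cross-check on the hand-written contrasts, proc glm can also produce tests of this kind through the slice option of the lsmeans statement. This is an alternative to the contrast statements used above, not the approach taken in this chapter, and the sketch assumes the elemapi2 data set is available as before.

```sas
proc glm data = elemapi2;
  class yr_rnd mealcat;
  model api00 = yr_rnd mealcat yr_rnd*mealcat;
  /* test the mealcat cell means separately within each level of yr_rnd */
  lsmeans yr_rnd*mealcat / slice = yr_rnd;
run;
quit;
```

Each slice is a 2 degree-of-freedom test of mealcat within one level of yr_rnd, so the sliced F values should correspond to contrasts '1' and '2' above.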
Here is how to do it with proc reg. The first test statement below looks at mealcat at yr_rnd = 0 and the second test statement looks at mealcat at yr_rnd = 1.
data reg10;
  set elemapi2;
  if yr_rnd = 0 then yrrnd = -.5;
  if yr_rnd = 1 then yrrnd = .5;
  if mealcat = 1 then m1 = 2/3;
  if mealcat = 2 then m1 = -1/3;
  if mealcat = 3 then m1 = -1/3;
  if mealcat = 1 then m2 = -1/3;
  if mealcat = 2 then m2 = 2/3;
  if mealcat = 3 then m2 = -1/3;
  if yr_rnd = 0 then my11 = m1; else my11 = 0;
  if yr_rnd = 0 then my21 = m2; else my21 = 0;
  if yr_rnd = 1 then my12 = m1; else my12 = 0;
  if yr_rnd = 1 then my22 = m2; else my22 = 0;
run;
proc reg data = reg10;
  model api00 = yrrnd my11 my21 my12 my22;
  test my11 = my21 = 0;
  test my12 = my22 = 0;
run;
quit;
The REG Procedure
Model: MODEL1
Dependent Variable: api00 api 2000

                        Parameter Estimates
                            Parameter     Standard
Variable    Label      DF    Estimate        Error   t Value   Pr > |t|
Intercept   Intercept   1   632.23557      5.80048    109.00     <.0001
yrrnd                   1   -53.16321     11.60095     -4.58     <.0001
my11                    1   288.19295     10.44284     27.60     <.0001
my21                    1   123.78097     10.55185     11.73     <.0001
my12                    1   247.42857     27.30218      9.06     <.0001
my22                    1   105.53333     19.59588      5.39     <.0001

Test 1 Results for Dependent Variable api00
                            Mean
Source          DF        Square   F Value   Pr > F
Numerator        2       1951785    411.46   <.0001
Denominator    394    4743.51314

Test 2 Results for Dependent Variable api00
                            Mean
Source          DF        Square   F Value   Pr > F
Numerator        2        238079     50.19   <.0001
Denominator    394    4743.51314
We can also test the simple effects of mealcat at each level of yr_rnd via dummy coding, as in the code below. In SAS, each equal sign in a test statement contributes one degree of freedom. The second test statement has two equal signs, so it is a two degree-of-freedom test, which is what we need for the simple effect of mealcat when yr_rnd=0. The same logic holds for the fourth test statement, which tests the simple effect of mealcat when yr_rnd=1. The first and third test statements each have one degree of freedom and compare mealcat=1 with mealcat=2 at yr_rnd=0 and at yr_rnd=1, respectively.
data reg11;
  set elemapi2;
  m1 = 0;
  if mealcat = 1 then m1 = 1;
  m2 = 0;
  if mealcat = 2 then m2 = 1;
  m1y = m1*yr_rnd;
  m2y = m2*yr_rnd;
run;
proc reg data = reg11;
  model api00 = m1 m2 yr_rnd m1y m2y;
  test m1 - m2 = 0;
  test m1 = m2 = 0;
  test m1 + m1y - m2 - m2y = 0;
  test m1 + m1y = m2 + m2y = 0;
run;
quit;
The REG Procedure
Model: MODEL1
Dependent Variable: api00 api 2000

Test 1 Results for Dependent Variable api00
                            Mean
Source          DF        Square   F Value   Pr > F
Numerator        1       1627262    343.05   <.0001
Denominator    394    4743.51314

Test 2 Results for Dependent Variable api00
                            Mean
Source          DF        Square   F Value   Pr > F
Numerator        2       1951785    411.46   <.0001
Denominator    394    4743.51314

Test 3 Results for Dependent Variable api00
                            Mean
Source          DF        Square   F Value   Pr > F
Numerator        1         96095     20.26   <.0001
Denominator    394    4743.51314

Test 4 Results for Dependent Variable api00
                            Mean
Source          DF        Square   F Value   Pr > F
Numerator        2        238079     50.19   <.0001
Denominator    394    4743.51314
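To see why the fourth test statement matches the second proc reg test in the effect-coded analysis, it may help to add up the dummy-coded coefficients by hand. The sketch below types in the estimates from the dummy-coded output in section 6.8.1 (md1, md2, ymd1 and ymd2 there play the same role as m1, m2, m1y and m2y here); the names at1_m1 and at1_m2 are made up for illustration.

```sas
data _null_;
  /* dummy-coded estimates copied from the output in section 6.8.1 */
  md1  = 288.19295;   md2  = 123.78097;
  ymd1 = -40.76438;   ymd2 = -18.24763;

  /* mealcat effects at yr_rnd = 1: dummy coefficient plus its interaction */
  at1_m1 = md1 + ymd1;
  at1_m2 = md2 + ymd2;

  /* write the sums to the log with 5 decimal places */
  put at1_m1= 10.5 at1_m2= 10.5;
run;
```

Up to rounding, the two sums should reproduce the my12 and my22 estimates (247.42857 and 105.53333) from the effect-coded output above, while md1 and md2 by themselves already equal my11 and my21. This is why testing m1 + m1y = m2 + m2y = 0 amounts to testing my12 = my22 = 0.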