Notes

proc means data = "c:\sasreg\hsb2" mean n;
  class race;
  var write;
run;

The MEANS Procedure

  Analysis Variable : write writing score

             N
  race     Obs            Mean      N
  ------------------------------------------
     1      24      46.4583333     24
     2      11      58.0000000     11
     3      20      48.2000000     20
     4     145      54.0551724    145
  ------------------------------------------

proc glm data = "c:\sasreg\hsb2";
  class race;
  model write = race;
  estimate 'level 1 versus level 4' race 1 0 0 -1;
  estimate 'level 2 versus level 4' race 0 1 0 -1;
  estimate 'level 3 versus level 4' race 0 0 1 -1;
run;
quit;

The GLM Procedure

Dependent Variable: write   writing score

                                        Sum of
Source                      DF         Squares     Mean Square    F Value    Pr > F
Model                        3      1914.15805       638.05268       7.83    <.0001
Error                      196     15964.71695        81.45264
Corrected Total            199     17878.87500

R-Square     Coeff Var      Root MSE    write Mean
0.107063      17.10111      9.025111      52.77500

Source                      DF       Type I SS     Mean Square    F Value    Pr > F
race                         3     1914.158046      638.052682       7.83    <.0001

Source                      DF     Type III SS     Mean Square    F Value    Pr > F
race                         3     1914.158046      638.052682       7.83    <.0001

                                              Standard
Parameter                     Estimate           Error    t Value    Pr > |t|
level 1 versus level 4     -7.59683908      1.98886958      -3.82      0.0002
level 2 versus level 4      3.94482759      2.82250377       1.40      0.1638
level 3 versus level 4     -5.85517241      2.15275967      -2.72      0.0071

data simple;
  set "c:\sasreg\hsb2";
  if race = 1 then x1 = 3/4; else x1 = -1/4;
  if race = 2 then x2 = 3/4; else x2 = -1/4;
  if race = 3 then x3 = 3/4; else x3 = -1/4;
run;

proc reg data = simple;
  model write = x1 x2 x3;
run;
quit;

The REG Procedure
Model: MODEL1
Dependent Variable: write writing score

                             Analysis of Variance

                                    Sum of           Mean
Source                   DF        Squares         Square    F Value    Pr > F
Model                     3     1914.15805      638.05268       7.83    <.0001
Error                   196          15965       81.45264
Corrected Total         199          17879

Root MSE              9.02511    R-Square     0.1071
Dependent Mean       52.77500    Adj R-Sq     0.0934
Coeff Var            17.10111

                                 Parameter Estimates

                                      Parameter       Standard
Variable     Label            DF       Estimate          Error    t Value    Pr > |t|
Intercept    Intercept         1       51.67838        0.98212      52.62      <.0001
x1                             1       -7.59684        1.98887      -3.82      0.0002
x2                             1        3.94483        2.82250       1.40      0.1638
x3                             1       -5.85517        2.15276      -2.72      0.0071

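The simple-coding estimates above can be cross-checked by hand against the cell means that PROC MEANS reports. What follows is a minimal Python sketch, not part of the original SAS workflow: `cell_means` and `simple_contrasts` are hypothetical names introduced here, and the four means are copied from the PROC MEANS output. Under this coding, each slope should equal a level mean minus the mean of the reference level (race = 4), and the intercept should equal the unweighted average of the four cell means.

```python
# Cell means of write by race, copied from the PROC MEANS output above.
cell_means = {1: 46.4583333, 2: 58.0000000, 3: 48.2000000, 4: 54.0551724}

def simple_contrasts(means, reference):
    """Return (intercept, {level: mean(level) - mean(reference)}).

    Hypothetical helper: the intercept is the unweighted mean of the
    cell means, which is what simple coding is expected to estimate.
    """
    intercept = sum(means.values()) / len(means)
    diffs = {lvl: m - means[reference]
             for lvl, m in means.items() if lvl != reference}
    return intercept, diffs

intercept, diffs = simple_contrasts(cell_means, reference=4)
print("intercept", round(intercept, 5))
for lvl, d in sorted(diffs.items()):
    print("level", lvl, "versus level 4:", round(d, 5))
```

The printed values can be compared with the Intercept, x1, x2 and x3 rows of the PROC REG output and with the three ESTIMATE rows of the PROC GLM output.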
proc glm data = "c:\sasreg\hsb2";
  class race;
  model write = race;
  estimate 'level 1 versus level 2' race 1 -1 0 0;
  estimate 'level 2 versus level 3' race 0 1 -1 0;
  estimate 'level 3 versus level 4' race 0 0 1 -1;
run;
quit;

                                              Standard
Parameter                     Estimate           Error    t Value    Pr > |t|
level 1 versus level 2     -11.5416667      3.28612920      -3.51      0.0006
level 2 versus level 3       9.8000000      3.38783369       2.89      0.0043
level 3 versus level 4      -5.8551724      2.15275967      -2.72      0.0071

data forward;
  set "c:\sasreg\hsb2";

  if race = 1 then x1 = 3/4; else x1 = -1/4;

  if race = 1 or race = 2 then x2 = 1/2;
  if race = 3 or race = 4 then x2 = -1/2;

  if race = 4 then x3 = -3/4; else x3 = 1/4;
run;

proc reg data = forward;
  model write = x1 x2 x3;
run;
quit;

                                 Parameter Estimates

                                      Parameter       Standard
Variable     Label            DF       Estimate          Error    t Value    Pr > |t|
Intercept    Intercept         1       51.67838        0.98212      52.62      <.0001
x1                             1      -11.54167        3.28613      -3.51      0.0006
x2                             1        9.80000        3.38783       2.89      0.0043
x3                             1       -5.85517        2.15276      -2.72      0.0071

proc glm data = "c:\sasreg\hsb2";
  class race;
  model write = race;
  estimate 'level 1 versus level 2' race -1 1 0 0;
  estimate 'level 2 versus level 3' race 0 -1 1 0;
  estimate 'level 3 versus level 4' race 0 0 -1 1;
run;
quit;

                                              Standard
Parameter                     Estimate           Error    t Value    Pr > |t|
level 1 versus level 2      11.5416667      3.28612920       3.51      0.0006
level 2 versus level 3      -9.8000000      3.38783369      -2.89      0.0043
level 3 versus level 4       5.8551724      2.15275967       2.72      0.0071

data backward;
  set "c:\sasreg\hsb2";

  if race = 1 then x1 = -3/4; else x1 = 1/4;

  if race = 1 or race = 2 then x2 = -1/2;
  if race = 3 or race = 4 then x2 = 1/2;

  if race = 4 then x3 = 3/4; else x3 = -1/4;
run;

proc reg data = backward;
  model write = x1 x2 x3;
run;
quit;

                                 Parameter Estimates

                                      Parameter       Standard
Variable     Label            DF       Estimate          Error    t Value    Pr > |t|
Intercept    Intercept         1       51.67838        0.98212      52.62      <.0001
x1                             1       11.54167        3.28613       3.51      0.0006
x2                             1       -9.80000        3.38783      -2.89      0.0043
x3                             1        5.85517        2.15276       2.72      0.0071

proc glm data = "c:\sasreg\hsb2";
  class race;
  model write = race;
  estimate 'level 1 versus levels 2, 3 & 4' race 1 -.33333 -.33333 -.33333;
  estimate 'level 2 versus levels 3 & 4' race 0 1 -.5 -.5;
  estimate 'level 3 versus level 4' race 0 0 1 -1;
run;
quit;

                                                      Standard
Parameter                             Estimate           Error    t Value    Pr > |t|
level 1 versus levels 2, 3 & 4     -6.96006384      2.17520603      -3.20      0.0016
level 2 versus levels 3 & 4         6.87241379      2.92632513       2.35      0.0198
level 3 versus level 4             -5.85517241      2.15275967      -2.72      0.0071

data helmert;
  set "c:\sasreg\hsb2";

  if race = 1 then x1 = .75; else x1 = -.25;

  if race = 1 then x2 = 0;
  if race = 2 then x2 = 2/3;
  if race = 3 or race = 4 then x2 = -1/3;

  if race = 1 or race = 2 then x3 = 0;
  if race = 3 then x3 = 1/2;
  if race = 4 then x3 = -1/2;
run;

proc reg data = helmert;
  model write = x1 x2 x3;
run;
quit;

                                 Parameter Estimates

                                      Parameter       Standard
Variable     Label            DF       Estimate          Error    t Value    Pr > |t|
Intercept    Intercept         1       51.67836        0.98212      52.62      <.0001
x1                             1       -6.96003        2.17521      -3.20      0.0016
x2                             1        6.87241        2.92633       2.35      0.0198
x3                             1       -5.85517        2.15276      -2.72      0.0071

proc glm data = "c:\sasreg\hsb2";
  class race;
  model write = race;
  estimate 'level 2 versus level 1' race -1 1 0 0;
  estimate 'level 3 versus levels 1 & 2' race -.5 -.5 1 0;
  estimate 'level 4 versus levels 1, 2 & 3' race -.33333 -.33333 -.33333 1;
run;
quit;

/* Same contrasts, with the thirds written exactly through the DIVISOR= option */
proc glm data = "c:\sasreg\hsb2";
  class race;
  model write = race;
  estimate 'level 2 versus level 1' race -1 1 0 0;
  estimate 'level 3 versus levels 1 & 2' race -.5 -.5 1 0;
  estimate 'level 4 versus levels 1, 2 & 3' race -1 -1 -1 3 / divisor=3;
run;
quit;

                                                      Standard
Parameter                             Estimate           Error    t Value    Pr > |t|
level 2 versus level 1              11.5416667      3.28612920       3.51      0.0006
level 3 versus levels 1 & 2         -4.0291667      2.60236299      -1.55      0.1232
level 4 versus levels 1, 2 & 3       3.1690296      1.48797250       2.13      0.0344

data diff;
  set "c:\sasreg\hsb2";

  if race = 1 then x1 = -1/2;
  if race = 2 then x1 = 1/2;
  if race = 3 or race = 4 then x1 = 0;

  if race = 1 or race = 2 then x2 = -1/3;
  if race = 3 then x2 = 2/3;
  if race = 4 then x2 = 0;

  if race = 4 then x3 = 3/4; else x3 = -1/4;
run;

proc reg data = diff;
  model write = x1 x2 x3;
run;
quit;

                                 Parameter Estimates

                                      Parameter       Standard
Variable     Label            DF       Estimate          Error    t Value    Pr > |t|
Intercept    Intercept         1       51.67839        0.98212      52.62      <.0001
x1                             1       11.54167        3.28613       3.51      0.0006
x2                             1       -4.02917        2.60236      -1.55      0.1232
x3                             1        3.16905        1.48799       2.13      0.0344

proc glm data = "c:\sasreg\hsb2";
  class race;
  model write = race;
  estimate 'level 1 versus levels 2, 3 & 4' race .75 -.25 -.25 -.25;
  estimate 'level 2 versus levels 1, 3 & 4' race -.25 .75 -.25 -.25;
  estimate 'level 3 versus levels 1, 2 & 4' race -.25 -.25 .75 -.25;
run;
quit;

                                                      Standard
Parameter                             Estimate           Error    t Value    Pr > |t|
level 1 versus levels 2, 3 & 4     -5.22004310      1.63140849      -3.20      0.0016
level 2 versus levels 1, 3 & 4      6.32162356      2.16031394       2.93      0.0038
level 3 versus levels 1, 2 & 4     -3.47837644      1.73230472      -2.01      0.0460

data deviation;
  set "c:\sasreg\hsb2";

  if race = 1 then x1 = 1;
  if race = 2 or race = 3 then x1 = 0;
  if race = 4 then x1 = -1;

  if race = 2 then x2 = 1;
  if race = 1 or race = 3 then x2 = 0;
  if race = 4 then x2 = -1;

  if race = 3 then x3 = 1;
  if race = 1 or race = 2 then x3 = 0;
  if race = 4 then x3 = -1;
run;

proc reg data = deviation;
  model write = x1 x2 x3;
run;
quit;

                                 Parameter Estimates

                                      Parameter       Standard
Variable     Label            DF       Estimate          Error    t Value    Pr > |t|
Intercept    Intercept         1       51.67838        0.98212      52.62      <.0001
x1                             1       -5.22004        1.63141      -3.20      0.0016
x2                             1        6.32162        2.16031       2.93      0.0038
x3                             1       -3.47838        1.73230      -2.01      0.0460

proc glm data = "c:\sasreg\hsb2";
  class race;
  model write = race;
  estimate 'linear' race -.671 -.224 .224 .671;
  estimate 'quadratic' race .5 -.5 -.5 .5;
  estimate 'cubic' race -.224 .671 -.671 .224;
run;
quit;

                                            Standard
Parameter                   Estimate           Error    t Value    Pr > |t|
linear                    2.90227902      1.53520851       1.89      0.0602
quadratic                -2.84324713      1.96424409      -1.45      0.1494
cubic                     8.27749195      2.31648010       3.57      0.0004

data poly;
  set "c:\sasreg\hsb2";

  if race = 1 then x1 = -.671;
  if race = 2 then x1 = -.224;
  if race = 3 then x1 = .224;
  if race = 4 then x1 = .671;

  if race = 1 then x2 = .5;
  if race = 2 then x2 = -.5;
  if race = 3 then x2 = -.5;
  if race = 4 then x2 = .5;

  if race = 1 then x3 = -.224;
  if race = 2 then x3 = .671;
  if race = 3 then x3 = -.671;
  if race = 4 then x3 = .224;
run;

proc reg data = poly;
  model write = x1 x2 x3;
run;
quit;

                                 Parameter Estimates

                                      Parameter       Standard
Variable     Label            DF       Estimate          Error    t Value    Pr > |t|
Intercept    Intercept         1       51.67838        0.98212      52.62      <.0001
x1                             1        2.89986        1.53393       1.89      0.0602
x2                             1       -2.84325        1.96424      -1.45      0.1494
x3                             1        8.27059        2.31455       3.57      0.0004

proc glm data = "c:\sasreg\hsb2";
  class race;
  model write = race;
  estimate 'level 1 versus level 3' race 1 0 -1 0;
  estimate 'level 2 versus levels 1 & 4' race -.5 1 0 -.5;
  estimate 'levels 1 & 2 versus levels 3 & 4' race .5 .5 -.5 -.5;
run;
quit;

                                                        Standard
Parameter                               Estimate           Error    t Value    Pr > |t|
level 1 versus level 3               -1.74166667      2.73248820      -0.64      0.5246
level 2 versus levels 1 & 4           7.74324713      2.89718584       2.67      0.0082
levels 1 & 2 versus levels 3 & 4      1.10158046      1.96424409       0.56      0.5756

proc iml;
  c = {  1 -.5  .5,
         0   1  .5,
        -1   0 -.5,
         0 -.5 -.5 };
  x = c*inv( c`*c );
  print x;
run;
quit;

            X

     -0.5        -1       1.5
      0.5         1      -0.5
     -1.5        -1       1.5
      1.5         1      -2.5

data special;
  set "c:\sasreg\hsb2";

  if race = 1 then x1 = -0.5;
  if race = 2 then x1 =   .5;
  if race = 3 then x1 = -1.5;
  if race = 4 then x1 =  1.5;

  if race = 1 or race = 3 then x2 = -1;
  if race = 2 or race = 4 then x2 =  1;

  if race = 1 or race = 3 then x3 = 1.5;
  if race = 2 then x3 = -.5;
  if race = 4 then x3 = -2.5;
run;

proc reg data = special;
  model write = x1 x2 x3;
run;
quit;

                                 Parameter Estimates

                                      Parameter       Standard
Variable     Label            DF       Estimate          Error    t Value    Pr > |t|
Intercept    Intercept         1       51.67838        0.98212      52.62      <.0001
x1                             1       -1.74167        2.73249      -0.64      0.5246
x2                             1        7.74325        2.89719       2.67      0.0082
x3                             1        1.10158        1.96424       0.56      0.5756
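The PROC IML step above turns a matrix of user-defined contrasts into regression coding values through x = c (c'c)^-1. As a hedged illustration of that one formula outside SAS, the sketch below repeats the computation in plain Python with exact fractions. The helpers `transpose`, `matmul` and `inverse` are hypothetical names written for this note (a basic Gauss-Jordan inversion, with no handling of singular matrices); only the contrast matrix `c` comes from the source.

```python
from fractions import Fraction

def transpose(m):
    return [list(row) for row in zip(*m)]

def matmul(a, b):
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]

def inverse(m):
    """Invert a square matrix by Gauss-Jordan elimination on exact fractions.

    Illustration only: raises StopIteration if the matrix is singular.
    """
    n = len(m)
    aug = [list(row) + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(m)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

# Contrast matrix from the PROC IML step: one row per level of race,
# one column per comparison.
c = [[Fraction(v) for v in row] for row in [
    ("1", "-1/2", "1/2"),
    ("0", "1", "1/2"),
    ("-1", "0", "-1/2"),
    ("0", "-1/2", "-1/2"),
]]

x = matmul(c, inverse(matmul(transpose(c), c)))
for level, row in enumerate(x, start=1):
    print(level, [float(v) for v in row])
```

Each printed row holds the x1, x2 and x3 values for one level of race, so the result can be compared with the matrix printed by PROC IML and with the assignments in the `special` data step.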

Name of contrast	Comparison made
Simple Coding	Compares each level of a variable to the reference level
Forward Difference Coding	Adjacent levels of a variable (each level minus the next level)
Backward Difference Coding	Adjacent levels of a variable (each level minus the prior level)
Helmert Coding	Compares levels of a variable with the mean of the subsequent levels of the variable
Reverse Helmert Coding	Compares levels of a variable with the mean of the previous levels of the variable
Deviation Coding	Compares deviations from the grand mean
Orthogonal Polynomial Coding	Orthogonal polynomial contrasts
User-Defined Coding	User-defined contrast

Level of race	New variable 1 (c1)	New variable 2 (c2)	New variable 3 (c3)
1 (Hispanic)	1	0	0
2 (Asian)	0	1	0
3 (African American)	0	0	1
4 (white)	-1	-1	-1

Level of race	New variable 1 (x1)	New variable 2 (x2)	New variable 3 (x3)
1 (Hispanic)	(k-1) / k	-1 / k	-1 / k
2 (Asian)	-1 / k	(k-1) / k	-1 / k
3 (African American)	-1 / k	-1 / k	(k-1) / k
4 (white)	-1 / k	-1 / k	-1 / k
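The table above gives the simple regression coding in terms of k, the number of levels. A small Python sketch of that rule follows, assuming (as in the table) that the last level is the reference; `simple_coding` is a hypothetical helper name, not something from the source.

```python
from fractions import Fraction

def simple_coding(k):
    """Simple regression coding with the last of k levels as reference.

    Rows are levels 1..k, columns are x1..x(k-1). A level gets (k-1)/k in
    its own column and -1/k elsewhere; the reference level gets -1/k
    in every column.
    """
    return [[Fraction(k - 1, k) if level == col else Fraction(-1, k)
             for col in range(1, k)]
            for level in range(1, k + 1)]

for level, row in enumerate(simple_coding(4), start=1):
    print(level, [str(v) for v in row])
```

With k = 4 this gives the 3/4 and -1/4 values assigned in the `simple` data step.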

Level of race	New variable 1 (c1)	New variable 2 (c2)	New variable 3 (c3)
	Level 1 v. Level 2	Level 2 v. Level 3	Level 3 v. Level 4
1 (Hispanic)	1	0	0
2 (Asian)	-1	1	0
3 (African American)	0	-1	1
4 (white)	0	0	-1

Level of race	New variable 1 (x1)	New variable 2 (x2)	New variable 3 (x3)
	Level 1 v. Level 2	Level 2 v. Level 3	Level 3 v. Level 4
1 (Hispanic)	(k-1)/k	(k-2)/k	(k-3)/k
2 (Asian)	-1/k	(k-2)/k	(k-3)/k
3 (African American)	-1/k	-2/k	(k-3)/k
4 (white)	-1/k	-2/k	-3/k
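The forward difference regression coding in the table above follows a simple pattern in k: in column j, levels up to and including j receive (k-j)/k and the later levels receive -j/k. The Python sketch below generates the table from that pattern; `forward_difference_coding` is a hypothetical name used only for this illustration.

```python
from fractions import Fraction

def forward_difference_coding(k):
    """Forward difference regression coding for k levels.

    Rows are levels 1..k, columns are x1..x(k-1). Column j compares
    level j with level j+1.
    """
    return [[Fraction(k - j, k) if level <= j else Fraction(-j, k)
             for j in range(1, k)]
            for level in range(1, k + 1)]

for level, row in enumerate(forward_difference_coding(4), start=1):
    print(level, [str(v) for v in row])
```

For k = 4 the rows match the values assigned in the `forward` data step, and negating every entry gives the values used in the `backward` data step.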

Regression with SAS
Chapter 5: Additional coding systems for categorical variables in regression analysis

Level of race	Linear (x1)	Quadratic (x2)	Cubic (x3)
1 (Hispanic)	-.671	.5	-.224
2 (Asian)	-.224	-.5	.671
3 (African American)	.224	-.5	-.671
4 (white)	.671	.5	.224
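The coefficients in the table above are consistent, to three decimals, with the usual integer orthogonal polynomial contrasts for four equally spaced levels after each contrast is scaled to unit length. That reading is an assumption rather than something the source states, and the sketch below simply checks it numerically; `unit_length`, `raw` and `coded` are hypothetical names.

```python
import math

def unit_length(contrast):
    """Scale a contrast vector so that its squared entries sum to 1."""
    norm = math.sqrt(sum(c * c for c in contrast))
    return [c / norm for c in contrast]

# Assumed integer contrasts for four equally spaced levels.
raw = {
    "linear": [-3, -1, 1, 3],
    "quadratic": [1, -1, -1, 1],
    "cubic": [-1, 3, -3, 1],
}

coded = {name: unit_length(v) for name, v in raw.items()}
for name, v in coded.items():
    print(name, [round(c, 3) for c in v])

# The three contrasts are mutually orthogonal: every pairwise dot product is 0.
names = list(raw)
for i, a in enumerate(names):
    for b in names[i + 1:]:
        print(a, b, sum(p * q for p, q in zip(raw[a], raw[b])))
```

The rounded values can be compared column by column with the Linear, Quadratic and Cubic columns of the table.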
