Encodings of CLASS variables in SAS regression procedures: A cheat sheet
This post was kindly contributed by The DO Loop; go there to comment and to read the full post.
SAS regression procedures support several parameterizations of classification variables. When a categorical variable is used as an explanatory variable in a regression model, the procedure generates dummy variables that are used to construct a design matrix for the model. The process of forming columns in a design matrix is called a parameterization or encoding. In SAS, most regression procedures use either the GLM encoding, the EFFECT encoding, or the REFERENCE encoding. This article summarizes the default and optional encodings for each regression procedure in SAS/STAT. In many SAS procedures, you can use the PARAM= option to change the default encoding.
The documentation section “Parameterization of Model Effects” provides a complete list of the encodings in SAS and shows how the design matrices are constructed from the levels. (The levels are the values of a classification variable.) Pasta (2005) gives examples and further discussion.
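To make the three common encodings concrete, the following Python sketch builds the dummy-variable columns for one classification variable with three levels. It is an illustration only, not how SAS constructs design matrices internally. The helper name `encode` and the level names are hypothetical, and the reference level is assumed to be the last ordered level.

```python
def encode(values, param="GLM"):
    """Return (column_names, rows) of dummy columns for one CLASS variable.

    Illustrative sketch only: `encode` is a hypothetical helper, not a SAS API.
    The reference level is assumed to be the last level in sorted order.
    """
    levels = sorted(set(values))
    ref = levels[-1]
    if param == "GLM":
        cols = levels                # one indicator column per level
    elif param in ("REF", "EFFECT"):
        cols = levels[:-1]           # the reference level gets no column
    else:
        raise ValueError(f"unsupported encoding: {param}")

    rows = []
    for v in values:
        if param == "EFFECT" and v == ref:
            rows.append([-1] * len(cols))   # reference level: every column is -1
        else:
            # indicator columns; for REF the reference level becomes all zeros
            rows.append([1 if v == c else 0 for c in cols])
    return cols, rows


data = ["A", "B", "C", "A"]
for param in ("GLM", "REF", "EFFECT"):
    cols, rows = encode(data, param)
    print(param, cols, rows)
```

In the GLM output, every row sums to 1, which reproduces the intercept column. That linear dependence is why the GLM design matrix is singular when the model includes an intercept.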
Default and optional encodings for SAS regression procedures
The following SAS regression procedures support the CLASS statement or a similar syntax. The columns GLM, REFERENCE, and EFFECT indicate the three most common encodings. The word “Default” indicates the default encoding, and the word “Yes” indicates an encoding that is supported but is not the default. For procedures that support the PARAM= option, the PARAM= column lists the supported encodings. The word “All” means that the procedure supports the complete list of SAS encodings. Most procedures default to using the GLM encoding; the exceptions are noted in the comments after the table.
Procedure                 GLM      REFERENCE  EFFECT   PARAM=
ADAPTIVEREG               Default  -          -        -
ANOVA                     Default  -          -        -
BGLIMM                    Default  Yes        Yes      GLM, EFFECT, REF
CATMOD                    -        -          Default  -
FMM                       Default  -          -        -
GAM                       Default  -          -        -
GAMPL                     Default  Yes        -        GLM, REF
GEE                       Default  -          -        -
GENMOD                    Default  Yes        Yes      All
GLIMMIX                   Default  -          -        -
GLM                       Default  -          -        -
GLMSELECT                 Default  Yes        Yes      All
HP regression procedures  Default  Yes        -        GLM, REF
HPMIXED                   Default  -          -        -
ICPHREG                   Default  Yes        Yes      All
LIFEREG                   Default  -          -        -
LOGISTIC                  Yes      Yes        Default  All
MIXED                     Default  -          -        -
ORTHOREG                  Default  Yes        Yes      All
PLS                       Default  -          -        -
PROBIT                    Default  -          -        -
PHREG                     Yes      Default    Yes      All
QUANTLIFE                 Default  -          -        -
QUANTREG                  Default  -          -        -
QUANTSELECT               Default  Yes        Yes      All
RMTSREG                   Default  Yes        Yes      All
ROBUSTREG                 Default  -          -        -
SURVEYLOGISTIC            Yes      Yes        Default  All
SURVEYPHREG               Default  Yes        Yes      All
SURVEYREG                 Default  -          -        -
TRANSREG                  Yes      Default    Yes      -
A few comments:
 The REFERENCE encoding is the default for PHREG and TRANSREG.
 The EFFECT encoding is the default for CATMOD, LOGISTIC, and SURVEYLOGISTIC.
 The HP regression procedures all use the GLM encoding by default and support only PARAM=GLM or PARAM=REF. The HP regression procedures include HPFMM, HPGENSELECT, HPLMIXED, HPLOGISTIC, HPNLMOD, HPPLS, HPQUANTSELECT, and HPREG. In spite of its name, GAMPL is also an HP procedure. In spite of its name, HPMIXED is NOT an HP procedure! PROC LOGISTIC and PROC HPLOGISTIC use different default encodings.
 CATMOD does not have a CLASS statement because all variables are assumed to be categorical.
 PROC TRANSREG does not support a CLASS statement. Instead, it uses a CLASS() transformation list. It uses different syntax to support parameter encodings.
How to interpret main effects for the SAS encodings
The GLM parameterization is a singular parameterization. The other encodings are nonsingular. The “Other Parameterizations” section of the documentation gives a simple one-sentence summary of how to interpret the parameter estimates for the main effects in each encoding:
 The GLM encoding estimates the difference in the effect of each level compared to the reference level. You can use the REF= option to specify the reference level. By default, the reference level is the last ordered level. The design matrix for the GLM encoding is singular.
 The REFERENCE encoding estimates the difference in the effect of each nonreference level compared to the effect of the reference level. You can use the REF= option to specify the reference level. By default, the reference level is the last ordered level. Notice that the REFERENCE encoding gives the same interpretation as the GLM encoding. The difference is that the design matrix for the REFERENCE encoding excludes the column for the reference level, so the design matrix for the REFERENCE encoding is (usually) nonsingular.
 The EFFECT encoding estimates the difference in the effect of each nonreference level compared to the average effect over all levels.
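The interpretations above can be checked with a small worked example. The Python sketch below assumes a one-way model whose fitted values equal the level means. The level names and numbers are made up for illustration and are not output from any SAS procedure. The last ordered level is taken as the reference level, and the sketch computes the intercept and main-effect parameters implied by the REFERENCE and EFFECT encodings.

```python
# Hypothetical mean response for each level of a CLASS variable (made-up numbers).
means = {"A": 10.0, "B": 14.0, "C": 18.0}
levels = sorted(means)
ref = levels[-1]                     # assume the last ordered level is the reference

# REFERENCE encoding: the intercept is the reference-level mean; each parameter
# is the difference between a nonreference level and the reference level.
ref_intercept = means[ref]
ref_params = {lev: means[lev] - means[ref] for lev in levels if lev != ref}

# EFFECT encoding: the intercept is the average of the level means; each parameter
# is the difference between a nonreference level and that average.
eff_intercept = sum(means.values()) / len(levels)
eff_params = {lev: means[lev] - eff_intercept for lev in levels if lev != ref}
# The effect of the reference level is minus the sum of the other effects.
eff_ref_effect = -sum(eff_params.values())

print("REFERENCE:", ref_intercept, ref_params)
print("EFFECT:   ", eff_intercept, eff_params, "implied", ref, "=", eff_ref_effect)
```

With these numbers, the REFERENCE parameter for level A is 10 - 18 = -8 (A compared with the reference level C), whereas the EFFECT parameter for level A is 10 - 14 = -4 (A compared with the average over all levels).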
This article lists the encodings that each SAS regression procedure supports. I hope you will find it to be a useful reference. If I’ve missed your favorite regression procedure, let me know in the comments.
The post Encodings of CLASS variables in SAS regression procedures: A cheat sheet appeared first on The DO Loop.