Tag: class statement

Understanding the _TYPE_ variable in output data sets

When you use PROC MEANS or PROC SUMMARY to create a summary data set and include a CLASS statement, SAS includes two variables, _FREQ_ and _TYPE_, in the output data set. This post shows you two ways to interpret and use _TYPE_ using the data set Shoes in the SASHELP […]

Understanding the _TYPE_ variable in output data sets was published on SAS Users.
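For readers who move such a summary data set into R, the binary rule behind _TYPE_ can be sketched as two small helper functions. This is our illustration, not code from the SAS Users post: the names `type_value()` and `type_vars()` are hypothetical, and the example assumes a statement like `CLASS Region Product;` applied to SASHELP.SHOES. Each CLASS variable contributes one binary digit, with the rightmost variable in the CLASS statement as the lowest-order bit, so _TYPE_ = 0 is the overall summary and the largest value is the full crossing.

```r
# Hypothetical helpers (not part of SAS or the original post) for reading _TYPE_
# in R. Assumes class_vars lists the CLASS variables in CLASS-statement order.

# _TYPE_ for a combination of CLASS variables: each variable is one binary digit,
# and the rightmost (last-listed) variable is the lowest-order bit.
type_value <- function(class_vars, used) {
  k <- length(class_vars)
  weights <- 2^((k - 1):0)              # e.g. k = 2 gives c(2, 1)
  sum(weights[class_vars %in% used])    # 0 when no CLASS variable is used
}

# Inverse: which CLASS variables define the rows carrying a given _TYPE_ value.
type_vars <- function(class_vars, type) {
  k <- length(class_vars)
  bits <- as.integer(intToBits(as.integer(type)))[k:1]  # highest bit first
  class_vars[bits == 1]
}

# Assumed CLASS Region Product; so _TYPE_ 0 = overall, 1 = Product only,
# 2 = Region only, 3 = Region by Product.
cls <- c("Region", "Product")
sapply(0:3, function(tp) paste(type_vars(cls, tp), collapse = " x "))
```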

Using formatted CLASS variables

If you use formatted variables in a CLASS statement in procedures such as MEANS or UNIVARIATE, SAS will use the formatted values of those variables and not the internal values. For example, suppose you have a data set (Health) with variables Subj, Age, Weight, and Height. You want to see […]

Using formatted CLASS variables was published on SAS Users.
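R has no SAS formats, but the idea in this excerpt, that grouping follows the formatted value rather than the stored value, has a close analogue: group on a labeled factor built with cut(). A minimal sketch under that assumption; the six rows below are made up (the excerpt does not show the Health data), and the age cut point of 50 is arbitrary.

```r
# Made-up values for illustration only; not the Health data from the excerpt.
health <- data.frame(Subj   = 1:6,
                     Age    = c(23, 37, 45, 52, 61, 70),
                     Weight = c(150, 182, 170, 195, 160, 175))

# Stand-in for a SAS format: ages below 50 are labeled "<50",
# ages 50 and above are labeled "50+" (right = FALSE gives [a, b) intervals).
health$AgeGrp <- cut(health$Age, breaks = c(-Inf, 50, Inf),
                     labels = c("<50", "50+"), right = FALSE)

# Means are computed per labeled group rather than per distinct Age value,
# much as a formatted CLASS variable groups by its formatted value.
mean_wt <- tapply(health$Weight, health$AgeGrp, mean)
mean_wt
```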

Example 2014.11: Contrasts the basic way for R

As we discuss in section 6.1.4 of the second edition, R and SAS handle categorical variables and their parameterization in models quite differently. SAS treats them on a procedure-by-procedure basis, which leads to some odd differences in capabilities and default parameterizations. For example, in the logistic procedure the default is effect cell coding, while in the genmod procedure (which also fits logistic regression) the default is reference cell coding. Meanwhile, many procedures accommodate only reference cell coding.
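To make the two codings concrete, a small R sketch prints base R's contrast matrices for a four-level factor: rows are factor levels and columns are the coded variables that enter the model. Matching them to SAS options is our reading rather than a claim from the book: PARAM=EFFECT with its default REF=LAST corresponds to contr.sum(), and reference coding with the last level as reference corresponds to contr.SAS().

```r
# Contrast matrices from base R's stats package for a factor with 4 levels.
treat  <- contr.treatment(4)  # reference cell coding, first level is the reference
sas    <- contr.SAS(4)        # reference cell coding, last level is the reference
effect <- contr.sum(4)        # effect coding: each column sums to zero,
                              # and the last level is coded -1 in every column
treat
sas
effect
```

Under effect coding each coefficient is a level's deviation from the average over levels; under reference coding it is the difference from the reference level.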

In R, in contrast, categorical variables can be designated as “factors”, and the parameterization is stored as an attribute of the factor.
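A minimal R sketch of what that attribute looks like, using a toy factor; `contrasts<-` and C() are both part of base R's stats package, while the object names are ours.

```r
# A toy factor with four levels; no parameterization stored yet.
x1 <- factor(c(1, 2, 3, 4, 2, 3))
attr(x1, "contrasts")                # NULL, so the global default would apply

# Store effect (sum-to-zero) coding on the factor itself.
contrasts(x1) <- contr.sum(4)
stored <- attr(x1, "contrasts")
stored                               # the 4 x 3 contr.sum() matrix, rows named by level

# C() sets the same attribute in a single call.
x1b <- C(factor(c(1, 2, 3, 4, 2, 3)), contr.sum(4))
identical(attr(x1b, "contrasts"), stored)
```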

In section 6.1.4, we demonstrate how the parameterization of a factor can easily be changed on the fly in R, in lm(), glm(), and aov(), using the contrasts= option in those functions. Here we show how to set the attribute more generally, for use in functions that don’t accept the option. This post was inspired by a question from Julia Kuder, of Brigham and Women’s Hospital.
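As an illustration of the two routes, a sketch with made-up data compares the per-call contrasts= argument of lm() with a contrast stored on the factor. The data and the names `d`, `m_arg`, and `m_attr` are hypothetical; the point is only that both routes produce the same fit, and the stored version also reaches functions that build their design matrix from the factor but offer no contrasts= argument.

```r
# Made-up toy data: response y and a balanced four-level factor g.
d <- data.frame(y = c(1, 3, 2, 4, 6, 5, 9, 7),
                g = factor(rep(1:4, each = 2)))

# Route 1: per call, via the contrasts= argument of lm().
m_arg <- lm(y ~ g, data = d, contrasts = list(g = "contr.sum"))

# Route 2: store the contrast on the factor, then fit with no extra argument.
d2 <- d
contrasts(d2$g) <- contr.sum(4)
m_attr <- lm(y ~ g, data = d2)

# With balanced data the intercept is the mean of the four cell means and
# g1-g3 are deviations of levels 1-3 from it.
coef(m_arg)
all.equal(coef(m_arg), coef(m_attr))
```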

SAS
We begin by simulating censored survival data as in Example 7.30. We’ll also export the data to use in R.


data simcox;
  beta1 = 2;
  lambdaT = 0.002; * baseline hazard;
  lambdaC = 0.004; * censoring hazard;
  do i = 1 to 10000;
    * x1 takes the values 1, 2, 3, 4, each with probability 0.25;
    x1 = rantbl(0, .25, .25, .25);
    * scale multiplier: exp(-beta1) when x1 = 4, 1 otherwise;
    linpred = exp(-beta1 * (x1 eq 4));
    t = rand("WEIBULL", 1, lambdaT * linpred); * time of event;
    c = rand("WEIBULL", 1, lambdaC);            * time of censoring;
    time = min(t, c);                           * which came first?;
    censored = (c lt t);                        * 1 if censoring came first;
    output;
  end;
run;

proc export data=simcox replace
outfile="c:/temp/simcox.csv"
dbms=csv;
run;

Now we’ll fit the model in SAS, using effect coding.


proc phreg data=simcox;
class x1 (param=effect);
model time*censored(0)= x1 ;
run;

We reproduce the rather unexciting results here for comparison with R.


                      Parameter   Standard
Parameter       DF    Estimate    Error

x1        1     1     -0.02698    0.03471
x1        2     1     -0.01211    0.03437
x1        3     1     -0.05940    0.03458

R
In R we read the data in, then use the C() function to assign the contr.sum contrast to a version of the x1 variable that we save as a factor. Once that is done, we can fit the proportional hazards regression with the desired contrast.


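library(survival)  # provides coxph() and Surv()
simcox = read.csv("c:/temp/simcox.csv")
simcoxsc2 = transform(simcox, x1.eff = C(as.factor(x1), contr.sum(4)))
summary(effmodel <- coxph(Surv(time, censored) ~ x1.eff, data = simcoxsc2))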

We excerpt the relevant output to demonstrate equivalence with SAS.


         coef      exp(coef)  se(coef)
x1.eff1  -0.02698  0.97339    0.03471
x1.eff2  -0.01211  0.98797    0.03437
x1.eff3  -0.05940  0.94233    0.03458
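Because effect coding constrains the four x1 effects to sum to zero, the level-4 effect that neither program prints can be recovered from the three reported estimates, and so can the log hazard ratios with level 4 as the reference (reference coding with the last level as reference). A short R sketch of that arithmetic using the point estimates above; it gives point estimates only, since standard errors would also need the estimated covariances.

```r
# Effect-coded estimates for x1 = 1, 2, 3, copied from the SAS and R output above.
b_eff <- c(-0.02698, -0.01211, -0.05940)

# The four effects sum to zero, so the level-4 effect is minus the sum of the others.
b4 <- -sum(b_eff)

# Log hazard ratios of levels 1-3 relative to level 4 (last level as reference).
b_ref <- b_eff - b4

b4
b_ref
exp(b_ref)   # corresponding hazard ratios versus level 4
```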

An unrelated note about aggregators: We love aggregators! Aggregators collect blogs that have similar coverage for the convenience of readers, and for blog authors they offer a way to reach new audiences. SAS and R is aggregated by R-bloggers, PROC-X, and statsblogs with our permission, and by at least 2 other aggregating services which have never contacted us. If you read this on an aggregator that does not credit the blogs it incorporates, please come visit us at SAS and R. We answer comments there and offer direct subscriptions if you like our content. In addition, no one is allowed to profit by this work under our license; if you see advertisements on this page, the aggregator is violating the terms by which we publish our work.