This post was kindly contributed by SAS and R - go there to comment and to read the full post.
We begin the new academic year with a series of entries exploring new capabilities of SAS 9.3, and some functionality we haven’t previously written about.
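We’ll begin with multiple imputation. Until now, SAS could impute an arbitrary (non-monotone) missing data pattern only by assuming multivariate normality; imputation methods for other kinds of variables required a monotone missing data pattern.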
SAS
SAS 9.3 adds the FCS statement to proc mi. This implements a fully conditional specification imputation method (see, e.g., van Buuren, S. (2007), “Multiple Imputation of Discrete and Continuous Data by Fully Conditional Specification,” Statistical Methods in Medical Research, 16, 219–242). Briefly, all the missing values are first filled in using a simple method. Then the missing values of each variable in turn are re-imputed from a model fit to that variable's observed values, with the observed and current imputed values of the other variables as predictors. This cycle through the variables is repeated several times.
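To make the iteration concrete, here is a minimal Python sketch of the fully conditional specification idea. It assumes all variables are continuous, uses an ordinary least-squares regression as every conditional model, and draws each imputation as a fitted value plus normal noise. This is not the proc mi implementation: a proper FCS method also draws the model parameters from their posterior distribution and uses logistic or discriminant models for categorical variables. The function name `fcs_impute` is hypothetical.

```python
import numpy as np

def fcs_impute(data, n_iter=10, seed=None):
    """Hypothetical, simplified FCS sketch (not SAS's algorithm).

    `data` is a 2-D array with np.nan marking missing values.
    Returns one completed copy of the data.
    """
    rng = np.random.default_rng(seed)
    x = np.array(data, dtype=float)
    missing = np.isnan(x)
    n, p = x.shape

    # Step 1: fill every missing value with a simple guess (its column's observed mean).
    col_means = np.nanmean(x, axis=0)
    x[missing] = np.take(col_means, np.nonzero(missing)[1])

    # Step 2: cycle through the variables several times.
    for _ in range(n_iter):
        for j in range(p):
            miss_j = missing[:, j]
            if not miss_j.any():
                continue
            obs = ~miss_j
            # Predictors: intercept plus current (observed or imputed) values of the other variables.
            design = np.column_stack([np.ones(n), np.delete(x, j, axis=1)])
            # Fit the conditional model on rows where variable j is actually observed.
            beta, *_ = np.linalg.lstsq(design[obs], x[obs, j], rcond=None)
            resid = x[obs, j] - design[obs] @ beta
            df = max(int(obs.sum()) - design.shape[1], 1)
            sigma = np.sqrt(resid @ resid / df)
            # Re-impute variable j: predicted value plus random normal noise.
            x[miss_j, j] = design[miss_j] @ beta + rng.normal(0.0, sigma, int(miss_j.sum()))
    return x
```

Calling the function several times with different seeds yields several completed data sets, which is the role the `nimpute=` option plays in proc mi.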
We replicate the multiple imputation example from section 6.5 of the book. In that example, we used the mcmc statement for imputation, which at the time was the only method available in SAS for a non-monotone missingness pattern. We noted then that this was not “strictly appropriate” since the mcmc method assumes multivariate normality, and two of the variables with missing values were dichotomous.
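/* Read the HELP data, including missing values, directly from the web */
filename myhm url "http://www.math.smith.edu/sasr/datasets/helpmiss.csv" lrecl=704;
proc import replace datafile=myhm out=help dbms=dlm;
  delimiter=',';   /* comma-separated file */
  getnames=yes;    /* variable names are in the first row */
run;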
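/* Create 20 imputed data sets by fully conditional specification; */
/* the two dichotomous variables are imputed with logistic models   */
proc mi data=help nimpute=20 out=helpmi20fcs;
  class homeless female;
  var i1 homeless female sexrisk indtot mcs pcs;
  fcs
    logistic(female)
    logistic(homeless);
run;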
In the fcs statement, you list each method to be used (logistic, discrim, reg, or regpmm), followed in parentheses by the name of the variable it should impute. (You can also specify a subset of covariates for the method, using the usual SAS model-building syntax.) Variables not named in any method, here the continuous ones, are imputed using the default reg method.
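/* Fit the logistic regression separately to each imputed data set, */
/* saving the estimates and their covariance matrices                */
ods output parameterestimates=helpmipefcs
           covb=helpmicovbfcs;
proc logistic data=helpmi20fcs descending;
  by _imputation_;
  model homeless=female i1 sexrisk indtot / covb;
run;

/* Combine the 20 sets of results using Rubin's rules */
proc mianalyze parms=helpmipefcs covb=helpmicovbfcs;
  modeleffects intercept female i1 sexrisk indtot;
run;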
This yields the following primary results from proc mianalyze:
Parameter      Estimate   Std Error   Lower 95% CL   Upper 95% CL
intercept     -2.492733    0.591241       -3.65157       -1.33390
female        -0.245103    0.244029       -0.72339        0.23319
i1             0.023207    0.005610        0.01221        0.03420
sexrisk        0.058642    0.035803       -0.01153        0.12882
indtot         0.047971    0.015745        0.01711        0.07883
These estimates are quite similar to our previous results. Given the small proportion of missing values, this isn't very surprising.
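For readers curious about what proc mianalyze does with the 20 sets of estimates, here is a minimal Python sketch of Rubin's combining rules for a single parameter. The pooled estimate is the average of the per-imputation estimates; the total variance is the average within-imputation variance plus the between-imputation variance inflated by (1 + 1/m). The sketch also returns the relative increase in variance due to missingness, which should be small when few values are missing. The function name `pool_rubin` is hypothetical, and the sketch omits the degrees-of-freedom calculation that proc mianalyze uses for its confidence limits.

```python
import math
from statistics import mean, variance

def pool_rubin(estimates, std_errors):
    """Hypothetical helper: pool one parameter across m imputations (Rubin's rules)."""
    m = len(estimates)
    if m < 2 or len(std_errors) != m:
        raise ValueError("need at least two imputations and one SE per estimate")
    q_bar = mean(estimates)                       # pooled point estimate
    w_bar = mean(se ** 2 for se in std_errors)    # average within-imputation variance
    b = variance(estimates)                       # between-imputation variance (divisor m - 1)
    total = w_bar + (1 + 1 / m) * b               # total variance
    # Relative increase in variance attributable to the missing data.
    rel_increase = (1 + 1 / m) * b / w_bar if w_bar > 0 else math.inf
    return q_bar, math.sqrt(total), rel_increase
```

With m = 20 imputations, as in the proc mi call above, the inflation factor on the between-imputation variance is 1 + 1/20 = 1.05.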
R
Several R packages support imputation with a general missingness pattern and for variables with a variety of distributions. A brief summary of missing data tools in R can be found in the CRAN Task View on Multivariate Statistics. We’ll return to this topic from the R perspective in a future entry.