Mixed Feelings about Logistic Regression: Eight Hints for Getting Started with PROC GLIMMIX

This post was kindly contributed by The SAS Training Post - go there to comment and to read the full post.

Delicious Mixed Model Goodness

Imagine the scene: You’re in your favorite coffee shop, laptop and chai. The last of the data from a four-year study are validated and ready for analysis. You’ve explored the plots, preliminary results are promising, and now it is time to fit the model.

It’s not just any model. It’s a three-level multilevel generalized linear mixed model with a binary response. You’ve used GENMOD before. You’ve used MIXED before. Now the two procedures have been sitting in a tree, K-I-S-S-I-N-G, and along comes GLIMMIX in a baby carriage.

We’ve all been there.

Here are some tips for first-time users of PROC GLIMMIX.

The syntax is very similar to PROC MIXED. If you know how to fit models in MIXED, learning GLIMMIX syntax is a snap. Just remember that there is no REPEATED statement in GLIMMIX. R-side random effects are specified through the RANDOM statement, using either the _RESIDUAL_ keyword or the RESIDUAL option. There are other differences, but the beginning user isn't likely to encounter them.
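To make the REPEATED-to-RANDOM translation concrete, here is a minimal sketch. The data set work.study and the variables y, trt, time, and subject are hypothetical names chosen for illustration.

```sas
/* PROC MIXED: the R-side covariance structure goes in REPEATED */
proc mixed data=work.study;          /* work.study is a hypothetical data set */
   class subject trt time;
   model y = trt time trt*time;
   repeated time / subject=subject type=ar(1);
run;

/* PROC GLIMMIX: same structure, written as RANDOM with the RESIDUAL option */
proc glimmix data=work.study;
   class subject trt time;
   model y = trt time trt*time;
   random time / subject=subject type=ar(1) residual;
run;
```

When the structure does not depend on the order of a repeated effect (compound symmetry, for example), the keyword form `random _residual_ / subject=subject type=cs;` does the same job.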

Don’t be fooled by how easy the syntax is: you still need to understand how the generalized linear mixed model works to make the best use of it. It’s not just PROC MIXED with DIST= and LINK= options.

If you have random effects and a non-normal response, the default estimation method is pseudo-likelihood. Pseudo-likelihood fits a linearized approximation of the model, so you will not get true likelihood-based fit statistics to use for model comparisons. You will still get tests of the fixed effects and covariance parameter estimates. To get maximum likelihood estimation, specify METHOD=LAPLACE or METHOD=QUAD in the PROC GLIMMIX statement.
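As a rough sketch of what requesting maximum likelihood looks like for a three-level binary model like the one in the coffee shop, consider the following. The data set and the variables passed, ses, school, and classroom are assumptions made up for this example.

```sas
/* Three-level random-intercept logistic model fit by maximum likelihood.
   work.students, passed, ses, school, and classroom are hypothetical names. */
proc glimmix data=work.students method=laplace;
   class school classroom;
   model passed(event='1') = ses / dist=binary link=logit solution;
   random intercept / subject=school;              /* level 3 */
   random intercept / subject=classroom(school);   /* level 2 */
run;
```

Leaving off METHOD=LAPLACE fits the same model with the default pseudo-likelihood method. METHOD=QUAD is the other maximum likelihood choice and is typically slower.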

If you have a binomial response and random effects, it is best to use maximum likelihood if possible. This is because pseudo-likelihood can produce biased estimates of the variance components in these cases.

You won’t always be able to use maximum likelihood, and you won’t always want to. Some models cannot be fit with ML estimation, in particular models with correlated errors. In these cases, it is a good idea to specify EMPIRICAL=MBN in the PROC statement to get empirical sandwich estimates for the standard errors of the fixed effects, with a small-sample bias correction. This produces standard errors (and therefore p-values and confidence limits) that are robust to misspecification of the covariance structure. And sometimes maximum likelihood estimation is very slow, and pseudo-likelihood might give you just what you need in less time.
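A minimal sketch of a correlated-errors model with the bias-corrected sandwich estimator might look like this. The data set and variable names are hypothetical, and compound symmetry is only one possible working structure.

```sas
/* R-side (correlated errors) binary model, fit by the default
   pseudo-likelihood method, with MBN-corrected sandwich standard errors.
   work.visits, response, trt, visit, and subject are hypothetical names. */
proc glimmix data=work.visits empirical=mbn;
   class subject trt visit;
   model response(event='1') = trt visit / dist=binary link=logit solution;
   random _residual_ / subject=subject type=cs;
run;
```

The empirical estimators need the SUBJECT= option so that GLIMMIX can process the data by independent subjects.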

There is an OUTDESIGN= option. It lets you save the Z matrix (and the X matrix, if you want it) to an output data set. Maybe I’m a geek, but I think it’s really cool to be able to do that.
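Here is a small sketch of saving just the Z matrix, assuming a hypothetical data set with a clinic random effect. The Z suboption restricts the output to the random-effects design columns; without suboptions both X and Z are written.

```sas
/* Save the Z matrix to work.zmat.
   work.trial, response, trt, and clinic are hypothetical names. */
proc glimmix data=work.trial outdesign(z)=work.zmat;
   class clinic trt;
   model response(event='1') = trt / dist=binary link=logit;
   random clinic;
run;

/* Peek at the first few rows of the saved design matrix */
proc print data=work.zmat(obs=5);
run;
```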

The LSMESTIMATE statement constructs customized hypothesis tests (as an ESTIMATE statement does), but you write them as combinations of least squares means rather than model coefficients. If you have always struggled with ESTIMATE statements, the LSMESTIMATE statement might be your new best friend. It doesn’t cover random effects, though. You’ll still need an ESTIMATE statement for that.
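For illustration, suppose a hypothetical treatment factor trt has three levels that sort as A, B, and C. The coefficients then line up with the least squares means in that order:

```sas
/* Custom comparisons among least squares means.
   work.trial, response, trt, and clinic are hypothetical names;
   trt is assumed to have levels A, B, C in that sort order. */
proc glimmix data=work.trial method=laplace;
   class clinic trt;
   model response(event='1') = trt / dist=binary link=logit;
   random intercept / subject=clinic;
   lsmestimate trt 'A vs B'              1 -1  0,
                   'A vs mean of B and C' 2 -1 -1 / divisor=2 cl;
run;
```

The estimates are reported on the logit scale. Each coefficient applies to one level's least squares mean, so there is no need to work out which columns of the design matrix are involved.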

You can specify your own likelihood in PROC GLIMMIX through programming statements and automatic variables. It’s a REALLY cool way to fit a model using that new distribution you just added to the exponential family and named after your sainted great-grandmother. Just be careful if you do. While the binomial mean can never theoretically equal zero, it can get close enough that your computer’s processor can’t tell the difference. Try taking the log of that. Your great-grandmother knows you’d better add a fuzz factor.
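A hedged sketch of the idea, using the ordinary Bernoulli log likelihood as a stand-in for great-grandmother's distribution: the programming statements assign the automatic variables _VARIANCE_ and _LOGL_, and the mean is clamped away from 0 and 1 before any logs are taken. The data set, the 0/1 response y, the covariate x, and the fuzz value of 1e-12 are all assumptions for illustration.

```sas
/* User-supplied log likelihood with a fuzz factor.
   work.trial, y (coded 0/1), x, and the value 1e-12 are hypothetical. */
proc glimmix data=work.trial;
   model y = x / link=logit solution;
   fuzz = 1e-12;                              /* assumed fuzz factor        */
   mu   = min(max(_mu_, fuzz), 1 - fuzz);     /* keep mu strictly in (0, 1) */
   _variance_ = mu * (1 - mu);                /* variance function          */
   _logl_     = y*log(mu) + (1 - y)*log(1 - mu);
run;
```

Without the clamp, a fitted mean that rounds to 0 or 1 sends log() a zero argument and the likelihood contribution for that observation goes missing.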

There are lots of other cool things to know about PROC GLIMMIX, but this will give you something to chew on besides that white chocolate raspberry scone. Be sure to leave a tip for the barista.
