This post was kindly contributed by SAS Programming for Data Mining Applications - go there to comment and to read the full post. |
PROC GLIMMIX is good tool for generalized linear mixed model (GLMM), when the scale is small to medium. When facing a large scale GLMM, such as modeling all ZIPs nested in Counties nested in all 51 States in US, a 64-bit machine with extremely large memory is required and the computing may last for months! In a strictly nested hierarchical model, the variance covariance matrix is very sparse, and taking advantage of this property can accelerate computing many folds.
The %HPGLIMMIX SAS macro is made for large scale Hierarchical Mixed Models. As an example, a sample data using Gamma Regression is shown below, with all ZIPs in AK, AL, AR, AZ with 2-level hierarchies: State and ZIP within State, total 4 blocks with max 693 columns per block. The reason not all ZIPs and all states are used is simply because PROC GLIMMI blow up on the machine.
Copmaring the estimates and std errors from both runs, they are the same, but drastically different running time of 71sec using %HPGLIMMIX v.s. 35min39sec using GLIMMIX.
1715
1716 options nomprint nomlogic;
1717 %hpglimmix(data=temp2,
1718 stmts=%str(
1719 class zip zip_state;
1720 model y = x ;
1721 random int zip/subject=zip_state;
1722 ),
1723 error=gamma,
1724 link=LOG,
1725 options=NOTEST);
NOTEST
The HPGLIMMIX Macro
Data Set : WORK.TEMP2
Error Distribution : GAMMA
Link Function : LOG
Response Variable : Y
Job Starts at : 06JUN2011:15:51:19
HPGLIMMIX Iteration History
Iteration Convergence criterion
1 0.0081058432 13 sec
2 0.0004213646 13 sec
3 2.7137935E-7 13 sec
4 3.1854799E-9 12 sec
Output from final Proc HPMixed run:
Job Ends at : 06JUN2011:15:52:30
1726 options nomprint nomlogic;
1727
1728 proc glimmix data=temp2;
1729 class zip zip_state;
1730 model y = x /s dist=gamma;
1731 random int zip /subject=zip_state;
1732 run;
NOTE: Convergence criterion (PCONV=1.11022E-8) satisfied.
NOTE: PROCEDURE GLIMMIX used (Total process time):
real time 35:38.93
cpu time 34:30.90
GLIMMIX output:
Covariance Parameter Estimates
Standard
Cov Parm Subject Estimate Error
Intercept zip_state 0.000152 0.000125
zip zip_state 0.000066 3.105E-6
Residual 0.000405 2.281E-6
Solutions for Fixed Effects
Standard
Effect Estimate Error DF t Value Pr > |t|
Intercept 6.5873 0.006180 3 1065.95 <.0001
x 0.003436 0.000218 62634 15.79 <.0001
%HPGLIMMIX output:
_cov data from HPGLIMMIX
Obs CovParm Subject Estimate
1 Intercept zip_state 0.000152
2 zip zip_state 0.000066
3 Residual 0.000405
_soln data from HPGLIMMIX
Obs Effect Estimate StdErr
1 Intercept 6.5873 0.006180
2 x 0.003436 0.000218
This post was kindly contributed by SAS Programming for Data Mining Applications - go there to comment and to read the full post. |