%HPGLIMMIX macro on large-scale HMM (hierarchical mixed models)

This post was kindly contributed by SAS Programming for Data Mining Applications - go there to comment and to read the full post.


PROC GLIMMIX is a good tool for generalized linear mixed models (GLMMs) when the problem is small to medium in scale. A large-scale GLMM, such as one modeling all ZIPs nested in counties nested in all 51 States in the US, requires a 64-bit machine with an extremely large amount of memory, and the computation may run for months! In a strictly nested hierarchical model, the variance-covariance matrix is very sparse, and taking advantage of this property can speed up computation many-fold.
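To see where the sparsity comes from, here is a minimal pure-Python sketch on hypothetical toy data. The helpers `columns_for` and `ztz_nonzeros` and the ZIP codes are illustrative assumptions, not part of %HPGLIMMIX. Each observation loads only on its own state's intercept and its own ZIP column, so every nonzero entry of Z'Z (and likewise of V = ZGZ' + R with a diagonal R) pairs two columns from the same state.

```python
from collections import defaultdict


def columns_for(state, zip_code):
    """Random-effect columns one observation loads on, following the layout of
    `random int zip / subject=state`: its state's intercept and its own ZIP."""
    return [("int", state), ("zip", state, zip_code)]


def ztz_nonzeros(records):
    """Accumulate the nonzero entries of Z'Z for a 0/1 random-effects design Z."""
    ztz = defaultdict(int)
    for state, zip_code in records:
        cols = columns_for(state, zip_code)
        for a in cols:
            for b in cols:
                ztz[(a, b)] += 1
    return dict(ztz)


# Hypothetical toy data: one (state, ZIP) pair per observation.
obs = [("AK", "99501"), ("AK", "99501"), ("AK", "99502"),
       ("AL", "35004"), ("AL", "35005"), ("AZ", "85001")]

ztz = ztz_nonzeros(obs)
n_cols = len({c for s, z in obs for c in columns_for(s, z)})
# Element [1] of every column key is its state, so a cross-state entry
# would pair two columns whose states differ.
cross_block = [k for k in ztz if k[0][1] != k[1][1]]
print(f"{n_cols} columns, {len(ztz)} nonzeros out of {n_cols ** 2}, "
      f"{len(cross_block)} outside the per-state blocks")
```

Every cross-state entry is zero, and even inside a state block only the intercept row/column and the diagonal are filled, so the share of nonzeros shrinks as more states and ZIPs are added.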

The %HPGLIMMIX SAS macro is built for large-scale hierarchical mixed models. As an example, below is a gamma regression on sample data covering all ZIPs in AK, AL, AR, and AZ with a 2-level hierarchy (State, and ZIP within State): 4 blocks in total, with at most 693 columns per block. Not all ZIPs and states are used simply because PROC GLIMMIX blew up on the machine.
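A rough back-of-envelope for why the blocks matter, assuming the cost of a dense factorization grows as the cube of its dimension (a textbook approximation, not a profile of what GLIMMIX or HPMIXED actually do): factoring k blocks separately costs sum(n_b^3) instead of (sum n_b)^3. The sketch plugs in the stated upper bound of 693 columns for each of the 4 blocks; the actual block sizes are not listed here, and `cubic_cost_ratio` is a hypothetical helper.

```python
def cubic_cost_ratio(block_sizes):
    """Dense cost (sum of sizes)^3 divided by block-wise cost sum(size^3).

    Assumes cost ~ dimension^3 for a dense factorization; constants cancel.
    """
    total = sum(block_sizes)
    return total ** 3 / sum(n ** 3 for n in block_sizes)


# Upper bound from the text: 4 blocks of at most 693 columns each.
print(cubic_cost_ratio([693] * 4))  # 16.0
```

For k equal blocks the ratio is k^2, so the potential gain grows quickly as more states are added.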

Comparing the estimates and standard errors from the two runs, they are the same (to the printed precision), but the running times differ drastically: 71 sec using %HPGLIMMIX vs. 35 min 39 sec using PROC GLIMMIX.
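The speed-up can be read straight off the two logs below: the %HPGLIMMIX job stamps run from 15:51:19 to 15:52:30, and GLIMMIX reports a real time of 35:38.93. A small Python sketch of that arithmetic (the helper names are made up for illustration) parses both formats:

```python
from datetime import datetime


def parse_sas_datetime(text):
    """Parse a SAS DATETIME stamp such as '06JUN2011:15:51:19'
    (English month abbreviations; strptime matches them case-insensitively)."""
    return datetime.strptime(text, "%d%b%Y:%H:%M:%S")


def clock_to_seconds(text):
    """Convert a SAS real-time string such as '35:38.93' (or 'H:MM:SS.ss') to seconds."""
    seconds = 0.0
    for part in text.split(":"):
        seconds = seconds * 60 + float(part)
    return seconds


# Values copied from the two logs.
hp_seconds = (parse_sas_datetime("06JUN2011:15:52:30")
              - parse_sas_datetime("06JUN2011:15:51:19")).total_seconds()
glimmix_seconds = clock_to_seconds("35:38.93")
print(f"%HPGLIMMIX {hp_seconds:.0f} s, GLIMMIX {glimmix_seconds:.2f} s, "
      f"speed-up {glimmix_seconds / hp_seconds:.1f}x")
```

That works out to roughly a 30x speed-up on this data.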



options nomprint nomlogic;
%hpglimmix(data=temp2,
           stmts=%str(
                  class zip zip_state;
                  model y = x ;
                  random int zip/subject=zip_state;
                    ),
           error=gamma,
           link=LOG,
           options=NOTEST);
NOTEST

       The HPGLIMMIX Macro

Data Set           : WORK.TEMP2
Error Distribution : GAMMA
Link Function      : LOG
Response Variable  : Y


Job Starts at : 06JUN2011:15:51:19
    HPGLIMMIX Iteration History

Iteration    Convergence criterion
    1            0.0081058432  13 sec
    2            0.0004213646  13 sec
    3            2.7137935E-7  13 sec
    4            3.1854799E-9  12 sec

Output from final Proc HPMixed run:
Job Ends at : 06JUN2011:15:52:30
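The iteration history above, together with the "Output from final Proc HPMixed run" line, suggests that %HPGLIMMIX works like SAS's older %GLIMMIX macro: each iteration linearizes the GLMM into pseudo-data and fits a weighted linear mixed model, here presumably with PROC HPMIXED. The macro source is not shown in this post, so treat this as an assumption. Under the standard pseudo-likelihood linearization, the pseudo-data for a gamma model with log link could be formed as in this sketch (`gamma_log_pseudo_data` is a hypothetical helper):

```python
import math


def gamma_log_pseudo_data(y, eta):
    """Hypothetical helper: one pseudo-likelihood linearization step for a
    gamma GLMM with log link.

    For each observation, with mu = exp(eta):
      working response z = eta + (y - mu) * d(eta)/d(mu) = eta + (y - mu) / mu
      working weight   w = (d(mu)/d(eta))**2 / V(mu), with V(mu) = mu**2 for gamma
    """
    z, w = [], []
    for y_i, eta_i in zip(y, eta):
        mu = math.exp(eta_i)
        dmu_deta = mu          # derivative of the inverse log link
        variance = mu * mu     # gamma variance function (dispersion left to the LMM fit)
        z.append(eta_i + (y_i - mu) / dmu_deta)
        w.append(dmu_deta * dmu_deta / variance)
    return z, w


z, w = gamma_log_pseudo_data([2.0, 0.5], [0.0, 0.0])
print(z, w)  # [1.0, -0.5] [1.0, 1.0]
```

For the gamma/log pair the working weight is constant, because the variance function mu^2 cancels the squared derivative (dmu/deta)^2 = mu^2; only the working response changes from one iteration to the next.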
options nomprint nomlogic;

proc glimmix data=temp2;
     class zip zip_state;
     model y = x /s  dist=gamma;
     random int zip /subject=zip_state;
run;



NOTE: Convergence criterion (PCONV=1.11022E-8) satisfied.
NOTE: PROCEDURE GLIMMIX used (Total process time):
      real time           35:38.93
      cpu time            34:30.90

GLIMMIX output:


              Covariance Parameter Estimates

                                                  Standard
             Cov Parm     Subject      Estimate     Error

             Intercept    zip_state    0.000152    0.000125
             zip          zip_state    0.000066    3.105E-6
             Residual                  0.000405    2.281E-6

             Solutions for Fixed Effects

                                Standard
     Effect       Estimate       Error       DF    t Value    Pr > |t|

    Intercept      6.5873    0.006180        3    1065.95      <.0001
    x              0.003436    0.000218    62634      15.79      <.0001
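Because the link is log, the variance components are easier to read on the response scale. Assuming the random effects are normal on the linear-predictor (log) scale, as is usual for GLMM random effects, a one-SD effect multiplies the mean by exp(sd). A short sketch with the estimates copied from the table above (`log_link_spread` is a hypothetical helper):

```python
import math


def log_link_spread(variance):
    """For a normal random effect on the log-link scale, return its standard
    deviation and the relative change in the mean, exp(sd) - 1, that a
    one-SD effect implies."""
    sd = math.sqrt(variance)
    return sd, math.exp(sd) - 1


# Variance estimates copied from the Covariance Parameter Estimates table.
for name, variance in [("state intercept", 0.000152), ("zip within state", 0.000066)]:
    sd, rel = log_link_spread(variance)
    print(f"{name:17s} sd = {sd:.4f}, one SD scales the mean by about {100 * rel:.2f}%")
```

So a typical state shifts the mean by about 1.2%, and a typical ZIP within its state by about 0.8%.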

%HPGLIMMIX output:


                _cov data from  HPGLIMMIX

               Obs    CovParm       Subject     Estimate

                1     Intercept    zip_state    0.000152
                2     zip          zip_state    0.000066
                3     Residual                  0.000405

               _soln data from HPGLIMMIX

               Obs    Effect       Estimate      StdErr

                1     Intercept      6.5873      0.006180
                2     x              0.003436    0.000218
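The _soln data set from %HPGLIMMIX carries only Estimate and StdErr. The t values in GLIMMIX's Solutions for Fixed Effects table are the ratio of the two, so they can be recovered with a few lines (a Python sketch; `t_ratios` is a hypothetical helper and the values are copied from the tables above):

```python
def t_ratios(soln):
    """Wald t statistic Estimate / StdErr for each fixed effect."""
    return {effect: est / se for effect, (est, se) in soln.items()}


# Estimate and StdErr copied from the _soln table above.
soln = {"Intercept": (6.5873, 0.006180), "x": (0.003436, 0.000218)}
for effect, t in t_ratios(soln).items():
    print(f"{effect:9s} t = {t:.2f}")
```

From the printed numbers this gives about 1065.91 and 15.76, against GLIMMIX's 1065.95 and 15.79; the small gaps are most likely due to rounding in the displayed estimates and standard errors. P-values would additionally need the degrees of freedom, which the _soln data shown here does not include.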
