Rolling regressions for backtesting

This post was kindly contributed by SAS ANALYSIS - go there to comment and to read the full post.

Markets generate huge volumes of time series data, often millions of records. Running regressions to obtain coefficients over a rolling time window is a common task in many backtesting jobs. In SAS, writing a macro that calls a modeling procedure such as PROC REG once per window is not an efficient option. Calling PROC REG thousands of times in a big loop can easily bring a system to a crawl.

A better way is to go down to the matrix level and re-implement the basic OLS operations: transpose, multiply, and invert vectors and matrices. This can be done in PROC IML, with DATA step arrays, or in PROC FCMP. PROC IML is very powerful for such work but requires an extra license (SAS/IML). DATA step arrays would demand very strong data manipulation skills, since they are not designed for matrix operations. PROC FCMP, part of Base SAS, seems like a portable solution for SAS 9.1 or later. To test this approach, I simulated a two-asset portfolio with 100k records. Using a 1000-observation rolling window, I ran 99,001 regressions. The whole run took just 10 seconds on an old laptop. Overall, the speed is quite satisfying.
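
For reference, here is the algebra each window solves. This is a sketch of the standard OLS normal equations. It assumes that window j covers observations j through j+w-1, with x = asset1, y = asset2, and w = &wsize. The symbols below are my notation, not the original program's. In the FCMP step, xone holds X^T (a 2 x w matrix whose first row is all ones), xonet holds X, z1 is X^T X, z2 is its inverse, and z3 is the coefficient vector (intercept, slope).

```latex
\begin{aligned}
\hat\beta &= (X^\top X)^{-1} X^\top y, \qquad
X^\top X = \begin{pmatrix} w & \sum x_i \\ \sum x_i & \sum x_i^2 \end{pmatrix}, \qquad
X^\top y = \begin{pmatrix} \sum y_i \\ \sum x_i y_i \end{pmatrix} \\
(X^\top X)^{-1} &= \frac{1}{w\sum x_i^2 - \left(\sum x_i\right)^2}
\begin{pmatrix} \sum x_i^2 & -\sum x_i \\ -\sum x_i & w \end{pmatrix} \\
\hat\beta_1 &= \frac{w\sum x_i y_i - \sum x_i \sum y_i}{w\sum x_i^2 - \left(\sum x_i\right)^2}, \qquad
\hat\beta_0 = \bar{y} - \hat\beta_1 \bar{x}
\end{aligned}
```

X^T X is only 2 x 2, so the inversion itself is trivial. Most of the cost of each regression comes from passing over the w observations in the window to build these sums.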


/* 1 -- Simulate a two-asset portfolio */
data simuds;
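   _beta0 = 15; _beta1 = 2;   /* true intercept and slope */
   _sigma = 5;                /* standard deviation of the noise term */
   do minute = 1 to 1e5;
     asset1 = ranuni(1234)*10 + 20;   /* uniform on [20, 30] */
     asset2 = _beta0 + _beta1*asset1 + _sigma*rannor(3421);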
     output;
   end;
   drop _:; format asset: dollar8.2;
run;

/* 2 -- Count observations and set the rolling window length */
proc sql noprint;
   select count(*) into :nobs
   from simuds;
quit;
%let wsize = 1000;
%let nloop = %eval(&nobs - &wsize + 1);
%put &nloop;

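/* 3 -- Compute rolling OLS coefficients with matrix routines */
proc fcmp;
   /* Allocate space for matrices */
   array input[&nobs, 2] / nosym;     /* full data: asset1, asset2 */
   array y[&wsize] / nosym;           /* response vector for one window */
   array xone[2, &wsize] / nosym;     /* Xt: row 1 = intercept, row 2 = asset1 */
   array xonet[&wsize, 2] / nosym;    /* X = transpose of Xt */
   array z1[2, 2] / nosym;            /* Xt*X */
   array z2[2, 2] / nosym;            /* inverse of Xt*X */
   array xty[2] / nosym;              /* Xt*y */
   array z3[2] / nosym;               /* beta = inv(Xt*X) * Xt*y */
   array result[&nloop, 3] / nosym;   /* beta0, beta1, window start */

   /* Read the simulated data set into the input array */
   rc1 = read_array('simuds', input, 'asset1', 'asset2');

   /* Calculate OLS regression coefficients for each window */
   do j = 1 to &nloop;
      /* Load observations j .. j+&wsize-1 into Xt and y */
      do i = 1 to &wsize;
         xone[1, i] = 1;
         xone[2, i] = input[i+j-1, 1];
         y[i] = input[i+j-1, 2];
      end;
      call transpose(xone, xonet);   /* X */
      call mult(xone, xonet, z1);    /* Xt*X (2 x 2) */
      call inv(z1, z2);              /* inv(Xt*X) */
      /* Form Xt*y (2 x 1) first: cheaper than inv(Xt*X)*Xt (2 x n), */
      /* and no array is both an input and the output of CALL MULT  */
      call mult(xone, y, xty);
      call mult(z2, xty, z3);        /* beta = inv(Xt*X) * Xt*y */
      result[j, 1] = z3[1];          /* intercept */
      result[j, 2] = z3[2];          /* slope */
      result[j, 3] = j;              /* first observation of the window */
   end;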

   /* Output resulting matrix as dataset */
   rc2 = write_array('result', result, 'beta0', 'beta1', 'start_time');
   if rc1 + rc2 > 0 then put 'ERROR: I/O error';
   else put 'NOTE: I/O was successful';
quit;

/* 4 -- Visualize result */
proc sgplot data = result;
   needle x = start_time y = beta1;
   refline 2 / axis = y;
   yaxis min = 1.8;
run;
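
Here is a back-of-envelope check on what the needle plot should show, based on the simulation settings in step 1. asset1 = ranuni(1234)*10 + 20 is uniform on [20, 30], the noise has standard deviation 5, and the true slope is 2. The calculation uses the textbook standard error of an OLS slope. It also approximates the window's sum of squares by w times the population variance of x, which is my assumption rather than part of the original post.

```latex
\begin{aligned}
\operatorname{Var}(x) &= \frac{(30-20)^2}{12} \approx 8.33 \\
S_{xx} &= \sum_{i=1}^{w} (x_i - \bar{x})^2 \approx w \operatorname{Var}(x) \approx 1000 \times 8.33 \approx 8333 \\
\operatorname{SE}(\hat\beta_1) &= \frac{\sigma}{\sqrt{S_{xx}}} \approx \frac{5}{\sqrt{8333}} \approx \frac{5}{91.3} \approx 0.055 \\
\beta_1 \pm 3\,\operatorname{SE}(\hat\beta_1) &\approx 2 \pm 0.16 = [1.84,\ 2.16]
\end{aligned}
```

The individual window estimates of beta1 should therefore cluster around the reference line at 2, mostly between about 1.84 and 2.16. The y-axis floor of 1.8 sits just below that band. Neighbouring windows share 999 of their 1000 observations, so consecutive estimates drift smoothly rather than scattering independently.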
