SAS Data Driven Programming: My 4 Favorite Techniques

This post was kindly contributed by From a Logical Point of View » SAS - go there to comment and to read the full post.

I use relatively fixed patterns in my SAS  programming life. For so called data driven programming (or dynamic programming), I used the following 4 techniques, chronologically:

  • macro array
  • call execute
  • list processing
  • for each loop

For a quick demo, I will start with a simple scenario in which the data set sashelp.zipcode should be spitted to pieces of datasets by states (in real projects, the codes would be more complicated but share the simple atom structure). For example, the dataset for North Carolina:

data zipcode_NC;
    set sashelp.zipcode(where=(statecode="NC"));
run;

There are 50 states plus territories like Puerto Rico in the source data, so you won’t just use a simple string replacement by macro variable. My first SAS dynamic programming technique is using macro array, learned from Chapter 6 of Carpenter’s Complete Guide to the SAS Macro Language:

1. Macro array

proc sort data=sashelp.zipcode (keep=statecode) nodupkey out=statecode;
    by statecode;
run;

data _null_;
    set statecode end=eof;
    i+1;
    II=left(put(i,2.));
    call symputx(‘statecode’||II,statecode);
    if eof then call symputx(‘n’,II);
run;

%macro doit;
%do i=1 %to &n;
    data class_&&statecode&i;
        set sashelp.zipcode(where=(statecode="&&statecode&i"));
    run;
%end;
%mend;
%doit

Macro array is still my favorite and I use it everywhere. It creates multiple macros variables by sequence (macro array) from the control file, then apply a do loop over each macro variable to get job done: straightforward while robust(verbose somehow).

2. Call Execute

Call Execute is a power tool. It will make your codes much concise and efficient. I love it but with little bit of reluctance: regardless the potential timing issue, for me, it is not aesthetically readable in most cases. To make a relatively enjoyable Call Execute, I prefer to enclose the atom part of the program to a macro then use a single Call Execute to resolve it:

%macro break(statecode);
   data class_&statecode;
      set sashelp.zipcode(where=(statecode="&statecode"));
   run;
%mend;

data _null_;
  set statecode;
  call execute(‘%break(‘||trim(statecode)||’)’);
run;

Heavily interacting with data step will just make Call Execute like black magic (this snippet comes from Mike Molter’s 2013 paper, Coding For the Long Haul With Managed Metadata and Process Parameters; sorry Mike, I know it works):

data _null_ ;
length lastds $4 ;
set meta2 end=thatsit ;
if _n_ eq 1 then do;
     call execute (‘proc sql; ‘) ;
     lastds=’ae’;
end;
retain lastds ;
 
call execute (‘create table ‘||data set||’ as select a.*,’||left(keepvars)||’ from
‘||lastds a left join suppae_tran(where=(idvar eq "’||compress(idvar)||’")) b on
a.usubjid=b.usubjid and a.’||compress(idvar)||’=’||left(joincond)||’;’);

 
if thatsit then call execute(‘ quit; ‘) ;
lastds=data set ;
run;

Well there are always trade-offs. Separating Call Execute with data steps (my preference) will make it much more readable, but it is not cool anymore(compared to Mike’s style)! Coolness deserves the efforts and I know it’s part of programmers proud.

3. List Processing

I started to use list in SAS since 2011 and now I have a big collection:

https://github.com/Jiangtang/SAS_ListProcessing

Macro array approach will create series of macro variables, while in list method, a single macro variable will be generated which hold series of values:

proc sql noprint;
    select statecode into:statecode separated by " "
    from statecode
    ;
quit;

%macro doit;
%do i=1 %to %sysfunc(countw(&statecode));
    %let a=%sysfunc(scan(&statecode,&i,’ ‘));
    data class_&a;
        set sashelp.zipcode(where=(statecode="&a"));
    run;
%end;
%mend;
%doit

4. For each

A foreach operation is dream for SAS programmers. Now I have one, for.sas by Jim Anderson:

filename list url "https://raw.github.com/Jiangtang/SAS_ListProcessing/master/_ListProcessing";
%inc list;

data %for(statecode, in=[statecode], do=%nrstr(class_&statecode(where=(statecode="&statecode"))));
    set sashelp.zipcode;
run;

Actually this %for is much more versatile than it appears in this simple demo. It can proceeds sequentially against SAS datasets, value list, number range, along with dataset contents and directory contents. Check it out and you will definitely love it:

http://www.sascommunity.org/wiki/Streamlining_Data-Driven_SAS_With_The_%25FOR_Macro

This post was kindly contributed by From a Logical Point of View » SAS - go there to comment and to read the full post.