Multi-purpose macro function for getting information about data sets

This post was kindly contributed by SAS Users - go there to comment and to read the full post.

SAS toolbox: macro functions
Did you know you could have a single universal function that can replace all the functions in the world? All those sin(x), log(x), … whatever(x) can be replaced by a single super function f(x). Don’t believe me? Just make those function names (sin, log, … whatever) another argument of that all-purpose function f, like this: f(x, sin), f(x, log), … f(x, whatever). Now we only have to deal with a single function instead of many, and its second argument defines what transformation is applied to the first argument to arrive at this almighty function’s value.

How many functions there are in SAS

Last time I counted there were more than 600 SAS functions, and that is excluding call routines and macro functions. But even that huge number grossly under-represents the actual number of functions available in SAS. That is because there are some functions that are built like the universal multi-purpose super function described above. For example, look at the following functions:

  • finance() represents several dozen financial functions;
  • finfo() represents multiple functions returning various information items about files (file size, date created, date modified, access permission, etc.);
  • dinfo() returns similar information items about directories;
  • attrn() returns numeric attributes of a data set (number of observations, number of variables, etc.);
  • attrc() returns character attributes of a data set (engine name, encoding name, character set, etc.).

Each of these functions represents not a single function, but a group of functions, and one of their arguments stipulates specific functionality (an information item or an attribute) that is being requested. You can think of this argument as a function modifier.
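To make the modifier idea concrete, here is a minimal DATA step sketch that asks attrn() and attrc() for different attributes of the same data set. The choice of SASHELP.CARS and the variable names (dsid, nobs, nvars, engine, rc) are assumptions made for illustration only.

data _null_;
   /* open() returns a data set identifier, or 0 if the data set cannot be opened */
   dsid = open('SASHELP.CARS');
   if dsid then do;
      /* same function, different modifier, different information item */
      nobs   = attrn(dsid,'nlobs');   /* logical observations (excludes those marked for deletion) */
      nvars  = attrn(dsid,'nvars');   /* number of variables */
      engine = attrc(dsid,'engine');  /* engine name */
      rc = close(dsid);
      put nobs= nvars= engine=;
   end;
run;

The second argument plays the role of the function modifier: changing 'nlobs' to 'nvars' changes what is returned without changing the function being called.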

%sysfunc SAS macro function

%sysfunc() is a super macro function that brings a wealth of SAS functions into the SAS macro language. With very few exceptions, SAS functions are available in the macro language thanks to %sysfunc().

Moreover, we can build our own user-defined macro functions using SAS-supplied macro functions (such as %eval, %length, %quote, %scan, etc.), as well as hundreds of the SAS non-macro functions wrapped into the %sysfunc() super macro function.
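As a small sketch of both points, the code below first calls a SAS function through %sysfunc() and then wraps another one into a user-defined macro function. The macro name nwords is hypothetical and used only for this illustration.

/* a SAS function with an optional format, called from macro code */
%put Today is %sysfunc(today(),date9.);

/* nwords is a hypothetical macro function: it counts space-separated words */
%macro nwords(list);
   %sysfunc(countw(&list,%str( )))
%mend nwords;

%put %nwords(Make Model Type);

The last %put should write 3 to the log. Note that character arguments inside %sysfunc() are not enclosed in quotes, and that the macro generates nothing but its value (there is no semicolon after the %sysfunc() call), which is what lets it be used inline like a function.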

Building a super macro function to retrieve information about data sets

Armed with such a powerful arsenal, let’s build a multi-purpose macro function that taps into the data tables’ metadata and extracts various information items about those tables.

Let’s make this macro function return any of the following most frequently used values:

  • Number of observations
  • Number of variables
  • Variables list (positional, separated by spaces)
  • Variables list (positional, separated by commas)

Obviously, we can create many more of these information items and attributes, but here I am just showing how to do it so that you can build your own list depending on your needs.

In my earlier blog post, How to create and use SAS macro functions, we built a macro function for getting the number of observations; let’s expand on that.

Here is the SAS Macro code that handles extraction of all four specified metadata items:

%macro dsinfo(dset,info);
/* dset - data set name                             */
/* info - modifier (NOBS, NVARS, VARLIST, VARLISTC) */      
   %local dsid result infocaps i;
   %let infocaps = %upcase(&info);
   %let dsid = %sysfunc(open(&dset));
   %if &dsid %then
   %do;
      %if &infocaps=NOBS %then %let result = %sysfunc(attrn(&dsid,nlobs));
      %else %if &infocaps=NVARS %then %let result = %sysfunc(attrn(&dsid,nvars));
      %else %if &infocaps=VARLIST %then
      %do i=1 %to %sysfunc(attrn(&dsid,nvars));
         %let result = &result %sysfunc(varname(&dsid,&i));
      %end;
      %else %if &infocaps=VARLISTC %then
      %do i=1 %to %sysfunc(attrn(&dsid,nvars));
         %if &i eq 1 %then %let result = %sysfunc(varname(&dsid,&i));
         %else %let result = &result,%sysfunc(varname(&dsid,&i));
      %end;
      %let dsid = %sysfunc(close(&dsid));
   %end;
   %else %put %sysfunc(sysmsg());
   &result
%mend dsinfo;

Here are example calls, each followed by the output it writes to the SAS log:

%put NOBS=***%dsinfo(SASHELP.CARS,NOBS)***;
NOBS=***428***
%put NVARS=***%dsinfo(SASHELP.CARS,NVARS)***;
NVARS=***15***
%put VARLIST=***%dsinfo(SASHELP.CARS,VARLIST)***;
VARLIST=***Make Model Type Origin DriveTrain MSRP Invoice EngineSize Cylinders Horsepower MPG_City MPG_Highway Weight Wheelbase Length***
%put VARLISTC=***%dsinfo(SASHELP.CARS,VARLISTC)***;
VARLISTC=***Make,Model,Type,Origin,DriveTrain,MSRP,Invoice,EngineSize,Cylinders,Horsepower,MPG_City,MPG_Highway,Weight,Wheelbase,Length***

Macro function code highlights

We used the following statement to make our macro function case-insensitive regarding the info argument:

%let infocaps = %upcase(&info);

Then, depending on the up-cased second argument of our macro function (the modifier), we used the open(), attrn(), varname() and close() functions within %sysfunc() to retrieve and construct our result macro variable.

We place the reference to that result macro variable, &result, right before the %mend statement, without a trailing semicolon, so that its value is returned to the calling environment.

While info=VARLIST (space-separated variable list) is useful in DATA steps, info=VARLISTC (comma-separated variable list) is useful in PROC SQL.
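For the PROC SQL case, a minimal sketch could look like the following. It assumes the %dsinfo macro above has already been compiled in the session; the output table name WORK.CARS_COPY is a placeholder chosen for this example.

proc sql;
   /* the macro call resolves to Make,Model,Type,... before the query is parsed */
   create table WORK.CARS_COPY as
   select %dsinfo(SASHELP.CARS,VARLISTC)
   from SASHELP.CARS;
quit;

Because the macro function resolves to plain text, the comma-separated list lands exactly where SQL syntax expects a column list.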

Usage example

Having this %dsinfo macro function at hand, we can use it in multiple programming scenarios. For example:

/* ending SAS session if no observations to process */
%if %dsinfo(SASHELP.CARS,NOBS)=0 %then %do; endsas; %end;
 
/* further processing */
data MYNEWDATA (keep=%dsinfo(SASHELP.CARS,VARLIST));
   retain %dsinfo(SASHELP.CARS,VARLIST);
   set SASHELP.CARS;
   if _n_=1 then put %dsinfo(SASHELP.CARS,VARLIST);
   /* ... */
run;

Here we first check if there is at least one observation in a data set. If not (0 observations) then we stop the SAS session and don’t do any further processing. Otherwise, when there are some observations to process, we continue.
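The %if-%then-%do block above is used in open code, which is supported starting with SAS 9.4M5. For earlier releases, one possible workaround is to place the same check inside a macro. The sketch below assumes %dsinfo is already compiled; the macro name stop_if_empty is hypothetical.

/* stop_if_empty is a hypothetical wrapper for releases without open-code %if */
%macro stop_if_empty(dset);
   %if %dsinfo(&dset,NOBS)=0 %then
   %do;
      %put NOTE: &dset has no observations, ending the SAS session.;
      endsas;
   %end;
%mend stop_if_empty;

%stop_if_empty(SASHELP.CARS)

If the data set cannot be opened, %dsinfo returns an empty value, so the comparison with 0 is false and the session continues; you may want to handle that case separately.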

If SAS code needs multiple calls to the same macro function with the same arguments, we can shorten the code by first assigning that macro function’s result to a macro variable and then referencing that macro variable instead of repeating the macro function invocation. Here is an example:

/* further processing */
%let vlist = %dsinfo(SASHELP.CARS,VARLIST);
data MYNEWDATA (keep=&vlist);
   retain &vlist;
   set SASHELP.CARS;
   if _n_=1 then put &vlist;
   /* ... */
run;


Your thoughts?

Do you see the benefits of these multi-purpose SAS macro functions? Can you suggest other scenarios of their usage? Please share your thoughts in the comments section below.

Multi-purpose macro function for getting information about data sets was published on SAS Users.
