A two-step transpose approach to reshape data

This post was kindly contributed by SAS Analysis - go there to comment and to read the full post.

New in SAS 9.2, the TRANSPOSE procedure accepts multiple IDs in its ID statement. More than one IDs would automatically concatenate together as the new variable names. Previously, Proc Transpose usually only allows one ID. As the result, the concatenation of variable names has to be done by DATA step array in SAS 9.1 or earlier versions. This change would bring more flexibility to reshape data to any desired structure. In this case, a small file with date, gender and 3 credit records is transformed to a more flat data structure, only corresponding to the date. Gender would be moved from row name to column name, and consequently several new variables would be created to combine old variables: credit and gender.

A variety of methods in SAS can realize the reshaping purpose. The coding can follow the principle: first accumulate the numeric values in a single column; second expand them with its accompanying IDs as new variable names. As usual, Proc SQL is always the first choice to aggregate data by its Group By statement. In this example, splitting and combining did the trick but needs some more coding. To increase efficiency, building a macro may be useful. If the programming intention is to report, Proc Report has the magic power to display sub-categories with least code. However, variable names have to be re-defined in the following steps.

Two-step transpose by Proc Transpose is intuitive to change data structure horizontally. And it’s pretty extensible to even more complex data structure.

*******(0) Input the raw data***********;
data raw;
format date mmddyy10.;
input sex: $1. Date: mmddyy10. Credit1 Credit2 Credit3;
cards;
M 01/01/2011 600 610 650
M 01/02/2011 500 510 730
F 01/01/2011 700 710 820
F 01/02/2011 400 410 500
;
run;

*********(1)Two-step transpose***************;
******NOTE: only works in SAS 9.2*********;
proc sort data=raw out=raw_s; by date sex;run;
proc transpose data=raw_s out=raw_t;
var credit:;
by date sex;
run;
proc sort data=raw_t out=interim; by date; run;
proc transpose data=interim out=final1(drop=_name_);
var col1;
by Date;
id sex _name_ ;
run;

********(2)Data step array: an alternative for two-step tranpose******;
data arrout;
set raw;
array cr[*] credit:;
do i=1 to dim(cr);
cred=cr[i];
cred_name=cats(sex,vname(cr[i]));
output;
end;
keep date cred cred_name;
run;
proc sort data=arrout out=arrout_s; by date;run;
/*NOTE: a following transpose would be better than another data step array*/
proc transpose data=arrout_s out=final2(drop=_name_);
by date;
var cred;
id cred_name;
run;

*********(3)Proc SQL with Macro********;
%macro reshape(max);
%do i=1 %to &max;
proc sql;
create table out&i as
select a.date, a.credit&i as Mcredit&i , b.credit&i as Fcredit&i
from raw(where=(sex='M')) as a , raw(where=(sex='F')) as b
where a.date=b.date
;quit;
%end;
data final3;
%do j=1 %to &max;
set out&j;
%end;
run;
%mend reshape;
%reshape(3);

******(4)Proc report*********;
proc report data=raw nowd out=final4 ;
column date sex,(credit:);
define date/group;
define sex/across;
run;

*********End of the program****Tested on 21Jan2011********;

This post was kindly contributed by SAS Analysis - go there to comment and to read the full post.