Array 2.0: matrix-friendly array in Proc Fcmp

This post was kindly contributed by SAS ANALYSIS - go there to comment and to read the full post.

Array is probably the only number-indexed data type in SAS. SAS programmers adopt it mainly for multiple-variable batch-processing. For example, longitudinal summation can be achieved by specifying a one-dimensional array and then adding all array elements. On the contrary, general-purpose programming languages widely use array or list to store and index data. The programmers who recently switch into SAS often feel confused about the limitation of SAS’ array, which is transient and need output to the parental data step to save result. Since data step does not memorize the array’s original structure, multiple dimensional arrays are almost useless in creating more complex data structures. In SAS, those multiple dimensional arrays are occasionally implemented to reshape data [Ref. 1]. If a matrix, such as a big one with random numbers of 5000 rows by 5000 columns [Example 1], is needed, 2D array may be intuitively chosen to perform this task. However, the result is not desired. The right way is still to apply a one-dimensional array.

Array in Proc Fcmp is an upgraded array data type other than data step array. This new version of array allows not only creating matrices but also indexing matrix. In addition, the communication between data step and Proc Fcmp is fast and convenient: this procedure supplies the READ_ARRAY() and WRITE_ARRAY() functions, which transform a dataset to an array and vice versa. For example, to test an interviewee’s knowledge on SAS’s data step array, a typical interview question is to ask her/him to write a 9X9 multiplication table [Example 2]. The expected solution is to place the OUTPUT statement between the inner Do-loop and outer Do-loop. Placing OUTPUT into the inner layer of Do-loops builds an 81*9 monster, while ignoring OUPTUT would only generate the last row of multiplication table. This task is much simpler in Proc Fcmp: just produce a matrix and write it to a dataset. Proc Fcmp is a nice alternative to SAS/IML, a specialized matrices language module in SAS. For instance, a SAS/IML kind of job, such as filling missing cells in a matrix (or a dataset) with its diagonal elements, can be fulfilled by arrays in Proc Fcmp [Example 3]. Proc Fcmp is shipped in SAS/BASE, which means that no extra out-of-pocket money is needed for another license.

The matrix feature of Proc Fcmp’s array benefits other SAS programming areas. For example, Proc Transpose and data step array usually reshape data structure from either long to wide or wide to long. However, in many cases that positions between observation and variable in a dataset have to be exchanged, the two methods may require several try-and-error steps. The TRANSPOSE() subroutine in Proc Fcmp solves such a problem easily [Example 4]. Currently there are 13 functions or subroutines available in Proc Fcmp for matrices operation[Ref. 2]. Some may complain they are still not enough for their particular need. Don’t forget: Proc Fcmp makes user-defined functions and subroutines! For example, a deldup_array() function, based on encapsulating Proc SQL in a macro by RUN_MACRO(), can delete duplicate elements in array[Example 5]. Therefore, users of Proc Fcmp’s array can always construct and accumulate their tools to suit their purpose.

Overall, array from Proc Fcmp, which is matrix-friendly, may relieve some pains suffered by those SAS programmers who work in the numerical analysis field.

References: 1. UCLA Statistics Course. http://www.ats.ucla.edu/stat/sas/library/multidimensional_arrays.htm
2. SAS 9.2 Online Help. http://support.sas.com/documentation/cdl/en/proc/61895/HTML/default/viewer.htm#a003193719.htm

******(1) EXAMPLE 1: CREATE A 5000X5000 RANDOM MATRIX****;
****(1.1) WRONG 2D ARRAY SOLUTION******;
data matrix1;
array v[5000, 5000];
do i = 1 to 5000;
do j = 1 to 5000;
v[i, j] = ranuni(0);
end;
output;
end;
run;

****(1.2) CORRECT 1D ARRAY SOLUTION*****;
data matrix2;
array x[5000];
do i = 1 to 5000;
do j = 1 to 5000;
x[j] = ranuni(0);
end;
output;
end;
run;

******(2) EXAMPLE 2: CREATE A MULTIPLICATION TABLE******;
****(2.1) DATA STEP ARRAY SOLUTION ********;
data mt1;
array a[9] a1-a9;
do row = 1 to 9;
do col= 1 to 9;
if row ge col then a[col]=row*col;
end;
output;
end;
drop row col;
run;

****(2.2) PROC FCMP ARRAY SOLUTION*****;
proc fcmp;
array a[9, 9] / nosymbols;
do row =1 to 9;
do col = 1 to 9;
if row ge col then a[row, col] = row*col;
end;
end;
rc1 = write_array('mt2', a);
quit;

****(3) EXAMPLE 3: FILL MISSING CELL WITH DIAGONAL ELEMENT******;
****(3.0) INPUT RAW DATA****;
data have;
input x1-x4;
datalines;
1 . . .
2 1 . .
3 4 1 .
7 6 5 1
;
run;

****(3.1) PROC FCMP TRANSPOSITION******;
proc fcmp;
array a[4, 4] / nosymbols;
rc1 = read_array('have', a);
do i = 1 to 4;
do j = 1 to 4;
if missing(a[i, j]) = 1 then a[i, j] = a[j, i];
end;
end;
rc2 = write_array('want', a);
quit;

*****(4) EXAMPLE 4: RESHAPE SQUARE-SHAPE DATA********;
****(4.0) INPUT RAW DATA********;
data have1;
input x1-x5;
cards;
1 . 0 1 1
0 1 . 0 0
. . 1 1 1
0 0 0 1 .
. 0 0 1 1
;
run;

****(4.1) PROC TRANSPOSE SOLUTION******;
data trps1;
set have1;
obs = _n_;
run;

proc transpose data = trps1 out = trps2;
by obs;
var x1-x5;
run;

proc sort data = trps2 out = trps3;
by _name_;
run;

proc transpose data = trps3 out = want1_1;
by _name_;
var col1;
run;

****(4.2) PROC FCMP SOLUTION********;
proc fcmp;
array a[5, 5] / nosymbols;
rc1 = read_array('have1', a);
array b[5, 5] ;
call transpose(a, b);
rc2 = write_array('want1_2', b);
quit;

******(5) EXAMPLE 5: FCMP DEDUPLICATION FUNCTION FOR FCMP ARRAY*******;
****(5.1) ENCAPSULATE DEDUPLICATIONA UTILITY OF PROC SQL IN A MACRO ******;
%macro deldup_array;
%let dsname = %sysfunc(dequote(&dsname));
%let arrayname = %sysfunc(dequote(&arrayname));
/*(5.1.1) GENERATE UNIQUE DATASET AND ITS OBSERVATION NUMBER*/
proc sql noprint;
select count(unique(&arrayname.1)) into: obs_num
from &dsname;
create table _temp as
select distinct *
from &dsname;
quit;
/*(5.1.2) USE TEMP DATASET TO REPLACE RAW DATASET*/
data &dsname;
set _temp;
run;
/*(5.1.3) DELETE TEMP DATASET*/
proc datasets;
delete _temp;
run;
%mend deldup_array;

****(5.2) ENCAPSULATE MACRO ABOVE IN A FUNCTION*****;
proc fcmp outlib=work.func.practice;
function deldup_array(dsname $, arrayname $);
rc = run_macro('deldup_array', dsname, arrayname, obs_num);
if rc eq 0 then return(obs_num);
else return(.);
endsub;
run;

****(5.3) USE THIS FUNCION TO DELETE DUPLICATES IN ARRAY*****;
option cmplib = (work.func) mlogic mprint symbolgen;
proc fcmp;
array a[1000] /nosymbols;
do j = 1 to 1000;
a[j] = ceil((ranuni(12345)*100)+rannor(12345));
end;

dsname = 'numbers';
rc1 = write_array(dsname, a);

n = deldup_array(dsname, %sysfunc(quote(a)));

call dynamic_array(a, n);
rc2 = read_array(dsname, a);
put a = ;
quit;

*****************END OF ALL CODING****************************;

This post was kindly contributed by SAS ANALYSIS - go there to comment and to read the full post.