Hash a SAS Value

This post was kindly contributed by SAS from Out in Left Field - go there to comment and to read the full post.

Sometimes it is useful to hash a value so that a unique key can be built into the data. For example, say you are looking at a system performance log. You have a PID, a process name, and a user. A system reuses PIDs all the time, so a PID on its own cannot identify a process uniquely over the course of a day.

In order to get a unique value, you could concatenate the values into one:

000789654 || WeeklyProcess || gertre5

We are assuming that there is never a need to recover the original values from the hashed key. This is a key assumption.
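
As a rough sketch of the concatenation step, the CATX function joins the values with a delimiter and strips leading and trailing blanks from each one. The delimiter keeps field boundaries distinct; without it, "ab" followed by "c" and "a" followed by "bc" would give the same key. The data set and variable names here (perflog, pid, process, user, key) are made up for illustration:

data perflog;
  length pid $9 process $32 user $16 key $60;
  input pid $ process $ user $;
  /* join the three fields with a pipe so the field boundaries are kept */
  key = catx('|', pid, process, user);
  put key=;
datalines;
000789654 WeeklyProcess gertre5
;
run;

This should write key=000789654|WeeklyProcess|gertre5 to the log, and that key is the value you would then pass to the hash function.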

There is an undocumented function in SAS called CRCXX1 that can create a hash that is, for practical purposes, unique. Here is some code illustrating it:

data A;
  input name :$200. gender :$8. state :$20.;
  x = compress(name||gender||state);
  y = CRCXX1(x);
  put x= y=32. ;
datalines;
Churchill,Alan Male Colorado
Churchill,John Male Colorado
;
run;

The results:

884 data A;
885 input name :$200. gender :$8. state :$20.;
886 x = compress(name||gender||state);
887 y = CRCXX1(x);
888 put x= y=32. ;
889 datalines;

x=Churchill,AlanMaleColorado y=1558070123
x=Churchill,JohnMaleColorado y=837584169
NOTE: The data set WORK.A has 2 observations and 5 variables.
NOTE: DATA statement used (Total process time):
real time 0.00 seconds
cpu time 0.00 seconds


892 ;
893 run;

This can be very valuable when you need to tighten up processing and have some throwaway field values that can be collapsed into a single key. The person who mentioned the undocumented function says it is good for about 1 million unique values before collisions start to appear. Above that, go with the MD5 function.
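
For reference, here is a minimal sketch of the MD5 alternative, reusing the data from the example above. The MD5 function is documented and returns a 16-byte binary string, so the $HEX32. format is used to store it as 32 readable hexadecimal characters. The data set name B and the variables key and h are chosen just for this illustration, and the PROC SQL step is only one possible way to check for collisions, not part of the original tip:

data B;
  length key $240 h $32;
  input name :$200. gender :$8. state :$20.;
  /* CATX trims each value and puts a delimiter between the fields */
  key = catx('|', name, gender, state);
  /* MD5 returns 16 raw bytes; $HEX32. renders them as hex text */
  h = put(md5(strip(key)), $hex32.);
datalines;
Churchill,Alan Male Colorado
Churchill,John Male Colorado
;
run;

/* if the two counts differ, at least two keys share a hash value */
proc sql;
  select count(distinct key) as n_keys,
         count(distinct h)   as n_hashes
  from B;
quit;

The same count comparison could be run against the CRCXX1 values on your own data to see whether collisions actually show up at your volumes.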
