Putting the squeeze on your SAS data sets

This post was kindly contributed by The SAS Dummy - go there to comment and to read the full post.

I’ve known several people who were raised during the Great Depression, and I’ve observed that they are very mindful of waste. My wife’s grandmother used to save plastic bags, twist ties, and relatively clean aluminum foil for potential reuse in the household, because such materials were once scarce. The youth of today, to their credit, are also mindful of waste, but the concern is for the environment and recycling effort, not necessarily material shortages.

In a similar way, those of us who have been in the computer industry for a while can remember when it was critical to scrimp and save every byte of memory and every sector on disk, because both were scarce. But unlike the youth in the real world, the new recruits in the computer industry do not have the same frugality when it comes to use of system resources. Machines have fast-growing capacities for disk space and memory, and not everyone sees the incentive to optimize their use of these resources.

But SAS programmers do. I know this because this SAS note about “shrinking character variables to minimum length required” is popular and highly rated.

I decided to take the popular sample program and extend it into a custom task for SAS Enterprise Guide. The SAS program is a macro that examines each character variable in the data set, measures the length of the longest value within the data, and then adjusts the data set to “shrink” the length of each character variable to just the size that is needed to fit the data. For data sets with lots of observations and grossly overallocated storage, it can result in a significant reduction in the file size.
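To make the idea concrete, here is a minimal sketch of the measure-then-shrink approach. It is not the code from the SAS note or from the task. The macro name shrink_chars and the data set WORK.DEMO are made up for illustration, and the sketch assumes the data set has at least one row, fewer than 1,000 character variables, and ordinary (non-literal) variable names.

```sas
/* Illustrative sketch only; shrink_chars and WORK.DEMO are hypothetical names. */

/* Demo data with deliberately oversized character variables. */
data work.demo;
  length name $200 city $100;
  name = 'Alice';       city = 'Cary';    output;
  name = 'Bartholomew'; city = 'Raleigh'; output;
run;

%macro shrink_chars(lib=WORK, ds=DEMO);
  %local i n;

  /* 1. Collect the names of the character variables. */
  proc sql noprint;
    select name into :cvar1 - :cvar999
      from dictionary.columns
      where libname = "%upcase(&lib)"
        and memname = "%upcase(&ds)"
        and type = 'char';
  quit;
  %let n = &sqlobs;
  %if &n = 0 %then %return;  /* nothing to shrink */

  /* 2. Measure the longest stored value of each variable.
        Assumes the data set has at least one row. */
  proc sql noprint;
    select
      %do i = 1 %to &n;
        %if &i > 1 %then %str(,);
        max(length(&&cvar&i))
      %end;
    into
      %do i = 1 %to &n;
        %if &i > 1 %then %str(,);
        :len&i
      %end;
    from &lib..&ds;
  quit;

  /* 3. Shrink each variable to its measured length, in place. */
  proc sql;
    alter table &lib..&ds
      modify
      %do i = 1 %to &n;
        %if &i > 1 %then %str(,);
        &&cvar&i char(&&len&i)
      %end;
    ;
  quit;
%mend shrink_chars;

%shrink_chars(lib=WORK, ds=DEMO)
```

With the demo data above, name should end up with a length of 11 and city with a length of 7, which you can confirm with PROC CONTENTS. This sketch uses ALTER TABLE because it keeps the variables in their original order. A DATA step with a LENGTH statement ahead of the SET statement is another way to do it, though it moves the resized variables to the front.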

I also added an option to compress the data set using the COMPRESS= option, which can reduce the data set file size even further. (Because there is overhead associated with compression, it might not always make the data smaller; in fact, it could make it larger.)
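As a rough illustration of the COMPRESS= data set option on its own (the data set names WORK.WIDE and WORK.WIDE_C are made up for this example):

```sas
/* Illustrative only: WORK.WIDE and WORK.WIDE_C are hypothetical names. */

/* A data set with a mostly empty, overallocated character variable. */
data work.wide;
  length comment $500;
  do id = 1 to 10000;
    comment = 'short text';
    output;
  end;
run;

/* Write a compressed copy; COMPRESS=YES uses run-length encoding. */
data work.wide_c (compress=yes);
  set work.wide;
run;
```

After the second step, the SAS log normally includes a note saying by what percentage compression decreased (or increased) the size of the data set, which is a quick way to judge whether the option is paying off for your data.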

Here is an example report that the task will generate as a result, so you can get an idea of the benefit.
Sample report

The task can work with SAS Enterprise Guide 4.2 or 4.3. To download the task, click here to save a ZIP file with the task and a README file. It’s simple to deploy and use, even if you’re not an administrator on your PC. See the README instructions for details.

If you find the task to be useful, let me know here in the comments section. And if you can suggest changes/improvements, let me know that too.
