Reading and updating ZIP files with FILENAME ZIP

This post was kindly contributed by The SAS Dummy - go there to comment and to read the full post.

In a previous post, I shared an example of using ODS PACKAGE to create ZIP files. But what if you need to read a ZIP file within your SAS program? In SAS 9.4, you can use the FILENAME ZIP access method to do the job.

In this example, let’s pretend that I need to analyze data that a government agency published (maybe by using SAS!) into a ZIP file. I’ve selected an exciting data source (found via data.gov) about Large Truck Crash Causation.

First, I need to download the latest version of the data file. I’ll use PROC HTTP to do that job:

/* detect proper delim for UNIX vs. Windows */
%let delim=%sysfunc(ifc(%eval(&sysscp. = WIN),\,/));
 
/* create a name for our downloaded ZIP */
%let ziploc = %sysfunc(getoption(work))&delim.datafile.zip;
filename download "&ziploc";
 
/* Download the ZIP file from the Internet */
proc http
 method='GET'
 url="http://ai.fmcsa.dot.gov/ltccs/Data/TEXT/Public/LTCCS_db_txt_public_01.zip"
 out=download;
run;
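
Before going further, it can be worth confirming that the download actually produced a file. The step below is a minimal sketch, not part of the original program: it assumes the DOWNLOAD fileref and the &ziploc macro variable defined above, and it only checks that a file exists at that location (it does not validate that the file is a well-formed ZIP).

```sas
/* Sketch: confirm that the downloaded file exists before reading it. */
/* Assumes the DOWNLOAD fileref and &ziploc from the steps above.     */
data _null_;
 if fexist('download') then
  put "NOTE: Found the downloaded ZIP file at &ziploc";
 else
  put "WARNING: No file found at &ziploc - check the PROC HTTP log.";
run;
```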

Next, I need to discover what files are within the ZIP file. I’ll assign a fileref using the new FILENAME ZIP method. FILENAME ZIP is a directory-based access method, similar to the CATALOG access method or to using FILENAME to map to a folder. You can use functions such as DOPEN and DREAD to treat the ZIP file as if it’s a file directory (since that’s what it is, in concept).

/* Assign a fileref with the ZIP method */
filename inzip zip "&ziploc";
 
/* Read the "members" (files) from the ZIP file */
data contents(keep=memname);
 length memname $200;
 fid=dopen("inzip");
 if fid=0 then
  stop;
 memcount=dnum(fid);
 do i=1 to memcount;
  memname=dread(fid,i);
  output;
 end;
 rc=dclose(fid);
run;
 
/* create a report of the ZIP contents */
title "Files in the ZIP file";
proc print data=contents noobs N;
run;
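
If the program needs to decide what to do based on the ZIP contents, the CONTENTS data set can be queried like any other table. Here is a minimal sketch that counts the members named HAZMAT.TXT (ignoring case) and stores the count in a macro variable. The macro variable name has_hazmat is my own invention for illustration.

```sas
/* Sketch: check whether the expected member is in the ZIP file.    */
/* has_hazmat is a hypothetical macro variable name.                */
proc sql noprint;
 select count(*) into :has_hazmat trimmed
  from contents
  where upcase(memname) = 'HAZMAT.TXT';
quit;

%put NOTE: Matching members found: &has_hazmat;
```

A value of 0 would tell me that the member is missing (or named differently), and that the import step is not going to work as written.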

Here’s the report of files within the ZIP archive:


I’ve identified the HAZMAT.TXT file as the one that I want to analyze. I peeked at the first couple of records and was able to scratch out a simple DATA step to read the data. Notice how I don’t need to explicitly extract the HAZMAT.TXT file — I can simply reference it as a “member” of the INZIP fileref. The ZIP access method does the rest.

/* Import a text file directly from the ZIP */
data hazmat;
 infile inzip(hazmat.txt) 
   firstobs=2 dsd dlm='09'x;
 input 
  CaseID $10.
  VehicleNumber 
  Material 
  Reportable 
  Waiver 	
  PSU	 
  PSUStrata	
  RATWeight;
run;
 
title "Box plot of # of vehicles per incident";
ods graphics / height=200 width=450;
proc sgplot data=hazmat;
 hbox vehiclenumber;
 label VehicleNumber="# of vehicles";
 xaxis labelattrs=(size=12) valueattrs=(size=12);
run;

SAS reads my data file successfully, and yields this interesting box plot from the SGPLOT step:


(It looks like most “hazardous materials” accidents involved just 2 or 3 vehicles, except for one messy outlier that had nearly 30. Imagine the cleanup effort on that one!)
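
For reference, the quick peek at the first few records that I mentioned earlier can use the same member syntax. This is a minimal sketch that assumes the INZIP fileref is still assigned to the whole ZIP file, as it is in the steps above. It echoes the first three raw lines of the member to the SAS log without creating a data set.

```sas
/* Sketch: echo the first 3 lines of the ZIP member to the SAS log */
data _null_;
 infile inzip(hazmat.txt) obs=3;
 input;
 put _infile_;
run;
```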

As an alternative, if I know exactly which file I need, I can assign a direct fileref by using the MEMBER= syntax:

filename inzip zip "&ziploc" member="hazmat.txt";
 
/* then my INFILE references the file directly, no parenthesized-member */
data hazmat;
 infile inzip
   firstobs=2 dsd dlm='09'x;
/* ...  */

The ZIP access method isn’t just for reading. I can also use it to create and update ZIP files. For creating ZIP files, I prefer to use ODS PACKAGE. But it’s very handy to be able to update ZIP files from a SAS program without using an external tool. For example, here’s a program that deletes an extraneous file from an existing ZIP file:

/* Remove the PackageMetaData piece that ODS PACKAGE creates */
filename pkg ZIP "c:\projects\filenamezip\new.zip" member="PackageMetaData";
data _null_;
 if (fexist('pkg')) then 
  rc = fdelete('pkg');
run;
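
Updating can also mean adding a member. The sketch below writes a small text file into the same archive by using the FILE statement with a ZIP fileref. The ZIP path is the one from the example above; the fileref name addfile and the member name readme.txt are hypothetical, chosen only for illustration.

```sas
/* Sketch: add a small text member to an existing ZIP file.        */
/* "addfile" and "readme.txt" are hypothetical names.              */
filename addfile ZIP "c:\projects\filenamezip\new.zip" member="readme.txt";
data _null_;
 file addfile;
 put "This archive was updated by a SAS program.";
run;

/* release the fileref when finished */
filename addfile clear;
```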

Note: Like ODS PACKAGE, the FILENAME ZIP method does not support encrypted (password-protected) ZIP archives.

Download the complete SAS 9.4 program: filenameZipHttpExample.sas

Thanks to the growing size of data files, ZIP files are created and consumed by SAS users everywhere. Between ODS PACKAGE and FILENAME ZIP, you can teach your SAS programs to build and read the files without having to rely on external tools. The more that you can use native SAS methods for this work, the more portable your SAS programs will be.

