Using FILENAME ZIP to unzip and read data files in SAS

This post was kindly contributed by The SAS Dummy - go there to comment and to read the full post.

I’ve written about how to use the FILENAME ZIP method to read and update ZIP files in your SAS programs. The ZIP method was added in SAS 9.4, and its advantage is that you can accomplish more in SAS without having to launch external utilities such as WinZip, gunzip, or 7-Zip.

Several readers replied with questions about how you can use the content of these ZIP files within your SAS program. The basic scenario is: “I’ve got some data files in my ZIP archive. I want to use SAS to unzip these and then use them as data within my SAS process. Can I do this?”

Yes, you can — but it does require an extra step. Even though FILENAME ZIP can show you the contents and structure of your ZIP file, most SAS procedures cannot access the content directly while it’s in the archive. So, the additional step is to copy the file to another location, effectively extracting it from the ZIP file.

As an example, I created a ZIP file with two files and a subfolder:

data.zip
  |__ sas_tech_talks_15.xlsx
  |__ sas/
      |__ instanttitles.sas7bdat

This SAS program helps me to discover how FILENAME ZIP sees the file:

filename inzip ZIP "c:projectsdata.zip";
 
/* Read the "members" (files) from the ZIP file */
data contents(keep=memname isFolder);
 length memname $200 isFolder 8;
 fid=dopen("inzip");
 if fid=0 then
  stop;
 memcount=dnum(fid);
 do i=1 to memcount;
  memname=dread(fid,i);
  /* check for trailing / in folder name */
  isFolder = (first(reverse(trim(memname)))='/');
  output;
 end;
 rc=dclose(fid);
run;
 
/* create a report of the ZIP contents */
title "Files in the ZIP file";
proc print data=contents noobs N;
run;

Output:

        Files in the ZIP file                                         
 memname                       isFolder
 sas/                             1  
 sas/instanttitles.sas7bdat       0  
 sas_tech_talks_15.xlsx           0  
                N = 3

With this information, I can now “copy” the XLSX file out of the ZIP file and then import it into a SAS data set. Notice how I can use the “member” syntax (fileref with the file I want in parentheses) to address a specific file in the ZIP archive. I want to copy just from the actual files, and not the folder-level entries.

/* identify a temp folder in the WORK directory */
filename xl "%sysfunc(getoption(work))/sas_tech_talks_15.xlsx" ;
 
/* hat tip: "data _null_" on SAS-L */
data _null_;
   /* using member syntax here */
   infile inzip(sas_tech_talks_15.xlsx) 
       lrecl=256 recfm=F length=length eof=eof unbuf;
   file   xl lrecl=256 recfm=N;
   input;
   put _infile_ $varying256. length;
   return;
 eof:
   stop;
run;
 
proc import datafile=xl dbms=xlsx out=confirmed replace;
  sheet=confirmed;
run;

Sample output from my SAS log:

NOTE: The infile INZIP(sas_tech_talks_15.xlsx) is:
      Filename=c:projectsdata.zip,
      Member Name=sas_tech_talks_15.xlsx

NOTE: UNBUFFERED is the default with RECFM=N.
NOTE: The file XL is:
      Filename=C:SAS Temporary Files_TD396_Prc2sas_tech_talks_15.xlsx,
      RECFM=N,LRECL=256,File Size (bytes)=0,
      Last Modified=11May2015:11:38:59,
      Create Time=11May2015:11:20:23

NOTE: A total of 55 records were read from the infile library INZIP.
NOTE: 55 records were read from the infile INZIP(sas_tech_talks_15.xlsx).
NOTE: DATA statement used (Total process time):
      real time           0.00 seconds
      cpu time            0.00 seconds

To use the SAS data set in the file, I need to copy it into a location shared by a SAS library. In this example, I will again use the WORK location. Because my SAS data set is in a logical subfolder (named “sas”) within the archive, I need to include that path as part of the member syntax on the INFILE statement.

/* Copy a zipped data set into the WORK library */
filename ds "%sysfunc(getoption(work))/instanttitles.sas7bdat" ;
 
data _null_;
   /* reference the member name WITH folder path */
   infile inzip(sas/instanttitles.sas7bdat) 
	  lrecl=256 recfm=F length=length eof=eof unbuf;
   file   ds lrecl=256 recfm=N;
   input;
   put _infile_ $varying256. length;
   return;
 eof:
   stop;
run;
 
proc contents data=work.instanttitles;
run;

Partial output in my example:

                             Files in the ZIP file                          
                             The CONTENTS Procedure

 Data Set Name        WORK.INSTANTTITLES            Observations          1475
 Member Type          DATA                          Variables             6   
 Engine               V9                            Indexes               0   
 Created              01/29/2015 15:09:54           Observation Length    248 
 Last Modified        01/29/2015 15:09:54           Deleted Observations  0   
 Protection                                         Compressed            NO  
 Data Set Type                                      Sorted                NO  
 Label                                                                        
 Data Representation  WINDOWS_64                                              
 Encoding             wlatin1  Western (Windows)                              

Of course, all of this can be automated even further by writing SAS code that automatically iterates through the ZIP file member names and copies/imports each of the members as needed.

tags: Copy Files, FILENAME ZIP, SAS 9.4, ZIP files

The post Using FILENAME ZIP to unzip and read data files in SAS appeared first on The SAS Dummy.

This post was kindly contributed by The SAS Dummy - go there to comment and to read the full post.