This post was kindly contributed by The SAS Dummy - go there to comment and to read the full post. |
I’ve written about how to use the FILENAME ZIP method to read and update ZIP files in your SAS programs. The ZIP method was added in SAS 9.4, and its advantage is that you can accomplish more in SAS without having to launch external utilities such as WinZip, gunzip, or 7-Zip.
Several readers replied with questions about how you can use the content of these ZIP files within your SAS program. The basic scenario is: “I’ve got some data files in my ZIP archive. I want to use SAS to unzip these and then use them as data within my SAS process. Can I do this?”
Yes, you can — but it does require an extra step. Even though FILENAME ZIP can show you the contents and structure of your ZIP file, most SAS procedures cannot access the content directly while it’s in the archive. So, the additional step is to copy the file to another location, effectively extracting it from the ZIP file.
As an example, I created a ZIP file with two files and a subfolder:
data.zip |__ sas_tech_talks_15.xlsx |__ sas/ |__ instanttitles.sas7bdat
This SAS program helps me to discover how FILENAME ZIP sees the file:
filename inzip ZIP "c:projectsdata.zip"; /* Read the "members" (files) from the ZIP file */ data contents(keep=memname isFolder); length memname $200 isFolder 8; fid=dopen("inzip"); if fid=0 then stop; memcount=dnum(fid); do i=1 to memcount; memname=dread(fid,i); /* check for trailing / in folder name */ isFolder = (first(reverse(trim(memname)))='/'); output; end; rc=dclose(fid); run; /* create a report of the ZIP contents */ title "Files in the ZIP file"; proc print data=contents noobs N; run;
Output:
Files in the ZIP file memname isFolder sas/ 1 sas/instanttitles.sas7bdat 0 sas_tech_talks_15.xlsx 0 N = 3
With this information, I can now “copy” the XLSX file out of the ZIP file and then import it into a SAS data set. Notice how I can use the “member” syntax (fileref with the file I want in parentheses) to address a specific file in the ZIP archive. I want to copy just from the actual files, and not the folder-level entries.
/* identify a temp folder in the WORK directory */ filename xl "%sysfunc(getoption(work))/sas_tech_talks_15.xlsx" ; /* hat tip: "data _null_" on SAS-L */ data _null_; /* using member syntax here */ infile inzip(sas_tech_talks_15.xlsx) lrecl=256 recfm=F length=length eof=eof unbuf; file xl lrecl=256 recfm=N; input; put _infile_ $varying256. length; return; eof: stop; run; proc import datafile=xl dbms=xlsx out=confirmed replace; sheet=confirmed; run;
Sample output from my SAS log:
NOTE: The infile INZIP(sas_tech_talks_15.xlsx) is: Filename=c:projectsdata.zip, Member Name=sas_tech_talks_15.xlsx NOTE: UNBUFFERED is the default with RECFM=N. NOTE: The file XL is: Filename=C:SAS Temporary Files_TD396_Prc2sas_tech_talks_15.xlsx, RECFM=N,LRECL=256,File Size (bytes)=0, Last Modified=11May2015:11:38:59, Create Time=11May2015:11:20:23 NOTE: A total of 55 records were read from the infile library INZIP. NOTE: 55 records were read from the infile INZIP(sas_tech_talks_15.xlsx). NOTE: DATA statement used (Total process time): real time 0.00 seconds cpu time 0.00 seconds
To use the SAS data set in the file, I need to copy it into a location shared by a SAS library. In this example, I will again use the WORK location. Because my SAS data set is in a logical subfolder (named “sas”) within the archive, I need to include that path as part of the member syntax on the INFILE statement.
/* Copy a zipped data set into the WORK library */ filename ds "%sysfunc(getoption(work))/instanttitles.sas7bdat" ; data _null_; /* reference the member name WITH folder path */ infile inzip(sas/instanttitles.sas7bdat) lrecl=256 recfm=F length=length eof=eof unbuf; file ds lrecl=256 recfm=N; input; put _infile_ $varying256. length; return; eof: stop; run; proc contents data=work.instanttitles; run;
Partial output in my example:
Files in the ZIP file The CONTENTS Procedure Data Set Name WORK.INSTANTTITLES Observations 1475 Member Type DATA Variables 6 Engine V9 Indexes 0 Created 01/29/2015 15:09:54 Observation Length 248 Last Modified 01/29/2015 15:09:54 Deleted Observations 0 Protection Compressed NO Data Set Type Sorted NO Label Data Representation WINDOWS_64 Encoding wlatin1 Western (Windows)
Of course, all of this can be automated even further by writing SAS code that automatically iterates through the ZIP file member names and copies/imports each of the members as needed.
The post Using FILENAME ZIP to unzip and read data files in SAS appeared first on The SAS Dummy.
This post was kindly contributed by The SAS Dummy - go there to comment and to read the full post. |