Using SAS to access data stored on Dropbox

This post was kindly contributed by The SAS Dummy - go there to comment and to read the full post.

I used “Dropbox” in the title for this post, but these techniques can be used for other cloud-based file sharing services, such as GitHub and Google Drive.

Using PROC HTTP, you can easily access any “cloud-based” file as long as you have a public link to reach it. I’m a cloudy sort of guy, so I’ve got profiles on Dropbox, Google Drive and Github. I’ve shared a few sample data sources on these accounts so that you can try these techniques for yourself.

Here’s one working example to get you started. (In all of my examples, I included a placeholder for PROXYHOST=, which you need only if your Internet connection is routed through a gateway proxy. That’s common for corporate intranet connections, less common for schools and homes.)

filename _inbox "%sysfunc(getoption(work))/streaming.sas7bdat";
 
proc http method="get" 
 url="https://dl.dropbox.com/s/pgo6ryv8tfjodiv/streaming.sas7bdat" 
 out=_inbox
 /* proxyhost="http://yourproxy.company.com" */
;
run;
 
filename _inbox clear;
 
proc contents data=work.streaming;
run;

There are a few additional considerations for accessing cloud-based data.

You need the direct download link for the file


When you click “Share Link” on a service like Dropbox, you are provided with a browser-friendly link that you can give to your friends or colleagues. When they visit that link, they are presented with a simple view of the file name and a Download button.

Here’s the Share Link for my sample data set:

https://www.dropbox.com/s/pgo6ryv8tfjodiv/streaming.sas7bdat

I can’t use that link in my SAS program because PROC HTTP isn’t going to “click” on the Download button for me. Instead, I need the link that results from clicking on the Download button, which I can get by substituting the dl.dropbox.com domain instead of www.dropbox.com in the address:

https://dl.dropbox.com/s/pgo6ryv8tfjodiv/streaming.sas7bdat

Note: You should assume that URL schemes for the “direct download” links can be subject to change, and of course they vary based on the cloud service provider. Today, the “dl.dropbox.com” scheme works for Dropbox.

A cloud-based folder is NOT a cloud-based path for a SAS library

Dropbox allows you to share a link to a folder that contains multiple files. As tempting as it might be to try, you cannot treat a cloud folder as a “path” for a SAS library. Did you notice the FILENAME statement in my first code example? I’m providing a local file destination for the download operation. PROC HTTP is my method to copy the file that I need into my SAS session.

If I have multiple files to fetch, I’ll need to repeat those steps for each file, using its direct-download URL.

Your file might require additional processing before you can use it

My earlier program example works by copying a SAS7BDAT file (a SAS data set) into my SAS WORK folder. Because it’s a data set, my SAS session recognizes as a valid data member. I was lucky because the data set encoding was compatible with my SAS session.

Sometimes the data might not be quite ready for you to use directly. If a colleague shares a file in a CSV or Excel format, you will need to run a PROC IMPORT step before you can use it. For convenience, a colleague might use PROC CPORT to save multiple SAS data sets into a single transport file. You’ll need to run it through PROC CIMPORT before you can use it.

Here’s an example that uses a CPORT file that I created with the CARS and CLASS sample data sets:

filename _inbox TEMP;
 
proc http method="get" 
 url="https://dl.dropbox.com/s/5uehelsok9oslgv/sample.cpo" 
 out=_inbox
  /* proxyhost="http://yourproxy.company.com" */
;
run;
 
proc cimport file=_inbox lib=work;
run;
filename _inbox clear;

From the SAS log:

NOTE: PROCEDURE HTTP used (Total process time):
      real time           0.91 seconds
      cpu time            0.00 seconds

NOTE: PROC CIMPORT begins to create/update data set WORK.CARS
NOTE: Data set contains 15 variables and 428 observations. 
      Logical record length is 152

NOTE: PROC CIMPORT begins to create/update data set WORK.CLASS
NOTE: Data set contains 5 variables and 19 observations. 
      Logical record length is 40

You can %include SAS programs from the cloud

It might not be the wisest move to blindly access a cloud-based SAS program and execute it with your SAS session, but you can do it. And because SAS programs can contain data records (in the form of CARDS or DATALINES), this is another way to share data in a portable way.

This program downloads and runs Rick Wicklin’s “Christmas Tree” challenge, which he posted on The DO Loop blog last week.

filename _inbox "%sysfunc(getoption(work))/ifsxmas.sas";
proc http method="get" 
 url="https://dl.dropbox.com/s/g1hat0woohud9yc/IFSChristmasTree.sas" 
 out=_inbox
  /* proxyhost="http://yourproxy.company.com" */
;
run;
%include _inbox;
filename _inbox clear;

Working with Google Drive and GitHub

These techniques also work with the other popular services, but the download links are different.

For files stored on GitHub, you need to find the direct link to the “raw” file in your repository. Here’s a working example:

filename _inbox "%sysfunc(getoption(work))/streaming.sas7bdat";
 
proc http method="get" 
 url="https://github.com/cjdinger/proc_http_example/raw/master/data/streaming.sas7bdat" 
 out=_inbox
 /* proxyhost="http://yourproxy.company.com" */
;
run;
 
filename _inbox clear;
 
proc contents data=work.streaming;
run;

For Google Drive, you must first select to make the file or folder public, or at least reachable by “anyone who has the link”. By default, your Google Drive content is private to your account.

The “Share Link” is more cryptic than some of the other services. The important part is the id= portion of the URL, which allows you to form the direct link like this:

https://docs.google.com/uc?export=download&id=<cryptic_id_from_Google_Drive>

Here’s my SAS-enabled example. Note the use of the %str() function to avoid complaints about trying to resolve the ampersand portion of the URL:

filename _inbox "%sysfunc(getoption(work))/streaming.sas7bdat";
 
proc http method="get" 
 url="https://docs.google.com/uc?export=download%str(&)id=0BwSh7LOTCPQ5MmlJZFdOOFJhOHc" 
 out=_inbox 
 /* proxyhost="http://yourproxy.company.com" */
;
run;
 
filename _inbox clear;
 
proc contents data=work.streaming;
run;
tags: cloud, Dropbox, github, Google Drive, PROC HTTP, SAS programming

This post was kindly contributed by The SAS Dummy - go there to comment and to read the full post.