SAS Macro Simplifies SAS and R integration

This post was kindly contributed by SAS and R - go there to comment and to read the full post.

Many of us feel very enthusiastic about R. It’s free, it features cutting-edge applications, it has a large community of users contributing for mutual benefit, and on and on. There are also many things to like about SAS, among them stability, backwards compatibility, and professional support. The way to be the best analyst you can be is to be flexible and have as many tools at your disposal as you can manage. That’s the main motivating principle behind our book and what we do here in this blog.

Today we call attention to a SAS macro that greatly eases calling R from SAS. Published last month in the Journal of Statistical Software, the macro (written by Xin Wei of Roche Pharmaceuticals) is called Proc_R, and we discuss its installation and use today. For a fuller write-up, see the paper. For SAS users, the macro is a huge productivity booster: you can complete data management and/or part of the data analysis in SAS, skip out quickly to R for analyses that are awkward or impossible in SAS, then return to SAS to finish. For people in industry, this may also ease integrating R into documentation systems built for SAS code. See this post on DecisionStats for a review of other integration attempts.

Getting ready

1. Download the “SAS source code” and the “Replication code and instructions”.

2. Move the macro somewhere you have write access.

3. Open the macro in a text editor and change line 46 so that the rpath option points to the location of your R executable.

(4. If you’re running Windows 7 or Vista, and you have SAS 9.1 or above, follow the instructions in the PDF in the second supplemental file you downloaded. This makes a shortcut for a special version of SAS. I’m not at all sure why you have to do this, though. I had the same results running in my usual SAS set-up.)
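If you’re not sure where R lives for step 3, R itself can tell you. This is a minimal sketch to run from an R session on the same machine. It assumes a standard Windows install where the executable is named R.exe and sits in R’s bin directory, and that the macro wants the full path to that executable; check the comments around the rpath line in the macro to confirm what it expects.

```r
# Directory holding the executables of the R installation that is running
r_bin <- R.home("bin")

# Assumed executable name on Windows; adjust if your install differs
r_exe <- file.path(r_bin, "R.exe")

# Print the path with backslashes, ready to paste into the rpath setting.
# mustWork = FALSE avoids an error if the assumed file name is wrong.
cat(normalizePath(r_exe, winslash = "\\", mustWork = FALSE), "\n")
```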

That’s it! The macro reads in your R code as a SAS data set, writes it out to a file, calls R to run it, and then does a bunch of post-processing. The basic macro call looks like this:


%include "C:\ken\sasmacros\Proc_R.sas";
%Proc_R (SAS2R =, R2SAS =);
Cards4;

******************************
***Please Enter R Code Here***
******************************

;;;;
%Quit;

You just replace the starred lines with R code and run it. The R results, if any, appear in your SAS output and/or results windows. The SAS2R value is a list of the names of SAS data sets you’d like to send to R; they’re added into the R environment before your code is executed. The R2SAS value is a list of the names of R objects (that can be coerced to data frames) that you’d like to become SAS data sets.
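To make the SAS2R and R2SAS hand-off concrete, here is a rough sketch in plain R of the kind of script the macro appears to assemble around your code, judging from the R log it echoes back in the example further down. The data set name test, the object name mylm, and the temporary directory are illustrative assumptions; the macro chooses its own file names and locations.

```r
# Stand-in for the SAS side: a working directory and one data set exported
# as CSV. In real use the macro does this export; the names are made up.
work_dir <- tempfile("proc_r_demo")
dir.create(work_dir)
set.seed(1)
x <- rnorm(50)
write.csv(data.frame(x = x, y = x + rnorm(50)),
          file.path(work_dir, "test.csv"), row.names = FALSE)

# 1. Each name listed in SAS2R is read into the R environment
test <- read.csv(file.path(work_dir, "test.csv"))

# 2. The code typed between Cards4 and ;;;; runs unchanged
an.lm <- with(test, lm(y ~ x))
mylm <- t(coef(an.lm))

# 3. Each name listed in R2SAS is written out as CSV for SAS to import
write.csv(mylm, file.path(work_dir, "mylm.csv"), row.names = FALSE)

# What SAS would find waiting for it
print(read.csv(file.path(work_dir, "mylm.csv"), check.names = FALSE))
```

Seen this way, anything you list in R2SAS has to survive a trip through write.csv(), which is why the objects need to be coercible to data frames.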

Use
Here’s a trivial example: generate two data sets in SAS, send them to R to run linear regressions, and send the resulting parameter estimates back to SAS.


data test;
  do i = 1 to 1000;
    x = normal(0);
    y = x + normal(0);
    output;
  end;
run;

data t2;
  do i = 1 to 100;
    x = normal(0);
    y = x + uniform(0);
    output;
  end;
run;

%include "C:\Proc_R.sas";
%Proc_R (SAS2R =test t2, R2SAS =mylm mylm2);
Cards4;
an.lm = with(test,lm(y ~x))
mylm = t(coef(an.lm))
summary(an.lm)

an.lm2 = with(t2,lm(y~x))
mylm2 = t(coef(an.lm2))
;;;;
%Quit;

proc print data = mylm; run;
proc print data = mylm2; run;

And here’s what you get in the SAS output.


[First, proc_r result]

******************R OUTPUT***********************

R_OUTPUT_LOG

> setwd("C:... ")
> library(grDevices)
> png("C:...")
> test<- read.csv('C:/Users ... /test.csv')
> t2<- read.csv('C:/Users ... /t2.csv')
> an.lm = with(test,lm(y ~x))
> mylm = t(coef(an.lm))
> summary(an.lm)

Call:
lm(formula = y ~ x)

Residuals:
    Min      1Q  Median      3Q     Max
-2.8571 -0.6430 -0.0051  0.6713  3.5903

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.008568   0.031686    0.27    0.787
x           1.020640   0.033315   30.64   <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1.002 on 998 degrees of freedom
Multiple R-squared: 0.4846, Adjusted R-squared: 0.4841
F-statistic: 938.5 on 1 and 998 DF, p-value: < 2.2e-16

>
> an.lm2 = with(t2,lm(y~x))
> mylm2 = t(coef(an.lm2))
> write.csv(mylm,'mylm.csv',row.names=F)
> write.csv(mylm2,'mylm2.csv',row.names=F)
> dev.off()
null device
          1
> q()
> proc.time()
   user  system elapsed
   0.28    0.10    0.37


[Here are the proc print results]
Obs     _Intercept_               x
  1    0.0085676126    1.0206400545

Obs     _Intercept_               x
  1     0.528410053    0.9851225238

(Page breaks and some extraneous stuff removed.)
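Two details of these results are worth a second look. The R code transposes the coefficients with t() because coef() returns a named vector, which would reach SAS as a single column of values; transposed, it is a one-row matrix, so each coefficient becomes its own variable. And the column R calls (Intercept) arrives as _Intercept_, presumably because SAS replaces characters that are not legal in variable names with underscores on import. The sketch below checks the first point in R alone, using simulated data as a stand-in for the SAS data set t2.

```r
set.seed(42)

# Simulated stand-in for t2: uniform noise has mean 0.5, so the fitted
# intercept should land near 0.5 rather than near 0
x <- rnorm(100)
t2 <- data.frame(x = x, y = x + runif(100))

cf <- coef(lm(y ~ x, data = t2))

# A named vector coerces to a single column (2 rows, 1 column); the
# coefficient names live in the row names, which row.names = FALSE drops
print(dim(as.data.frame(cf)))

# Transposed, it is a 1 x 2 matrix with one named column per coefficient
print(dim(t(cf)))
print(colnames(t(cf)))
```

The same reasoning explains the second proc print: with uniform(0) noise in t2, an intercept of roughly 0.5 is what you would expect.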

It’s pretty magical for a SAS user to see R living in the SAS output like this. But there are some caveats. First, this is a Windows-only macro. If you run SAS on *nix, you may not be able to get it to work. Second, while the article has examples of graphics from R neatly appearing in SAS, this failed for me. This may be because I run SAS 9.3, while the author of the macro was working with an earlier version of SAS. I may try to diagnose and fix this problem, and will update this entry if I find a fix.

However, these seem like minor problems compared with the overall simplification offered by the macro. It’s been of great use to me in the past few months, and I expect it will help others as well. Many thanks and congratulations to Xin Wei!
