This post was kindly contributed by SAS ANALYSIS - go there to comment and to read the full post. |
I just recently discovered endless fun to synchronize SAS and R to do something meaningful. Yep, I am a SAS programmer: during the day time, I use SAS for money; at the evening, I use R for fun. It is always exciting to hook up them together. How about a SAS/R module, like SAS/STAT or SAS/BASE, in the future?
Some SAS programmers or SAS ‘developers’ already utilized coding to communicate SAS and R [Ref. 1 and 2] (thanks to Rick Wicklin’s mentioning). Since R can write dataset in SAS code (the ‘foreign’ package) and SAS can use call R in X command to do batch execution, so far I didn’t find much difficulty to use SAS as a GUI for R.
Support vector machine (SVM) is a cool and fancy classification method. It is said that a secret SVM procedure is already running under SAS Enterpriser Miner (I did not get a chance to try it yet). The package ‘e1071’ in R provides the state-of-art protocols for SVM classification. Then I made a small macro to call it in SAS, which performs like a typical SAS procedure. Hope everyone who is doing data could enjoy it.
Reference:
1.Philip R Holland ‘SAS to R to SAS’. Holland Numerics Limited.
2.Phil Rack. ‘A Bridge to R for SAS Users’. www.MineQuest.com
/*******************READ ME********************************************* * - SUPPORT VECTOR MACHINE FOR CLASSIFICATION IN SAS BY R - * * SAS VERSION: SAS 9.1.3 * R VERSION: R 2.13.0 * DATE: 03may2011 * AUTHOR: hchao8@gmail.com * ****************END OF READ ME*********************j********************/ ****************(1) MODULE-BUILDING STEP******************; %macro svm(train = , validate = , result = , targetvar = , tmppath = , rpath = ); /***************************************************************** * MACRO: svm() * GOAL: invoke e1071 in R to perform support vector machine * classification in SAS * PARAMETERS: train = dataset for training * validate = dataset for validation * result = dataset after prediction * targetvar = target variable * tmppath = temporary path for exchagne files * rpath = installation path for R *****************************************************************/ proc export data = &train outfile = "&tmppath\sas2r_train.csv" replace; run; proc export data = &validate outfile = "&tmppath\sas2r_validate.csv" replace; run; proc sql; create table _tmp0 (string char(80)); insert into _tmp0 set string = 'train=read.csv("sas_path/sas2r_train.csv",header=T)' set string = 'validate=read.csv("sas_path/sas2r_validate.csv",header=T)' set string = 'require(e1071,quietly=T)' set string = 'model=svm(sas_targetvar ~ . ,data=train)' set string = 'predicted=predict(model,newdata=validate,type="class")' set string = 'result=as.data.frame(predicted)' set string = 'require(foreign, quietly=T)' set string = 'write.foreign(result,"sas_path/r2sas_tmp.dat",' set string = '"sas_path/r2sas_tmp.sas",package="SAS")'; quit; data _tmp1; set _tmp0; string = tranwrd(string, "sas_targetvar", propcase("&targetvar")); string = tranwrd(string, "sas_path", translate("&tmppath", "/", "\")); run; data _null_; set _tmp1; file "&tmppath\sas_r.r"; put string; run; options xsync xwait; x "cd &rpath"; x "R.exe CMD BATCH --vanilla --slave &tmppath\sas_r.r"; data _null_; infile "&tmppath\sas_r.r.rout"; input; if _n_ = 1 then put "NOTE: Time used by R"; put _infile_; run; %include "&tmppath\r2sas_tmp.sas"; data &result; set &validate; set rdata; run; proc datasets nolist; delete _: rdata; quit; %mend; ****************(2) TESTING STEP******************; ******(2.1) DIVIDE SASHELP.IRIS DATASET INTO TWO EQUAL PARTS*************; %partbyprop2(data = sashelp.iris, targetvar = species, samprate = 0.5, seed = 20110503, train = iris_train, validate = iris_validate); ******(2.2) USE THE SVM MACRO*************; %svm(train = iris_train, validate = iris_validate, result = iris_result, targetvar = species, tmppath = c:\tmp, rpath = c:\Program Files\R\R-2.13.0\bin); ****************(3) VISUALIZATION STEP******************; data iris_visual; set iris_result; length color shape $8.; predvalue = put(predicted, predictd.); if species = "Setosa" then shape = "club"; if species = "Versicolor" then shape = "diamond"; if species = "Virginica" then shape = "spade"; if predvalue = "Setosa" then color = "blue"; if predvalue = "Versicolor" then color = "red"; if predvalue = "Virginica" then color = "green"; run; ods html style = harvest proc g3d data = iris_visual; scatter PetalLength * PetalWidth = SepalLength / color = color shape = shape; run;quit; ods html close; ****************END OF ALL CODING***************************************;
This post was kindly contributed by SAS ANALYSIS - go there to comment and to read the full post. |