Category: SAS

K-Nearest Neighbor in SAS

K-Nearest Neighbor (KNN) is a widely used data mining tool. It is often called a memory-based, case-based, or instance-based method because no model is fit. A good introduction to KNN can be found in [1] or on Wikipedia.

Typically, the KNN algorithm relies on a soph…
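A minimal sketch of KNN classification in SAS, assuming PROC DISCRIM's nonparametric k-nearest-neighbor option (METHOD=NPAR with K=) is an acceptable route. This is one built-in way to do it, not necessarily the approach the original post takes. The SASHELP.IRIS data, the every-third-row train/test split, k = 5, and the data set names train, test, and knn_pred are illustrative choices.

```sas
/* Illustrative split: every third row of SASHELP.IRIS becomes test data */
data train test;
  set sashelp.iris;
  if mod(_n_, 3) = 0 then output test;
  else output train;
run;

/* k-nearest-neighbor classification with k = 5: no model is fit;      */
/* each test row is assigned a class from its 5 closest training rows   */
proc discrim data=train testdata=test testout=knn_pred
             method=npar k=5 noprint;
  class Species;
  var SepalLength SepalWidth PetalLength PetalWidth;
run;

/* Cross-tabulate the true species against the predicted class (_INTO_) */
proc freq data=knn_pred;
  tables Species*_INTO_ / norow nocol nopercent;
run;
```

Changing K= changes the neighborhood size, and the METRIC= option controls how distances between observations are measured.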

Next Project: Regularized Logistic Regression

L1-regularized logistic regression (RLR) handles a large number of predictors and performs variable selection at the same time. [1] indicates that L1 RLR can be implemented via the IRLS-LARS algorithm. You can tweak PROC GLMSELECT in SAS 9.2 for this.
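For reference, the standard form of the L1-penalized logistic objective, followed by the weighted least-squares subproblem that an IRLS-style scheme solves at each iteration. This is a sketch of the general idea, not necessarily the exact formulation in [1]. Here \(\eta_i = \beta_0 + x_i^\top\beta\) is evaluated at the current estimate, and \(\lambda \ge 0\) sets the penalty strength.

```latex
% L1-penalized negative log-likelihood
\min_{\beta_0,\,\beta}\; -\sum_{i=1}^{n}\Big[\, y_i\,(\beta_0 + x_i^\top\beta) - \log\big(1 + e^{\beta_0 + x_i^\top\beta}\big) \Big] + \lambda \sum_{j=1}^{p} |\beta_j|

% IRLS quantities at the current estimate
p_i = \frac{1}{1 + e^{-\eta_i}}, \qquad w_i = p_i\,(1 - p_i), \qquad z_i = \eta_i + \frac{y_i - p_i}{w_i}

% Quadratic approximation solved at each step: a weighted lasso
\min_{\beta_0,\,\beta}\; \frac{1}{2}\sum_{i=1}^{n} w_i\,\big(z_i - \beta_0 - x_i^\top\beta\big)^2 + \lambda \sum_{j=1}^{p} |\beta_j|
```

Each IRLS step is therefore a weighted lasso problem, which LARS can solve; this is presumably the step where a lasso-capable procedure such as PROC GLMSELECT comes in.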

L2 R…

Repeating a line of data

Repeating a line of a data set for each line in another
Suppose you want to access the same information in every line of a data set, and that this information is data-dependent.  For example, suppose you want to add the 25th, 50th, and 75th per…
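A minimal sketch of one common idiom for this, assuming the percentiles come from PROC MEANS: compute them into a one-row data set, then read that row once with `if _n_ = 1 then set`. Variables read by a SET statement are not reset to missing between iterations, so the values carry into every row of the main table. SASHELP.CLASS, the Height variable, and the names stats, p25, p50, p75, and class_with_pctl are stand-ins.

```sas
/* One-row table holding the 25th, 50th, and 75th percentiles of Height */
proc means data=sashelp.class noprint;
  var Height;
  output out=stats(drop=_type_ _freq_) p25=p25 p50=p50 p75=p75;
run;

data class_with_pctl;
  /* Read the single stats row on the first iteration only;        */
  /* its values are retained and appear on every output row        */
  if _n_ = 1 then set stats;
  set sashelp.class;
run;
```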

Hey Look: There’s Log In

We are working on some changes to support.sas.com that I’ll talk about here over the next few weeks. We tried to squeak a few in before leaving Cary for SAS Global Forum. If you didn’t see these on the demo floor, let me point them out now. Thes…

Conduct R analysis within SAS

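/* %RScript writes the R code that follows the macro call to the file &Rscript.  */
/* The call must be followed, in open code, by CARDS4; <R code> ;;;; RUN;         */
/* because CARDS/DATALINES cannot appear inside a macro definition.               */
%macro RScript(Rscript);
data _null_;
  file "&Rscript";
  infile cards;
  input;
  put _infile_;
%mend;

/* %CallR runs &Rscript with R CMD BATCH and writes R's output to &Rlog. */
%macro CallR(Rscript, Rlog);
  systask command "C:\Progra~1\R\R-2.8.0\bin\R.exe CMD BATCH --vanilla --quiet &Rscript &Rlog" wait;
%mend;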

WARNING: You may have unbalanced quotation marks.

SAS allows character strings up to 32,767 characters long, but sometimes SAS writes the warning message ‘WARNING: The quoted string currently being processed has become more than 262 characters long. You may have unbalanced quotation marks.’ when…
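When a long quoted string is intentional, one common way to silence this warning is the NOQUOTELENMAX system option. A minimal sketch, assuming the string is built from a macro variable; the names s, long_string, long, and len are illustrative.

```sas
options noquotelenmax;             /* do not warn about long quoted strings */

%let s = %sysfunc(repeat(x, 299)); /* REPEAT returns the text 300 times (n + 1) */

data long_string;
  length long $ 400;
  long = "&s";                     /* a 300-character quoted literal */
  len = length(long);
run;

options quotelenmax;               /* restore the default behavior */
```

Turning the option back on afterward keeps the warning available for genuinely unbalanced quotation marks elsewhere in the program.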

How to do data cleaning

1) How do you use SAS to merge base and look-up tables? What are the pros and cons?
   1. array
   2. sort-sort-merge
   3. PROC SQL
   4. PROC FORMAT
   5. hash object
   Coding efficiency: 3 > 2 > 4 > 1 > 5
   I/O resource: 5 > 4 > 1 > 3 > 2
   Flexibility: 3 >> others
2) What are the common methods for large…
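A minimal sketch of method 5, the hash-object lookup, using two small hypothetical tables (base and lookup, keyed on a variable id, with a look-up value region). The look-up table is loaded into memory once, and the base table is read in its stored order, so neither table needs sorting.

```sas
/* Hypothetical look-up table: id -> region */
data lookup;
  input id region $;
  datalines;
1 East
2 West
3 North
;
run;

/* Hypothetical base table; id 4 has no match in lookup */
data base;
  input id sales;
  datalines;
2 150
1 200
3 75
4 90
;
run;

data merged;
  if 0 then set lookup;                /* define lookup variables without reading rows */
  if _n_ = 1 then do;
    declare hash h(dataset: "lookup"); /* load the look-up table once */
    h.defineKey("id");
    h.defineData("region");
    h.defineDone();
  end;
  set base;
  /* FIND returns 0 on a match; otherwise clear region so a value */
  /* retained from an earlier row is not carried forward          */
  if h.find() ne 0 then call missing(region);
run;
```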

PROC LOGISTIC: Concordant and discordant

Description of concordant and discordant in SAS PROC LOGISTIC. Part of the default output from PROC LOGISTIC is a table that has entries including 'percent concordant' and 'percent discordant'. To me, this implies the percent that would corre…
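To see what these counts mean, a sketch that recomputes them by brute force. Every pair made of one event observation and one non-event observation is concordant if the event has the higher predicted probability, discordant if it has the lower one, and tied otherwise. The model (SASHELP.CLASS with Sex as the response, event 'F', and Height and Weight as predictors) and the names scored, phat, and pairs are illustrative. The counts can differ slightly from the printed table because PROC LOGISTIC may round or bin predicted probabilities before comparing them.

```sas
proc logistic data=sashelp.class;
  model Sex(event='F') = Height Weight;
  output out=scored p=phat;            /* phat = predicted P(Sex = 'F') */
run;

/* Pair every female (event) with every male (non-event) and compare phat */
proc sql;
  create table pairs as
  select sum(f.phat > m.phat) as concordant,
         sum(f.phat < m.phat) as discordant,
         sum(f.phat = m.phat) as tied,
         count(*)             as total,
         (calculated concordant + 0.5 * calculated tied)
           / calculated total as c     /* c statistic */
  from scored(where=(Sex='F')) as f,
       scored(where=(Sex='M')) as m;
quit;
```

Pairs with the same observed response are never compared, so the percentages are relative to the number of event/non-event pairs, not the number of observations.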