K-Nearest-Neighbor, aka KNN, is a widely used data mining tool and is often called a memory-based/case-based/instance-based method because no model is fit. A good introduction to KNN can be found at [1], or on Wiki.
Typically, the KNN algorithm relies on a soph…
L1 Regularized Logistic Regression effectively handles a large number of predictors and performs variable selection at the same time. [1] indicates that L1 RLR can be implemented via the IRLS-LARS algorithm. You can tweak PROC GLMSELECT in v9.2 for this.
L2 R…
Repeating a line of a data set for each line in another
Suppose you want to access the same information in every line of a data set, and that this information is data-dependent. For example, suppose you want to add the 25th, 50th, and 75th per…
We are working on some changes to support.sas.com that I’ll talk about here over the next few weeks. We tried to squeak a few in before leaving Cary for SAS Global Forum. If you didn’t see these on the demo floor, let me point them out now. Thes…
%macro RScript(Rscript);
data _null_;
file "&Rscript";
infile cards;
input;
put _infile_;
%mend;
%macro CallR(Rscript, Rlog);
systask command "C:\Progra~1\R\R-2.8.0\bin\R.exe CMD BATCH --vanilla --quiet
…
SAS allows strings up to 32,767 characters long, but sometimes it writes a warning message, ‘WARNING: The quoted string currently being processed has become more than 262 characters long. You may have unbalanced quotation marks.’, when…
1) How to use SAS to merge base and look-up tables? Pros and cons?
1. array; 2. sort-sort-merge; 3. proc sql; 4. proc format; 5. hash object
Coding efficiency: 3>2>4>1>5
I/O resource: 5>4>1>3>2
Flexibility: 3>>others
2) What are the common methods for large…
Description of concordant and discordant in SAS PROC LOGISTIC
Part of the default output from PROC LOGISTIC is a table that has entries including ‘percent concordant’ and ‘percent discordant’. To me, this implies the percent that would corre…