Tag: data mining

"Entrywise" Norm calculation using PROC FASTCLUS

Some data mining applications require calculating a matrix norm; see, for instance, [1]. A detailed explanation of matrix norms is available on Wikipedia.
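For reference, the entrywise p-norm treats an m-by-n matrix A as one long vector of its entries. The formula below is the standard textbook definition, not necessarily the exact notation used in [1]; the symbols m, n, and a_ij are introduced here only for illustration.

```latex
\[
\|A\|_p = \Bigl( \sum_{i=1}^{m} \sum_{j=1}^{n} |a_{ij}|^{p} \Bigr)^{1/p},
\qquad p \ge 1
\]
% Worked example for A = [ 1  -2 ; 3  4 ]:
%   p = 1:  |1| + |-2| + |3| + |4| = 10
%   p = 2:  sqrt(1 + 4 + 9 + 16) = sqrt(30), about 5.477
```

With p = 2 this is the Frobenius norm, which is probably the case most readers will need.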

Instead of a user-written routine in a DATA step, we can obtain the “entrywise” norm via PROC FASTCLUS ef…

Boost to tackle nonlinearity

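data nonlinear;
   /* x spans roughly one full period of sin(x/100), since 627/100 is about 2*pi */
   do x=1 to 627;
      p=(sin(x/100)+1)*0.45;   /* event probability, between 0 and 0.9 */
      do j=1 to 100;
         x1=x+(j-1)/100;       /* 100 evenly spaced points within each unit of x */
         if ranuni(8655645)<=p then y=1; else y=0;
         output;
      end;
   end;
   drop p j;                   /* keep only x, x1 and y in the output data set */
run;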

proc rank data=nonlinear out=nonlinearrank groups=…

K-Nearest Neighbor in SAS

K-Nearest-Neighbor, aka KNN, is a widely used data mining tool. It is often called a memory-based, case-based, or instance-based method because no model is fit. A good introduction to KNN can be found at [1] or on Wikipedia.

Typically, the KNN algorithm relies on a soph…

Implement Boost Algorithm in SAS

Boost algorithms have proven to be very effective data mining tools, whether used standalone or as a building block to handle nonlinearity, etc. An implementation of the Boost algorithm in SAS is not easy to find, although it is not difficult to wr…

An efficient macro for Stump – two terminal nodes tree

In this post, I present an improved SAS macro for the single partition split algorithm in Chapter 2 of “Pharmaceutical Statistics Using SAS: A Practical Guide” by Alex Dmitrienko, Christy Chuang-Stein, and Ralph B. D’Agostino.
The single part…

Run data mining code following William Potts

FYI: SAS Enterprise Miner and SAS Text Miner Procedures: Reference for SAS 9.1.3, see:
 
http://support.sas.com/documentation/onlinedoc/miner/emtmsas913/listing.html
 
This entry DOES exist on the SAS Support website, but it can’t be found through any search engine or the documentation tree view. I recommend downloading these files right away, since SAS hyperlinks tend to go dead quickly. ^-^
 
P.S. SAS Institute provides no support […]