algorithms have proven to be very effective data mining tools, whether used standalone or as a building block to handle nonlinearity, etc. An implementation of the Boost algorithm in SAS is not easy to find, although it is not difficult to wr…
Tag: predictive modeling
An efficient macro for Stump – two terminal nodes tree
In this post, I present an improved SAS macro for the single partition split algorithm in Chapter 2 of “Pharmaceutical Statistics Using SAS: A Practical Guide” by Alex Dmitrienko, Christy Chuang-Stein, and Ralph B. D’Agostino.
The single part…
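As a rough illustration of what a stump does, here is a minimal Python sketch (not the SAS macro from the post). It assumes a numeric predictor and a numeric response, and it picks the split that minimizes the total within-node sum of squared errors. That criterion and the function name `best_stump` are assumptions made for illustration; the book's algorithm may use a different criterion.

```python
def best_stump(x, y):
    """Find the single split `x <= threshold` that minimizes the total
    within-node sum of squared errors (SSE) of y.

    Returns (threshold, sse), or None if x has no two distinct values.
    The SSE criterion is an assumption made for this sketch.
    """
    pairs = sorted(zip(x, y))          # order observations by predictor value
    n = len(pairs)
    total_sum = sum(v for _, v in pairs)
    total_sq = sum(v * v for _, v in pairs)

    best = None
    left_sum = 0.0
    left_sq = 0.0
    for i in range(n - 1):
        xi, yi = pairs[i]
        left_sum += yi
        left_sq += yi * yi
        if pairs[i + 1][0] == xi:
            continue                   # cannot split between equal x values
        n_left = i + 1
        n_right = n - n_left
        right_sum = total_sum - left_sum
        right_sq = total_sq - left_sq
        # SSE of a node = sum(y^2) - (sum(y))^2 / n
        sse = (left_sq - left_sum ** 2 / n_left) + (right_sq - right_sum ** 2 / n_right)
        threshold = (xi + pairs[i + 1][0]) / 2.0   # midpoint between neighbours
        if best is None or sse < best[1]:
            best = (threshold, sse)
    return best


if __name__ == "__main__":
    print(best_stump([1, 2, 3, 10, 11, 12], [0, 0, 0, 5, 5, 5]))
```

Keeping running sums means every candidate split is evaluated in a single pass after sorting, which is the usual way to make a stump search efficient.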
SAS implementation of Kernel PCA
The kernel method is a very useful data mining technique that applies to any algorithm relying on inner products [1]. The key is to replace the inner product in the original data space with an appropriate kernel function.
I show here SAS/STAT+BASE ex…
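To make the kernel idea concrete, here is a small Python sketch (not the SAS code from the post). It checks that a degree-2 polynomial kernel evaluated in the original 2-D space equals an ordinary inner product in an explicit 3-D feature space. The function names are hypothetical and chosen only for this illustration.

```python
import math


def dot(u, v):
    """Ordinary inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def poly2_kernel(x, y):
    """Homogeneous degree-2 polynomial kernel: k(x, y) = (x . y)^2."""
    return dot(x, y) ** 2


def poly2_features(x):
    """Explicit feature map for 2-D input whose inner product
    reproduces poly2_kernel: phi(x) = (x1^2, sqrt(2)*x1*x2, x2^2)."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2.0) * x1 * x2, x2 * x2)


if __name__ == "__main__":
    x, y = (1.0, 2.0), (3.0, -1.0)
    # Both values agree up to floating-point rounding.
    print(poly2_kernel(x, y), dot(poly2_features(x), poly2_features(y)))
```

An algorithm that touches the data only through inner products, such as PCA on a Gram matrix, can therefore work in the feature space without ever forming `poly2_features` explicitly.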
Partial Least Square
In some predictive modelling projects, we may have variables for which most observations share the same value, while the remaining small percentage carry meaningful values. For example, 90% of observations have a value of 0 but the remaining 10% ha…
Implementing Gap statistic for clustering number estimation
The gap statistic is a method for estimating the most likely number of clusters in partition clustering, notably k-means clustering. The measure was introduced by Trevor Hastie, Robert Tibshirani, and Guenther Walther, all from Stanford U…
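For reference, the gap statistic as defined in the original paper can be written as follows. Here $C_r$ is the $r$-th cluster with $n_r$ members, $d_{ii'}$ is the squared Euclidean distance between observations $i$ and $i'$, $\mathrm{E}^{*}_{n}$ is the expectation under a reference (null) distribution estimated from $B$ simulated data sets, and $\mathrm{sd}_k$ is the standard deviation of $\log W_k$ across those simulations.

```latex
\[
W_k = \sum_{r=1}^{k} \frac{1}{2 n_r} \sum_{i,\, i' \in C_r} d_{ii'},
\qquad
\mathrm{Gap}_n(k) = \mathrm{E}^{*}_{n}\!\left[\log W_k\right] - \log W_k
\]
\[
s_k = \mathrm{sd}_k \sqrt{1 + \tfrac{1}{B}},
\qquad
\hat{k} = \min\left\{\, k : \mathrm{Gap}_n(k) \ge \mathrm{Gap}_n(k+1) - s_{k+1} \,\right\}
\]
```

The estimated number of clusters is the smallest $k$ whose gap is within one simulation standard error of the gap at $k+1$.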
Tensor in SAS
A tensor, the mathematical concept for a higher-order array, is a very useful tool in modern data mining applications. HOSVD, the counterpart of SVD for higher-order arrays, is at the heart of modern applications, such as face recognition and clustering, segmentation…
AUC calculation using Wilcoxon Rank Sum Test
Accurately calculating AUC (Area Under the Curve) in SAS for data rank-ordered by a binary classifier
In order to calculate AUC for a given SAS data set that is already rank ordered by a binary classifier (such as linear logistic regression), where we h…
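The link between AUC and the Wilcoxon rank sum test can be sketched in a few lines of Python (this is not the SAS code from the post). With ties given average ranks, AUC equals (R1 - n1(n1 + 1)/2) / (n1 * n0), where R1 is the rank sum of the scores of the positive class and n1, n0 are the class sizes. The function names and the 0/1 label coding below are assumptions made for this illustration.

```python
def average_ranks(values):
    """Return 1-based ranks of `values`, assigning tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0      # average of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def auc_wilcoxon(scores, labels):
    """AUC from the Wilcoxon rank sum of the positive class (labels coded 0/1)."""
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("both classes must be present")
    ranks = average_ranks(scores)
    rank_sum_pos = sum(r for r, y in zip(ranks, labels) if y == 1)
    u = rank_sum_pos - n_pos * (n_pos + 1) / 2.0   # Mann-Whitney U statistic
    return u / (n_pos * n_neg)


if __name__ == "__main__":
    print(auc_wilcoxon([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1]))  # 0.75
```

Because tied scores receive average ranks, a tied positive/negative pair contributes one half to the count, which matches the usual trapezoidal AUC.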