Kaggle is hosting an educational data mining competition, Kaggle Digit Recognizer, using MNIST data. Handwritten digit recognition is one of the few applications where a kNN classifier performs well. Of course, the benchmark kNN classifier provided ...

Tags: data mining, KNN, predictive modeling, PROC DISCRIM, SVD

Posted in SAS | Comments Off on Kaggle Digit Recognizer: SAS k-Nearest Neighbor solution

kNN stands for k-Nearest Neighbor. In data mining and predictive modeling, it refers to a memory-based (or instance-based) algorithm for classification and regression problems. It is a widely used algorithm with many successful applications in medi...

Tags: data mining, KNN, Nearest Neighbor, PCA, predictive modeling, PROC DISCRIM, PROC KRIGE2D, SVD

Posted in SAS | Comments Off on KNN Classification and Regression in SAS
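
The memory-based idea in the kNN excerpt above can be sketched in a few lines. This is a minimal Python illustration, not the SAS code from the post: the function name `knn_predict`, the toy data, and the choice of Euclidean distance with majority voting are all assumptions made for the example, and only the classification case is shown.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training points.

    Hypothetical helper for illustration; assumes Euclidean distance.
    """
    # "Memory-based": there is no fitting step, we simply measure the
    # distance from the query to every stored training instance.
    distances = [
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    ]
    # Keep the k closest instances (sorted() is stable, so ties in
    # distance keep their original input order).
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Toy data (made up for this sketch): two well-separated groups.
points = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (1.0, 1.0), (0.9, 1.1), (1.1, 0.9)]
labels = ["a", "a", "a", "b", "b", "b"]
prediction = knn_predict(points, labels, (0.15, 0.15), k=3)
```

For regression, the same neighbor search would typically be followed by averaging the neighbors' target values instead of voting.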

Low-rank radial smoothing using GLIMMIX is a semiparametric approach to smoothing curves. By specifying the TYPE=RSMOOTH option in the RANDOM statement, we can implement this spline smoothing approach. The best thing is that for future scoring, data preparation...

Tags: predictive modeling, PROC GLIMMIX, PROC PRINCOMP, PROC SCORE, SVD

Posted in SAS | Comments Off on Low Rank Radial Smoothing using GLIMMIX and its Scoring

In the 2010 SASware Ballot®, a dedicated PROC for Randomized SVD was among the options. While an official SAS PROC will not be available in the immediate future, nor in older SAS releases, it is fairly simple to implement this algorithm using ...

Tags: predictive modeling, PROC FASTCLUS, PROC PRINCOMP, PROC SCORE, SVD

Posted in SAS | Comments Off on Implement Randomized SVD in SAS

In some data mining applications, a matrix norm has to be calculated. A detailed explanation of matrix norms can be found on Wikipedia.
Instead of a user-written routine in a DATA step, we can obtain the "entrywise" norm via PROC FASTCLUS ef...

Tags: data mining, PROC FASTCLUS, SVD

Posted in SAS | Comments Off on "Entrywise" Norm calculation using PROC FASTCLUS
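
For reference, the entrywise p-norm mentioned in the excerpt above treats the matrix as one long vector and takes the usual vector p-norm; p = 2 gives the Frobenius norm. The following Python sketch only illustrates that definition. It does not reproduce the PROC FASTCLUS approach from the post, and the function name `entrywise_norm` is hypothetical.

```python
def entrywise_norm(matrix, p=2.0):
    """Entrywise p-norm: (sum over all entries of |a_ij|**p) ** (1/p).

    Hypothetical helper for illustration; assumes p >= 1 and a matrix
    given as a list of rows.
    """
    total = sum(abs(value) ** p for row in matrix for value in row)
    return total ** (1.0 / p)


# Small example matrix (made up for this sketch).
A = [[3.0, 0.0], [0.0, 4.0]]
frobenius = entrywise_norm(A, p=2)  # sqrt(9 + 16)
taxicab = entrywise_norm(A, p=1)    # |3| + |4|
```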

SVD is at the heart of many modern machine learning algorithms. As a computing vehicle for PCA, SVD can be obtained using PROC PRINCOMP on the covariance matrix of a given matrix without correction for intercept. With SVD, we are ready to ca...

Tags: PROC PRINCOMP, SVD

Posted in SAS | Comments Off on A Macro for SVD
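
The link between SVD and an uncorrected covariance matrix, noted in the excerpt above, rests on the fact that the eigenvectors of X'X (with no column means subtracted) are the right singular vectors of X, and the square roots of its eigenvalues are the singular values. The Python sketch below illustrates this for the leading singular triple only. It is not the SAS macro from the post: the function name `leading_singular_triple` is hypothetical, and power iteration is swapped in here for a full eigen-decomposition to keep the example short.

```python
import math


def leading_singular_triple(X, iterations=200):
    """Return (u, sigma, v) for the largest singular value of X.

    Hypothetical helper for illustration. Assumes X is a non-zero matrix
    given as a list of rows, and that the starting vector is not
    orthogonal to the leading right singular vector.
    """
    n_rows, n_cols = len(X), len(X[0])
    # Uncorrected cross-product matrix X'X: no intercept (mean) correction.
    gram = [
        [sum(X[r][i] * X[r][j] for r in range(n_rows)) for j in range(n_cols)]
        for i in range(n_cols)
    ]
    # Power iteration converges to the dominant eigenvector of X'X,
    # which is the leading right singular vector v of X.
    v = [1.0 / math.sqrt(n_cols)] * n_cols
    for _ in range(iterations):
        w = [sum(gram[i][j] * v[j] for j in range(n_cols)) for i in range(n_cols)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Since X v = sigma * u, the length of X v is the singular value.
    xv = [sum(X[r][j] * v[j] for j in range(n_cols)) for r in range(n_rows)]
    sigma = math.sqrt(sum(x * x for x in xv))
    u = [x / sigma for x in xv]
    return u, sigma, v


# Small example matrix (made up for this sketch).
X = [[3.0, 0.0], [0.0, 4.0]]
u, sigma, v = leading_singular_triple(X)
```

Subtracting column means before forming X'X would give the principal components of the centered data instead, which is why the excerpt stresses the absence of intercept correction.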

The gap statistic is a method for estimating the most likely number of clusters in partition clustering, notably k-means clustering. The measure was introduced by Trevor Hastie, Robert Tibshirani, and Guenther Walther, all from Stanford U...

Tags: array, Gap Statistic, K-means Clustering, predictive modeling, SVD

Posted in SAS | Comments Off on Implementing Gap statistic for clustering number estimation

In this example, we show a clustering exercise on the Optical Recognition of Handwritten Digits Data Set, available from the UCI data set repository.
This exercise is a standard application of HOSVD by stacking 8x8 matrices of digitalized ...

Tags: HOSVD, K-means Clustering, PROC PRINCOMP, SVD, Tensor

Posted in SAS | Comments Off on Clustering Handwritten Digits (digitalized optical images)