Kaggle is hosting an educational data mining competition, Kaggle Digit Recognizer, using the MNIST data. Handwritten digit recognition is one of the few applications in which the kNN classifier performs well. Of course, the benchmark kNN classifier provided …
Tag: SVD
KNN Classification and Regression in SAS
kNN stands for k-Nearest Neighbors. In data mining and predictive modeling, it refers to a memory-based (or instance-based) algorithm for classification and regression problems. It is a widely used algorithm with many successful applications in medi…
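To show the mechanics outside SAS, here is a minimal Python sketch of kNN classification (majority vote among the k nearest training rows) and kNN regression (average response of those rows). It assumes plain Euclidean distance and a brute-force search over all training rows. The function names are hypothetical, and this is not the SAS implementation from the post.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def k_nearest(train_X, query, k):
    """Indices of the k training rows closest to `query` (brute-force search)."""
    order = sorted(range(len(train_X)), key=lambda i: euclidean(train_X[i], query))
    return order[:k]


def knn_classify(train_X, train_y, query, k=3):
    """Majority vote among the k nearest neighbors.

    Ties go to the label met first in nearest-first order, because Counter
    keeps first-encountered order for equal counts.
    """
    votes = Counter(train_y[i] for i in k_nearest(train_X, query, k))
    return votes.most_common(1)[0][0]


def knn_regress(train_X, train_y, query, k=3):
    """Average response of the k nearest neighbors."""
    idx = k_nearest(train_X, query, k)
    return sum(train_y[i] for i in idx) / len(idx)


# Usage: two small groups of points in the plane.
X = [[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]]
print(knn_classify(X, ["a", "a", "a", "b", "b", "b"], [0.2, 0.2]))  # a
print(knn_regress(X, [1.0, 2.0, 3.0, 10.0, 10.0, 10.0], [0.0, 0.0]))  # 2.0
```

Because the model is just the stored training data, every prediction scans the whole training set. That is what "memory-based" means in practice, and it is why kNN gets slow on large data such as full MNIST.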
Low Rank Radial Smoothing using GLIMMIX and its Scoring
Low rank radial smoothing using GLIMMIX [1] is a semiparametric approach to smoothing curves [2]. By specifying the TYPE=RSMOOTH option in the RANDOM statement, we can implement this spline smoothing approach. The best thing is that for future scoring, data preparation…
Implement Randomized SVD in SAS
In the 2010 SASware Ballot®, a dedicated PROC for randomized SVD was among the options. While an official SAS PROC will not be available in the immediate future, nor in older SAS releases, it is fairly simple to implement this algorithm using …
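As a rough illustration of the algorithm itself (not of the SAS code in the post), the Python sketch below follows the usual randomized SVD recipe. It multiplies A by a Gaussian test matrix, orthonormalizes the result to get a basis Q for the range of A, refines that basis with a few power iterations, and then takes an exact SVD of the small matrix B = QᵀA. To stay dependency-free it assumes small dense inputs stored as lists of rows, and it gets the small SVD from a Jacobi eigen-decomposition of BBᵀ. The defaults (oversample=5, power_iters=2) are illustrative choices, and all names are hypothetical.

```python
import math
import random


def matmul(A, B):
    """Dense product of two matrices stored as lists of rows."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def orthonormal_columns(Y, eps=1e-10):
    """Modified Gram-Schmidt; columns whose residual norm falls below eps are dropped."""
    basis = []
    for v in transpose(Y):
        for q in basis:
            proj = sum(a * b for a, b in zip(q, v))
            v = [a - proj * b for a, b in zip(v, q)]
        norm = math.sqrt(sum(a * a for a in v))
        if norm > eps:
            basis.append([a / norm for a in v])
    return transpose(basis)


def jacobi_eigh(S, max_sweeps=50):
    """Cyclic Jacobi eigen-decomposition of a small symmetric matrix.

    Returns eigenvalues in descending order and the eigenvectors as columns of V.
    """
    n = len(S)
    A = [list(map(float, row)) for row in S]
    V = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(max_sweeps):
        off = sum(A[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off <= 1e-24 * sum(x * x for row in A for x in row):
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if A[p][q] == 0.0:
                    continue
                # rotation angle chosen so that the (p, q) entry becomes zero
                theta = (A[q][q] - A[p][p]) / (2.0 * A[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # A <- A J and V <- V J (columns p and q)
                    akp, akq = A[k][p], A[k][q]
                    A[k][p], A[k][q] = c * akp - s * akq, s * akp + c * akq
                    vkp, vkq = V[k][p], V[k][q]
                    V[k][p], V[k][q] = c * vkp - s * vkq, s * vkp + c * vkq
                for k in range(n):  # A <- J^T A (rows p and q)
                    apk, aqk = A[p][k], A[q][k]
                    A[p][k], A[q][k] = c * apk - s * aqk, s * apk + c * aqk
    vals = [A[i][i] for i in range(n)]
    order = sorted(range(n), key=lambda i: vals[i], reverse=True)
    return [vals[i] for i in order], [[V[r][i] for i in order] for r in range(n)]


def randomized_svd(A, k, oversample=5, power_iters=2, seed=0):
    """Approximate rank-k SVD, A ~ U diag(s) Vt, via a random range finder."""
    rng = random.Random(seed)
    n = len(A[0])
    l = min(k + oversample, n)
    omega = [[rng.gauss(0.0, 1.0) for _ in range(l)] for _ in range(n)]
    Q = orthonormal_columns(matmul(A, omega))         # basis for the range of A
    At = transpose(A)
    for _ in range(power_iters):                      # power iterations sharpen the basis
        Q = orthonormal_columns(matmul(A, orthonormal_columns(matmul(At, Q))))
    B = matmul(transpose(Q), A)                       # small projected matrix
    evals, Ub = jacobi_eigh(matmul(B, transpose(B)))  # B B^T = Ub diag(s^2) Ub^T
    s = [math.sqrt(max(e, 0.0)) for e in evals[:k]]
    U = matmul(Q, [row[:len(s)] for row in Ub])       # lift back: U = Q Ub
    UbtB = matmul(transpose(Ub), B)
    # V^T = diag(1/s) Ub^T B; rows with a zero singular value are left at zero
    Vt = [[x / si if si > 0 else 0.0 for x in UbtB[i]] for i, si in enumerate(s)]
    return U, s, Vt
```

On an exactly rank-k matrix, randomized_svd(A, k) should recover the k singular values up to rounding, because the random range finder then captures the whole column space. For realistic sizes one would use a linear-algebra library; the point here is only the sequence of steps.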
"Entrywise" Norm calculation using PROC FASTCLUS
In some data mining applications a matrix norm has to be calculated, for instance in [1]. A detailed explanation of matrix norms can be found on Wikipedia @ Here.
Instead of writing our own routine in a DATA step, we can obtain the “entrywise” norm via PROC FASTCLUS ef…
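For reference, the entrywise p-norm treats the matrix as one long vector of its entries: (sum over all i, j of |a_ij|^p)^(1/p). p = 2 gives the Frobenius norm and p = ∞ gives the largest absolute entry. The short Python function below is a hypothetical helper, not the FASTCLUS approach from the post. It computes the definition directly, which can be handy for checking the number a SAS procedure reports.

```python
import math


def entrywise_norm(A, p=2.0):
    """Entrywise p-norm of a matrix given as a list of rows.

    (sum over all i, j of |a_ij|^p)^(1/p); p=2 is the Frobenius norm and
    p=math.inf returns the largest absolute entry.
    """
    entries = [abs(x) for row in A for x in row]
    if math.isinf(p):
        return float(max(entries))
    return sum(x ** p for x in entries) ** (1.0 / p)


A = [[1, -2], [2, 4]]
print(entrywise_norm(A))              # 5.0, i.e. sqrt(1 + 4 + 4 + 16)
print(entrywise_norm(A, p=1))         # 9.0
print(entrywise_norm(A, p=math.inf))  # 4.0
```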
A Macro for SVD
SVD is at the heart of many modern machine learning algorithms. As the computing vehicle behind PCA, SVD can be obtained by running PROC PRINCOMP on the covariance matrix of a given matrix without correction for the intercept. With SVD, we are ready to ca…
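The identity behind this works as follows. Without centering, the eigenvectors of the cross-product matrix XᵀX are the right singular vectors V of X, its eigenvalues are the squared singular values, and the left singular vectors follow as u_i = X v_i / σ_i. A covariance matrix built without the intercept correction is XᵀX divided by a constant, which rescales the eigenvalues but leaves the eigenvectors unchanged. The Python sketch below illustrates the identity with a small Jacobi eigen-solver. It is a hypothetical stand-in for PROC PRINCOMP, not the macro from the post.

```python
import math


def jacobi_eigh(S, max_sweeps=50):
    """Cyclic Jacobi eigen-decomposition of a small symmetric matrix.

    Returns eigenvalues in descending order and the eigenvectors as columns of V.
    """
    n = len(S)
    A = [list(map(float, row)) for row in S]
    V = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(max_sweeps):
        off = sum(A[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off <= 1e-24 * sum(x * x for row in A for x in row):
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if A[p][q] == 0.0:
                    continue
                # rotation angle chosen so that the (p, q) entry becomes zero
                theta = (A[q][q] - A[p][p]) / (2.0 * A[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # A <- A J and V <- V J (columns p and q)
                    akp, akq = A[k][p], A[k][q]
                    A[k][p], A[k][q] = c * akp - s * akq, s * akp + c * akq
                    vkp, vkq = V[k][p], V[k][q]
                    V[k][p], V[k][q] = c * vkp - s * vkq, s * vkp + c * vkq
                for k in range(n):  # A <- J^T A (rows p and q)
                    apk, aqk = A[p][k], A[q][k]
                    A[p][k], A[q][k] = c * apk - s * aqk, s * apk + c * aqk
    vals = [A[i][i] for i in range(n)]
    order = sorted(range(n), key=lambda i: vals[i], reverse=True)
    return [vals[i] for i in order], [[V[r][i] for i in order] for r in range(n)]


def svd_via_crossproduct(X):
    """SVD of X from the eigen-decomposition of the uncorrected cross-product X^T X.

    No column means are subtracted (no intercept correction), so the eigenvectors
    of X^T X are the right singular vectors and its eigenvalues are the squared
    singular values.
    """
    n = len(X[0])
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(n)] for i in range(n)]
    evals, V = jacobi_eigh(XtX)
    s = [math.sqrt(max(e, 0.0)) for e in evals]
    # left singular vectors u_i = X v_i / s_i (set to zero when s_i == 0)
    U = [[sum(row[j] * V[j][i] for j in range(n)) / si if si > 0 else 0.0
          for i, si in enumerate(s)] for row in X]
    return U, s, V


U, s, V = svd_via_crossproduct([[3, 0], [4, 5]])
print([round(x * x, 6) for x in s])  # [45.0, 5.0], the eigenvalues of X^T X
```

One caveat of this route is that forming XᵀX squares the condition number of X, so the smallest singular values lose accuracy compared with a direct SVD.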
Implementing the Gap Statistic for Estimating the Number of Clusters
The gap statistic is a method for estimating the most likely number of clusters in a partitional clustering, notably k-means clustering. It was proposed by Trevor Hastie, Robert Tibshirani, and Guenther Walther, all from Stanford U…
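The procedure can be sketched in four steps:

1. Cluster the data for k = 1, …, K and record log W_k, the log of the pooled within-cluster sum of squares.
2. Repeat the clustering on B reference data sets drawn uniformly over the range of each feature.
3. Set Gap(k) to the average reference log W_k minus the observed one.
4. Pick the smallest k with Gap(k) ≥ Gap(k+1) − s_{k+1}, where s_{k+1} is the reference standard deviation inflated by sqrt(1 + 1/B).

The Python below is a minimal, dependency-free sketch of that recipe. It runs k-means with Lloyd's algorithm, k-means++ seeding, and a few restarts. The uniform-box reference, the defaults, and the function names are assumptions for illustration, not the implementation from the post.

```python
import math
import random


def sq_dist(a, b):
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_pp_init(points, k, rng):
    """k-means++ seeding: later centers are drawn with probability proportional to D^2."""
    centers = [list(rng.choice(points))]
    while len(centers) < k:
        d2 = [min(sq_dist(p, c) for c in centers) for p in points]
        if sum(d2) == 0:                      # every point already sits on a center
            centers.append(list(rng.choice(points)))
        else:
            centers.append(list(rng.choices(points, weights=d2)[0]))
    return centers


def within_ss(points, k, rng, restarts=5, max_iter=100):
    """Smallest pooled within-cluster sum of squares W_k over several Lloyd's runs."""
    best = math.inf
    for _ in range(restarts):
        centers = kmeans_pp_init(points, k, rng)
        for _ in range(max_iter):
            labels = [min(range(k), key=lambda j: sq_dist(p, centers[j])) for p in points]
            new = []
            for j in range(k):
                members = [p for p, lab in zip(points, labels) if lab == j]
                new.append([sum(c) / len(members) for c in zip(*members)] if members else centers[j])
            if new == centers:
                break
            centers = new
        best = min(best, sum(sq_dist(p, centers[lab]) for p, lab in zip(points, labels)))
    return best


def gap_statistic(points, k_max=5, n_refs=10, seed=0):
    """Return (chosen k, [Gap(1..k_max)], [s(1..k_max)])."""
    rng = random.Random(seed)
    lows = [min(col) for col in zip(*points)]
    highs = [max(col) for col in zip(*points)]
    gaps, sks = [], []
    for k in range(1, k_max + 1):
        log_w = math.log(within_ss(points, k, rng))
        ref_logs = []
        for _ in range(n_refs):  # reference data: uniform over each feature's range
            ref = [[rng.uniform(lo, hi) for lo, hi in zip(lows, highs)] for _ in points]
            ref_logs.append(math.log(within_ss(ref, k, rng)))
        mean_ref = sum(ref_logs) / n_refs
        sd = math.sqrt(sum((x - mean_ref) ** 2 for x in ref_logs) / n_refs)
        gaps.append(mean_ref - log_w)
        sks.append(sd * math.sqrt(1.0 + 1.0 / n_refs))
    for k in range(1, k_max):  # smallest k with Gap(k) >= Gap(k+1) - s_{k+1}
        if gaps[k - 1] >= gaps[k] - sks[k]:
            return k, gaps, sks
    return k_max, gaps, sks
```

On two tight, well-separated groups of points the rule should settle on k = 2. With real data the answer is noisier, and more reference sets (a larger n_refs) reduce that noise at the cost of more k-means runs.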
Clustering Handwritten Digits (digitized optical images)
In this example, we show a clustering exercise on the Optical Recognition of Handwritten Digits Data Set, available @ UCI data set repository (Link).
This exercise is a standard application of HOSVD, stacking the 8x8 matrices of digitized …
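HOSVD works on the unfoldings (matricizations) of the stacked tensor. With N digit images of size 8x8, the data form an N x 8 x 8 array. The factor matrix for each mode is given by the left singular vectors of that mode's unfolding, which are the eigenvectors of the unfolding times its transpose. The Python sketch below, with hypothetical names, covers only the unfolding and Gram-matrix step for a 3-way array stored as nested lists, not the full HOSVD from the post. The column ordering is one convention among several; permuting columns does not change the left singular vectors.

```python
def unfold(T, mode):
    """Unfold a 3-way array T[i][j][k] along `mode` (0, 1 or 2).

    Rows are indexed by the chosen mode; the other two indices are flattened
    into columns in row-major order.
    """
    I, J, K = len(T), len(T[0]), len(T[0][0])
    if mode == 0:  # one row per image (64 columns for 8x8 images)
        return [[T[i][j][k] for j in range(J) for k in range(K)] for i in range(I)]
    if mode == 1:  # one row per pixel row
        return [[T[i][j][k] for i in range(I) for k in range(K)] for j in range(J)]
    if mode == 2:  # one row per pixel column
        return [[T[i][j][k] for i in range(I) for j in range(J)] for k in range(K)]
    raise ValueError("mode must be 0, 1 or 2")


def gram(M):
    """M M^T; its eigenvectors are the left singular vectors of M,
    i.e. the HOSVD factor matrix for the mode that M unfolds."""
    return [[sum(a * b for a, b in zip(r1, r2)) for r2 in M] for r1 in M]


T = [[[1, 2], [3, 4]], [[5, 6], [7, 8]]]  # a toy 2 x 2 x 2 array
print(unfold(T, 1))        # [[1, 2, 5, 6], [3, 4, 7, 8]]
print(gram(unfold(T, 0)))  # [[30, 70], [70, 174]]
```

The mode-0 unfolding is the familiar one-image-per-row layout. The other two modes summarize pixel-row and pixel-column patterns across all images, which plain flattening ignores.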