Tag: SVD

Kaggle Digit Recognizer: SAS k-Nearest Neighbor solution

by Liang Xie • December 11, 2012 • Comments Off

Kaggle is hosting an educational data mining competition, the Kaggle Digit Recognizer, using MNIST data. Handwritten digit recognition is one of the few applications where the kNN classifier performs well. Of course, the benchmark kNN classifier provided …

KNN Classification and Regression in SAS

by Liang Xie • November 25, 2012 • Comments Off

kNN stands for k-Nearest Neighbor. In data mining and predictive modeling, it refers to a memory-based (or instance-based) algorithm for classification and regression problems. It is a widely used algorithm with many successful applications in medi…
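
As a rough illustration of the memory-based idea described above (not the SAS implementation from the post), here is a minimal Python sketch. The function names `knn_predict` and `knn_regress` and the toy data are assumptions made for this example; it uses plain Euclidean distance and an unweighted vote or mean.

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # "Training" is just storing the data; all work happens at query time.
    dists = sorted(
        ((math.dist(x, query), label) for x, label in zip(train_X, train_y)),
        key=lambda pair: pair[0],
    )
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]

def knn_regress(train_X, train_y, query, k=3):
    """Predict a numeric target as the mean over the k nearest training points."""
    dists = sorted(
        ((math.dist(x, query), y) for x, y in zip(train_X, train_y)),
        key=lambda pair: pair[0],
    )
    nearest = [y for _, y in dists[:k]]
    return sum(nearest) / len(nearest)

# Toy data (hypothetical): two well-separated groups in the plane.
points = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
labels = ["a", "a", "a", "b", "b", "b"]
values = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]

print(knn_predict(points, labels, (0.2, 0.1)))
print(knn_regress(points, values, (0.2, 0.1)))
```

The same stored points serve both tasks; only the way the neighbors' targets are combined (vote versus mean) changes.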

Low Rank Radial Smoothing using GLIMMIX and its Scoring

by L X • September 30, 2010 • Comments Off

Low Rank Radial Smoothing using GLIMMIX [1] is a semiparametric approach to smoothing curves [2]. By specifying the TYPE=RSMOOTH option in the RANDOM statement, we can implement this spline smoothing approach. The best thing is that for future scoring, data preparation…

Implement Randomized SVD in SAS

by L X • July 13, 2010 • Comments Off

In the 2010 SASware Ballot®, a dedicated PROC for Randomized SVD was among the options. While an official SAS PROC will not be available in the immediate future, nor in older SAS releases, it is fairly simple to implement this algorithm using …

"Entrywise" Norm calculation using PROC FASTCLUS

by L X • June 26, 2010 • Comments Off

In some data mining applications, a matrix norm has to be calculated; see, for instance, [1]. A detailed explanation of matrix norms can be found on Wikipedia.

Instead of a user-written routine in a DATA step, we can obtain the “Entrywise” norm via PROC FASTCLUS ef…
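
For reference, the quantity itself is simple to state. The following Python sketch computes an entrywise p-norm directly; it is not the PROC FASTCLUS approach from the post, and the function name `entrywise_norm` is an assumption made for this example.

```python
def entrywise_norm(matrix, p=2.0):
    """Entrywise p-norm: (sum over all entries of |a_ij|**p) ** (1/p).

    With p=2 this is the Frobenius norm.
    """
    # Treat the matrix as one long vector of entries.
    total = sum(abs(a) ** p for row in matrix for a in row)
    return total ** (1.0 / p)

print(entrywise_norm([[3, 4], [0, 0]]))          # Frobenius norm (p=2)
print(entrywise_norm([[1, -2], [3, -4]], p=1))   # sum of absolute values
```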

A Macro for SVD

by L X • March 26, 2010 • Comments Off

SVD is at the heart of many modern machine learning algorithms. As a computing vehicle for PCA, SVD can be obtained using PROC PRINCOMP on the covariance matrix of a given matrix without correction for intercept. With SVD, we are ready to ca…
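
The link between the two decompositions can be written out briefly. In the sketch below, X is the data matrix and c stands for whatever scaling constant the covariance computation applies; both symbols are notation assumed here, not taken from the post.

```latex
% Thin SVD of the data matrix X (n x p), with U^T U = I and V^T V = I:
X = U \Sigma V^{\top}

% Uncorrected (no mean-centering) cross-product matrix:
X^{\top} X = V \Sigma U^{\top} U \Sigma V^{\top} = V \Sigma^{2} V^{\top}

% So the eigenvectors of the uncorrected covariance (1/c) X^T X are the
% right singular vectors V, and its eigenvalues are
\lambda_i = \sigma_i^{2} / c

% The left singular vectors follow, for nonzero singular values, from
U = X V \Sigma^{-1}
```

In other words, an eigen-decomposition of the uncorrected covariance matrix recovers V and the singular values (up to the scaling c), and U follows by projection.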

Implementing Gap statistic for clustering number estimation

by L X • January 22, 2010 • Comments Off

The gap statistic is a method used to estimate the most likely number of clusters in a partition clustering, notably k-means clustering. This measure was originated by Trevor Hastie, Robert Tibshirani, and Guenther Walther, all from Stanford U…

Clustering Handwritten Digits (digitalized optical images)

by L X • December 12, 2009 • Comments Off

In this example, we show a clustering exercise on the Optical Recognition of Handwritten Digits Data Set available at the UCI data set repository (Link).

This exercise is a standard application of HOSVD by stacking 8x8 matrix of digitalized …
