Tag: PROC DISCRIM

Kaggle Digit Recognizer: SAS k-Nearest Neighbor solution

Kaggle is hosting an educational data mining competition, Kaggle Digit Recognizer, using MNIST data. Handwritten digit recognition is one of the few applications where a kNN classifier performs well. Of course, the benchmark kNN classifier provided …

KNN Classification and Regression in SAS

kNN stands for k Nearest Neighbor. In data mining and predictive modeling, it refers to a memory-based (or instance-based) algorithm for classification and regression problems. It is a widely used algorithm with many successful applications in medi…

Regularized Discriminant Analysis

A demo SAS implementation of the Regularized (Linear) Discriminant Analysis of J. Friedman (1989) [1]. A simpler introduction can be found at [2]. Regularized QDA follows similarly.

To save coding, I called R within SAS to finish the computation. For details…

Using SAS to find the best k for k-Nearest Neighbor classification

Least squares (regression) and nearest neighbor are the most fundamental methodologies for supervised classification [Ref. 1]. Even though they are quite old, they are still popular and widely used in academia and industry. There is a trade-off in comp…

An Economic Approach for a Class of Dimensionality Reduction Techniques

Just back from KDD2010. At the conference, several papers interested me.

On the computation side, Liang Sun et al.’s paper [1], “A Scalable Two-Stage Approach for a Class of Dimensionality Reduction Techniques,” caught my eye. Liang p…

K-Nearest Neighbor in SAS

K-Nearest-Neighbor, aka KNN, is a widely used data mining tool and is often called a memory-based, case-based, or instance-based method because no model is fit. A good introduction to KNN can be found at [1] or on Wikipedia.

Typically, the KNN algorithm relies on a soph…