Kaggle is hosting an educational data mining competition: Kaggle Digit Recognizer, using MNIST data. Handwritten digit recognition is one of the few applications where a kNN classifier performs well. Of course, the benchmark kNN classifier provided …
Tag: PROC DISCRIM
KNN Classification and Regression in SAS
kNN stands for k Nearest Neighbor. In data mining and predictive modeling, it refers to a memory-based (or instance-based) algorithm for classification and regression problems. It is a widely used algorithm with many successful applications in medi…
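To make the "memory-based" idea concrete, here is a minimal sketch of kNN classification. It is written in Python purely for illustration and is not the SAS code from the post; the function name `knn_predict` and the toy data are hypothetical.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # No model is fit: the training data itself is the "model".
    # Compute the Euclidean distance from the query to every stored point.
    dists = sorted(
        (math.dist(x, query), label) for x, label in zip(train_X, train_y)
    )
    # Keep the labels of the k closest points and take a majority vote.
    top_k = [label for _, label in dists[:k]]
    return Counter(top_k).most_common(1)[0][0]


# Toy data (hypothetical): two well-separated clusters.
train_X = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (1.0, 1.0), (0.9, 1.1), (1.1, 0.9)]
train_y = ["a", "a", "a", "b", "b", "b"]

print(knn_predict(train_X, train_y, (0.05, 0.05), k=3))
print(knn_predict(train_X, train_y, (1.0, 0.95), k=3))
```

For regression, the majority vote would be replaced by an average of the neighbors' target values.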
Regularized Discriminant Analysis
A demo SAS implementation of Regularized (Linear) Discriminant Analysis by J. Friedman (1989) [1]. A simpler introduction can be found at [2]. Regularized QDA follows similarly.
To save coding, I called R within SAS to finish the computation. For details…
Using SAS to find the best k for k-Nearest Neighbor classification
Least squares (regression) and nearest neighbor are the most fundamental methodologies for supervised classification [Ref. 1]. Even though they are pretty old, they are still popular and widely used in academia and industry. There is a trade-off in comp…
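One common way to choose k is to compare cross-validated error across candidate values. The sketch below uses leave-one-out cross-validation in Python; it is an assumption about the general approach, not the SAS procedure used in the post, and the names `knn_label`, `loo_error`, and `best_k` and the toy data are hypothetical.

```python
import math
from collections import Counter


def knn_label(train, query, k):
    """Majority-vote label among the k training rows nearest to `query`."""
    nearest = sorted(train, key=lambda row: math.dist(row[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def loo_error(data, k):
    """Leave-one-out misclassification rate for a given k."""
    errors = 0
    for i, (x, y) in enumerate(data):
        rest = data[:i] + data[i + 1:]  # hold out the i-th observation
        if knn_label(rest, x, k) != y:
            errors += 1
    return errors / len(data)


def best_k(data, candidates):
    """Pick the candidate k with the lowest error; ties go to the smaller k."""
    return min(candidates, key=lambda k: (loo_error(data, k), k))


# Toy data (hypothetical): rows of (features, label).
data = [
    ((0.0, 0.0), "a"), ((0.1, 0.2), "a"), ((0.2, 0.1), "a"),
    ((1.0, 1.0), "b"), ((0.9, 1.1), "b"), ((1.1, 0.9), "b"),
]

for k in (1, 3, 5):
    print(k, loo_error(data, k))
print("best k:", best_k(data, [1, 3, 5]))
```

Leave-one-out refits nothing, since kNN has no training step, but it does rescan the data once per held-out point, so it scales quadratically with the number of observations.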
An Economic Approach for a Class of Dimensionality Reduction Techniques
Just back from KDD2010. At the conference, several papers interested me.
On the computation side, Liang Sun et al.’s paper [1], “A Scalable Two-Stage Approach for a Class of Dimensionality Reduction Techniques”, caught my eye. Liang p…
K-Nearest Neighbor in SAS
K-Nearest-Neighbor, aka KNN, is a widely used data mining tool and is often called a memory-based, case-based, or instance-based method because no model is fit. A good introduction to KNN can be found at [1], or on Wiki.
Typically, the KNN algorithm relies on a soph…