Random Sample Selection

Last week my manager asked me to randomly pick 10% of the observations from a large data set and then create a listing so that the data management programmers can QC the data. I want to share some thoughts here … how easy and simple it is to do random sampling. …
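For readers outside SAS, here is a minimal Python sketch of the same idea: draw a simple random sample of about 10% of the observations without replacement, using a fixed seed so the listing can be regenerated. The helper name `sample_fraction`, the seed, and rounding the sample size up are my own assumptions, not the approach from the original post. In SAS itself, PROC SURVEYSELECT with METHOD=SRS and SAMPRATE= is the usual tool for this job.

```python
import math
import random

# Hypothetical helper for illustration; not the method from the original post.
def sample_fraction(rows, fraction=0.10, seed=2009):
    """Simple random sample, without replacement, of about `fraction` of `rows`.

    The sample size is rounded up, so a small non-empty data set still yields
    at least one row. A fixed seed makes the draw reproducible.
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    n = math.ceil(len(rows) * fraction)
    rng = random.Random(seed)
    # Draw row indices, then sort them so the sample keeps the original row order.
    picked = sorted(rng.sample(range(len(rows)), n))
    return [rows[i] for i in picked]


if __name__ == "__main__":
    data = [{"id": i, "value": i * i} for i in range(1, 201)]
    listing = sample_fraction(data)
    print(len(listing))  # 20 rows, i.e. 10% of 200
```

Returning the sampled rows in their original order keeps the QC listing easy to check against the source data set.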

It’s the added value that counts

Welcome to SAS Training Post, the official blog of SAS Training & Certification! My name is Michele Reister and I am the social media manager for SAS Education. This blog will be a channel to provide you with value-add educational content t…

K-Nearest Neighbor in SAS

K-Nearest Neighbor, aka KNN, is a widely used data mining tool and is often called a memory-based (also case-based or instance-based) method because no model is fit. A good introduction to KNN can be found at [1] or on Wikipedia.

Typically, the KNN algorithm relies on a soph…
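To make the "no model is fit" point concrete, here is a brute-force KNN classifier sketched in Python: each query is compared against every stored training point, and the k nearest points vote on the label. Euclidean distance, the default k=3, and the function name `knn_predict` are illustrative assumptions; the sketch deliberately scans all points rather than using any search structure.

```python
import math
from collections import Counter

# Hypothetical helper for illustration: brute-force KNN with Euclidean distance.
def knn_predict(train_x, train_y, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    if not 1 <= k <= len(train_x):
        raise ValueError("k must be between 1 and the number of training points")
    # "Memory-based": nothing is fit; we just sort stored points by distance
    # to the query (math.dist needs Python 3.8+).
    order = sorted(range(len(train_x)), key=lambda i: math.dist(train_x[i], query))
    neighbors = [train_y[i] for i in order[:k]]
    # Counter.most_common lists equal counts in first-seen order, so a tied vote
    # goes to the label of the closer neighbor.
    return Counter(neighbors).most_common(1)[0][0]


if __name__ == "__main__":
    train_x = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)]
    train_y = ["a", "a", "a", "b", "b", "b"]
    print(knn_predict(train_x, train_y, (1.5, 1.5)))
    print(knn_predict(train_x, train_y, (8.5, 8.5)))
```

All the cost is paid at prediction time, which is why the choice of distance and the speed of the neighbor search dominate KNN in practice.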

Next Project: Regularized Logistic Regression

L1-regularized logistic regression effectively handles a large number of predictors and performs variable selection at the same time. [1] indicates that L1 RLR can be implemented via the IRLS-LARS algorithm. You can tweak PROC GLMSELECT in v9.2 for this.

L2 R…
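As an illustration of what the L1 penalty does, the Python sketch below fits L1-regularized logistic regression with proximal gradient descent (ISTA). This is a swapped-in technique, not the IRLS-LARS algorithm from [1] and not a PROC GLMSELECT setup: each iteration takes a gradient step on the mean log-loss and then soft-thresholds the coefficients, and that thresholding is what sets weak predictors exactly to zero. The function names, the fixed step size (which assumes roughly standardized predictors so that `step` stays below the inverse Lipschitz constant of the gradient), and the iteration count are all assumptions.

```python
import math

# Hypothetical helpers for illustration: ISTA, not IRLS-LARS or PROC GLMSELECT.
def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def soft_threshold(u, t):
    # Proximal operator of t*|u|: shrink u toward zero; exactly 0 if |u| <= t.
    if u > t:
        return u - t
    if u < -t:
        return u + t
    return 0.0


def l1_logistic(X, y, lam=0.1, step=0.5, iters=2000):
    """L1-penalized logistic regression by proximal gradient descent (ISTA).

    Minimizes mean log-loss + lam * sum(|w_j|); the intercept b is not penalized.
    X is a list of feature rows and y a list of 0/1 labels. Assumes roughly
    standardized predictors so that `step` is small enough for convergence.
    """
    n, p = len(X), len(X[0])
    w, b = [0.0] * p, 0.0
    for _ in range(iters):
        # Residuals sigmoid(x_i . w + b) - y_i form the gradient of the mean log-loss.
        r = [sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
             for xi, yi in zip(X, y)]
        grad_w = [sum(r[i] * X[i][j] for i in range(n)) / n for j in range(p)]
        grad_b = sum(r) / n
        # Gradient step on the smooth loss, then soft-threshold the penalized weights.
        w = [soft_threshold(wj - step * gj, step * lam) for wj, gj in zip(w, grad_w)]
        b -= step * grad_b
    return w, b


if __name__ == "__main__":
    # x1 separates the classes; x2 is only weakly related to the label.
    X = [(1, 1), (1, 1), (1, 1), (1, -1), (-1, -1), (-1, -1), (-1, 1), (-1, 1)]
    y = [1, 1, 1, 1, 0, 0, 0, 0]
    print(l1_logistic(X, y, lam=0.1))
```

On this toy data, `lam=0.1` drives the `x2` coefficient to exactly 0.0 while `x1` stays in the model, and a large enough `lam` (e.g. 1.0) zeroes both. That is the simultaneous variable selection mentioned above, in miniature.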

Repeating a line of data

Repeating a line of a data set for each line in another
Suppose you want to access the same information in every line of a data set, and that this information is data-dependent. For example, suppose you want to add the 25th, 50th, and 75th per…
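A minimal Python sketch of the pattern, assuming the data set is a list of dicts: compute the quartiles once from the whole data set, then copy that single summary line onto every row. The helper name `add_quartiles` and the `_p25`/`_p50`/`_p75` column suffixes are hypothetical. In SAS, one common idiom for this one-to-many merge is `if _n_ = 1 then set` on the summary data set, though the original post may take a different route. Note that percentile definitions differ between tools, so the cut points may not match SAS output exactly.

```python
import statistics

# Hypothetical helper for illustration; not the method from the original post.
def add_quartiles(rows, var):
    """Attach the 25th, 50th, and 75th percentiles of `var` to every row.

    The summary is computed once from the whole data set (the "one line"),
    then merged onto each row (the "every line"). Needs at least two rows.
    """
    values = [row[var] for row in rows]
    # method="inclusive" treats the data as the full population; other tools'
    # percentile definitions can give slightly different cut points.
    p25, p50, p75 = statistics.quantiles(values, n=4, method="inclusive")
    stats = {f"{var}_p25": p25, f"{var}_p50": p50, f"{var}_p75": p75}
    # Build new dicts so the input rows are left unmodified.
    return [{**row, **stats} for row in rows]


if __name__ == "__main__":
    rows = [{"id": i, "x": v} for i, v in enumerate([5, 1, 4, 2, 3])]
    for row in add_quartiles(rows, "x"):
        print(row)
```

Because the summary is computed before the merge, every row receives identical percentile values regardless of its own `x`.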

Hey Look: There’s Log In

We are working on some changes to support.sas.com that I’ll talk about here over the next few weeks. We tried to squeak a few in before leaving Cary for SAS Global Forum. If you didn’t see these on the demo floor, let me point them out now. Thes…