Last week my manager asked me to randomly pick 10% of the observations from a large data set and then create a listing so that the data management programmers can QC the data. I want to share some thoughts here … how easy and simple it is to do random sampling. …
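As a rough illustration of the task described in this excerpt, here is a minimal sketch of drawing a 10% simple random sample without replacement. It is written in Python rather than SAS and is not the post's own code; the function name `sample_fraction`, the seed, and the `records` data are assumptions made up for the example.

```python
import random


def sample_fraction(rows, fraction=0.10, seed=12345):
    """Return a simple random sample (without replacement) of `fraction` of rows."""
    rng = random.Random(seed)  # fixed seed so the QC listing is reproducible
    k = round(len(rows) * fraction)
    # Sample row positions, then sort them so the listing keeps the original order.
    picked = sorted(rng.sample(range(len(rows)), k))
    return [rows[i] for i in picked]


# Hypothetical data set: 1,000 records with an id and a value.
records = [{"id": i, "value": i * 2} for i in range(1000)]
qc_listing = sample_fraction(records)
print(len(qc_listing), "records selected for QC")
```

Fixing the seed is a design choice for QC work: rerunning the program produces the same listing, so reviewers and programmers look at the same records.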
It’s the added value that counts
Welcome to SAS Training Post, the official blog of SAS Training & Certification! My name is Michele Reister and I am the social media manager for SAS Education. This blog will be a channel to provide you with value-add educational content t…
K-Nearest Neighbor in SAS
K-Nearest-Neighbor, aka KNN, is a widely used data mining tool and is often called a memory-based/case-based/instance-based method, as no model is fit. A good introduction to KNN can be found at [1], or on Wiki.
Typically, the KNN algorithm relies on a soph…
Next Project: Regularized Logistic Regression
L1 Regularized Logistic Regression effectively handles a large number of predictors and performs variable selection at the same time. [1] indicates that L1 RLR can be implemented via the IRLS-LARS algorithm. You can tweak PROC GLMSELECT in v9.2 for this.
L2 R…
Repeating a line of data
Repeating a line of a data set for each line in another
Suppose you want to access the same information in every line of a data set, and that this information is data-dependent. For example, suppose you want to add the 25th, 50th, and 75th per…
Hey Look: There’s Log In
We are working on some changes to support.sas.com that I’ll talk about here over the next few weeks. We tried to squeak a few in before leaving Cary for SAS Global Forum. If you didn’t see these on the demo floor, let me point them out now. Thes…