Tag: data mining

Map and Reduce in MapReduce: a SAS Illustration

In my last post, I mentioned Hadoop, the open source implementation of Google’s MapReduce for parallelized processing of big data. Over this long National Holiday, I read the original Google paper, MapReduce: Simplified Data Processing on Large Clusters by Jeffrey Dean and Sanjay Ghemawat, and learned that the terms “map” and “reduce” were basically borrowed […]

An Analytical Valley: Big Data and Data Scientists (and SAS Programmers)

Tom Davenport reported an observation that Silicon Valley is becoming more analytical, since companies in the Valley such as Google, Facebook, eBay and LinkedIn all have a strong presence in analytics. Besides such prominent companies, I’d also like to add Yahoo to the list, although Yahoo is no longer at its peak. Yahoo is the largest sponsor […]

Data mining in the financial services industry

The Analytics Conference is coming up soon (October 24-25, Orlando)! To refresh your memory, the Analytics Conference Series is a merger of three annual educational conferences: the data mining conference, the business forecasting conference and the …

Regularized Discriminant Analysis

A demo SAS implementation of the Regularized (Linear) Discriminant Analysis of J. Friedman (1989)[1]. A simpler introduction can be found at [2]. Regularized QDA follows similarly.

To save coding, I called R within SAS to finish the computation. For details…

Music social network on DNA microarray

The upcoming 2011 KDD Cup data mining competition [1] by Yahoo! Labs poses an interesting challenge: predict users’ ratings for individual songs from this company’s huge music database. Unlike previous KDD Cup projects filled with tons of varia…

Visualize decision tree by coding Proc Arboretum

The decision tree (tree-based partitioning, or recursive partitioning) dominates the top positions of recent data mining competitions. Like logistic regression, it is easy to implement and explain, but it usually delivers more predictive power (higher AUC). Unlike SVM, neural network …

Macro embedded function finds AUC

As a routine practice for reusing code, SAS programmers tend to wrap procedures in a SAS macro and pass arguments to them as macro variables. The result could be anything produced by a data step or SAS procedure: figure, dataset, SAS list, etc. Thus, macro in SA…

SAS vs R in data mining (1): challenges for SAS

The past three years witnessed the rise of R, an open source statistical software package. Search for R-related books on Amazon, and tons of recent titles show up, ranging from graphics to scientific computation. Thanks to those graduates who sprang out of school that …