# Posts Tagged ‘data mining’

## SAS introduces the blended classroom

March 23, 2018
We all have different learning styles. Some learn best by seeing and doing; others by listening to lectures in a traditional classroom; still others simply by diving in and asking questions along the way. Traditional face-to-face classroom instruction,...

## Flexibility of SAS Enterprise Miner

August 24, 2015
Do you use an array of tools to perform predictive analytics on your data? Is your current tool not flexible enough to accommodate some of your requirements? SAS Enterprise Miner may be your solution. With a growing number of data mining applications, having a tool which can do a variety of analysis

## Use recursion and gradient ascent to solve logistic regression in Python

May 21, 2014
This post was kindly contributed by DATA ANALYSIS - go there to comment and to read the full post. In his book Machine Learning in Action, Peter Harrington provides a solution for parameter estimation of logistic regression. I use pandas and ggplot to realize a recursive alternative. Compared with the iterative method, the recursion costs more space but may bring...
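The recursive idea can be sketched roughly as follows. This is not the code from the full post: it uses plain Python lists rather than pandas, and the function names, the toy data, the learning rate `alpha` and the step count are all assumptions made for illustration. Each call performs one gradient ascent update on the log-likelihood and then recurses, which is why the approach uses more stack space than a loop.

```python
import math

def sigmoid(z):
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def grad_ascent(X, y, weights, alpha=0.05, steps=200):
    """Recursive batch gradient ascent on the logistic log-likelihood.

    Each call performs one update and recurses with steps - 1, so the
    call stack grows linearly with the number of steps.
    """
    if steps == 0:
        return weights
    # Prediction errors y_i - sigmoid(x_i . w)
    errors = [yi - sigmoid(sum(w * x for w, x in zip(weights, xi)))
              for xi, yi in zip(X, y)]
    # Gradient of the log-likelihood: X^T (y - p)
    gradient = [sum(e * xi[j] for e, xi in zip(errors, X))
                for j in range(len(weights))]
    new_weights = [w + alpha * g for w, g in zip(weights, gradient)]
    return grad_ascent(X, y, new_weights, alpha, steps - 1)

# Toy data (hypothetical): an intercept column plus one feature
X = [[1.0, 0.5], [1.0, 1.0], [1.0, 1.5], [1.0, 3.0], [1.0, 3.5], [1.0, 4.0]]
y = [0, 0, 1, 0, 1, 1]
weights = grad_ascent(X, y, [0.0, 0.0])
```

The step count is kept well below Python's default recursion limit; a longer run would need either a higher limit or the iterative form.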

## PROC PLS and multicollinearity

December 10, 2013
### Multicollinearity and its consequences

Multicollinearity usually brings significant challenges to a regression model, whether it is fitted with the normal equation or with gradient descent.

#### 1. Invertible SSCP for normal equation

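According to the normal equation, the coefficients can be obtained by $\hat{\beta} = (X^\top X)^{-1} X^\top y$, where $X^\top X$ is the SSCP (sums of squares and cross-products) matrix. If the SSCP turns out to be singular and non-invertible due to multicollinearity, then the coefficients are...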
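As a rough illustration of why collinearity breaks the normal equation, the sketch below builds the SSCP matrix (the cross-product of the design matrix with itself) for two small, made-up design matrices and compares their determinants. The helper names and the data are hypothetical, and the 2x2 determinant is used only to keep the example free of dependencies.

```python
def sscp(X):
    """Return X^T X, the sums of squares and cross-products matrix."""
    cols = list(zip(*X))
    return [[sum(a * b for a, b in zip(ci, cj)) for cj in cols] for ci in cols]

def det2(m):
    """Determinant of a 2x2 matrix."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

# Hypothetical design matrices with two predictors each
X_ok = [[1.0, 2.0], [2.0, 1.0], [3.0, 5.0]]
X_collinear = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0]]  # second column = 2 * first

det_ok = det2(sscp(X_ok))                # non-zero: the SSCP can be inverted
det_collinear = det2(sscp(X_collinear))  # zero: the SSCP is singular
```

With perfectly collinear columns the determinant is exactly zero, so the inverse in the normal equation does not exist; with nearly collinear columns it is merely close to zero, which makes the inverse numerically unstable.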

## Use R in Hadoop by streaming

December 9, 2013
It seems that the combination of R and Hadoop is a must-have toolkit for people working with both statistics and large data sets.

### An aggregation example

The Hadoop version used here is Cloudera’s CDH4, and the underlying Linux OS is CentOS 6. The data used is a simulated sales data set from a training...

## A cheat sheet for linear regression validation

November 29, 2013
The link to the cheat sheet is here. I have benefited a lot from the UCLA SAS tutorial, especially the chapter on regression diagnostics. However, the content on the webpage seems to be outdated. The great thing about PROC REG is that it creates...

## Kernel selection in PROC SVM

November 21, 2013
The support vector machine (SVM) is a flexible classification or regression method thanks to its many kernels. To apply an SVM, we may need to specify a kernel, a regularization parameter c and some kernel parameters like gamma. Besides the selectio...
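To make the role of the kernel and of gamma concrete, here is a minimal sketch of two common kernel functions in plain Python. It is not PROC SVM syntax; the function names and the sample vectors are assumptions chosen for illustration.

```python
import math

def linear_kernel(u, v):
    """Linear kernel: the plain dot product of u and v."""
    return sum(a * b for a, b in zip(u, v))

def rbf_kernel(u, v, gamma=1.0):
    """RBF (Gaussian) kernel: exp(-gamma * squared Euclidean distance)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-gamma * sq_dist)

# Two hypothetical observations
u, v = [1.0, 2.0], [2.0, 0.0]

k_lin = linear_kernel(u, v)
k_rbf_narrow = rbf_kernel(u, v, gamma=1.0)  # large gamma: similarity decays fast
k_rbf_wide = rbf_kernel(u, v, gamma=0.1)    # small gamma: similarity decays slowly
```

A larger gamma makes the RBF similarity fall off more quickly with distance, which is one reason the kernel and its parameters have to be chosen together.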

## When ROC fails logistic regression for rare-event data

November 13, 2013
The ROC curve, or its AUC, measures the trade-off between sensitivity and specificity and is widely used in logistic regression and other classification methods for model comparison and feature selection. The paper by Gary King warns of the dangers of using...

