Posts Tagged ‘data mining’

Use recursion and gradient ascent to solve logistic regression in Python

May 21, 2014

This post was kindly contributed by DATA ANALYSIS - go there to comment and to read the full post. In his book Machine Learning in Action, Peter Harrington provides a solution for parameter estimation of logistic regression. I use pandas and ggplot to realize a recursive alternative. Compared with the iterative method, the recursion costs more space but may bring...
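To make the idea concrete, here is a minimal sketch of recursive gradient ascent for logistic regression. It is not the code from the full post or from the book: it uses plain NumPy rather than pandas and ggplot, and the function names, the simulated data, the step size `alpha` and the number of `steps` are all assumptions chosen for illustration.

```python
import numpy as np

def sigmoid(z):
    """Logistic function mapping real values into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def log_likelihood(X, y, weights):
    """Bernoulli log-likelihood of the labels under the given weights."""
    p = sigmoid(X @ weights)
    return float(np.sum(y * np.log(p) + (1.0 - y) * np.log(1.0 - p)))

def grad_ascent(X, y, weights, alpha=0.5, steps=300):
    """Batch gradient ascent written recursively: one update per call."""
    if steps == 0:  # base case: no updates left
        return weights
    error = y - sigmoid(X @ weights)                     # residuals on the probability scale
    updated = weights + alpha * (X.T @ error) / len(y)   # step along the averaged gradient
    return grad_ascent(X, y, updated, alpha, steps - 1)

# Simulated data (hypothetical): an intercept column plus one predictor.
rng = np.random.default_rng(42)
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=n)])
true_w = np.array([-0.5, 2.0])
y = (rng.random(n) < sigmoid(X @ true_w)).astype(float)

w_start = np.zeros(2)
w_hat = grad_ascent(X, y, w_start)
print(w_hat, log_likelihood(X, y, w_hat))
```

Each update adds a stack frame, which is the extra space cost of the recursive form. In CPython the number of steps is also bounded by the interpreter's recursion limit (1000 by default), whereas a loop has no such bound.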
Read more »

Posted in SAS | Comments Off

PROC PLS and multicollinearity

December 10, 2013

Multicollinearity and its consequences

Multicollinearity usually poses significant challenges for a regression model, whether it is fitted with the normal equation or with gradient descent.

1. Invertible SSCP for normal equation

According to the normal equation, the coefficients can be obtained by \hat{\beta}=(X'X)^{-1}X'y. If the SSCP matrix turns out to be singular and non-invertible due to multicollinearity, then the coefficients are...
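The effect on the SSCP matrix X'X can be checked numerically. The sketch below uses Python with NumPy rather than SAS. The simulated predictors and the helper name `sscp` are assumptions for illustration. One column is built as an exact linear combination of two others, so X loses a rank and X'X becomes numerically singular.

```python
import numpy as np

def sscp(X):
    """Sums-of-squares-and-cross-products matrix X'X."""
    return X.T @ X

# Simulated design matrix (hypothetical data).
rng = np.random.default_rng(0)
x1 = rng.normal(size=50)
x2 = rng.normal(size=50)
x3 = 2.0 * x1 - x2  # exact linear combination of x1 and x2

X_collinear = np.column_stack([np.ones(50), x1, x2, x3])
X_clean = X_collinear[:, :3]  # same data without the redundant column

# Rank of X equals the rank of X'X, so a deficient X means a singular SSCP.
rank_collinear = np.linalg.matrix_rank(X_collinear)
rank_clean = np.linalg.matrix_rank(X_clean)

# A huge condition number signals that inverting X'X is numerically unsafe.
cond_collinear = np.linalg.cond(sscp(X_collinear))
cond_clean = np.linalg.cond(sscp(X_clean))

print(rank_collinear, rank_clean)
print(cond_collinear, cond_clean)
```

With four columns but rank three, (X'X)^{-1} does not exist, so the normal equation has no unique solution for this design.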
Read more »

Posted in SAS | Comments Off

Use R in Hadoop by streaming

December 9, 2013

It seems that the combination of R and Hadoop is a must-have toolkit for people working with both statistics and large data sets.

An aggregation example

The Hadoop version used here is Cloudera’s CDH4, and the underlying Linux OS is CentOS 6. The data used is a simulated sales data set from a training...
Read more »

Posted in SAS | Comments Off

A cheat sheet for linear regression validation

November 29, 2013

The link to the cheat sheet is here. I have benefited a lot from the UCLA SAS tutorial, especially the chapter on regression diagnostics. However, the content on the webpage seems to be outdated. The great thing about PROC REG is that it creates...
Read more »

Posted in SAS | Comments Off

Kernel selection in PROC SVM

November 21, 2013

The support vector machine (SVM) is a flexible classification or regression method thanks to its many kernels. To apply an SVM, we may need to specify a kernel, a regularization parameter c, and some kernel parameters such as gamma. Besides the selectio...
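As a rough illustration of what a kernel and its gamma parameter do, the sketch below computes two common kernel matrices in Python with NumPy. It is not PROC SVM syntax, and the function names and the three sample points are assumptions. The RBF form used is the standard K(x, z) = exp(-gamma * ||x - z||^2).

```python
import numpy as np

def linear_kernel(A, B):
    """Linear kernel: plain inner products between rows of A and rows of B."""
    return A @ B.T

def rbf_kernel(A, B, gamma=1.0):
    """RBF kernel exp(-gamma * squared Euclidean distance) for every row pair."""
    sq_dist = (A ** 2).sum(axis=1)[:, None] + (B ** 2).sum(axis=1)[None, :] - 2.0 * (A @ B.T)
    return np.exp(-gamma * np.maximum(sq_dist, 0.0))  # clip tiny negatives from rounding

# Three hypothetical points in two dimensions.
A = np.array([[0.0, 0.0],
              [1.0, 0.0],
              [0.0, 2.0]])

K_lin = linear_kernel(A, A)
K_narrow = rbf_kernel(A, A, gamma=2.0)   # larger gamma: similarity decays faster
K_wide = rbf_kernel(A, A, gamma=0.5)     # smaller gamma: distant points still look similar

print(K_lin)
print(K_narrow)
print(K_wide)
```

A larger gamma shrinks every off-diagonal entry of the RBF matrix, so each point is treated as similar only to its close neighbours. This is one reason gamma is usually tuned together with the regularization parameter.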
Read more »

Posted in SAS | Comments Off

When ROC fails logistic regression for rare-event data

November 13, 2013

The ROC curve, or its AUC, is widely used in logistic regression and other classification methods for model comparison and feature selection; it measures the trade-off between sensitivity and specificity. The paper by Gary King warns of the dangers of using...
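The trade-off can be seen on a toy example. The sketch below, in Python with NumPy, computes sensitivity and specificity at two thresholds and the AUC as the share of positive-negative pairs that are ranked correctly. The labels and scores are made up for illustration, with only 2 events in 10 cases, and the function names are assumptions.

```python
import numpy as np

def sensitivity_specificity(y_true, scores, threshold):
    """Sensitivity and specificity when scores >= threshold are called positive."""
    pred = scores >= threshold
    pos = y_true == 1
    neg = ~pos
    sensitivity = (pred & pos).sum() / pos.sum()
    specificity = (~pred & neg).sum() / neg.sum()
    return float(sensitivity), float(specificity)

def auc(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count half."""
    pos = scores[y_true == 1]
    neg = scores[y_true == 0]
    diff = pos[:, None] - neg[None, :]
    return float(((diff > 0).sum() + 0.5 * (diff == 0).sum()) / (len(pos) * len(neg)))

# Hypothetical rare-event data: 2 events among 10 observations.
y_true = np.array([0, 0, 0, 0, 0, 0, 1, 0, 1, 0])
scores = np.array([0.10, 0.20, 0.15, 0.30, 0.05, 0.40, 0.35, 0.25, 0.80, 0.12])

strict = sensitivity_specificity(y_true, scores, threshold=0.5)
loose = sensitivity_specificity(y_true, scores, threshold=0.3)
area = auc(y_true, scores)

print(strict)  # (0.5, 1.0)
print(loose)   # (1.0, 0.75)
print(area)    # 0.9375
```

Lowering the threshold from 0.5 to 0.3 raises sensitivity from 0.5 to 1.0 but drops specificity from 1.0 to 0.75. The AUC summarizes this trade-off over all thresholds in a single number.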
Read more »

Posted in SAS | Comments Off

A SAS macro that exports data to MongoDB

August 29, 2013

MongoDB is possibly the most popular NoSQL data store. To bypass schemas and constraints, I find it quite convenient to use MongoDB as a buffer alongside the current RDBMS. Also, it is straightforward to use MongoDB and other tools (MEAN) to build s...
Read more »

Posted in SAS | Comments Off

Regularization adjustment for PROC SVM

July 31, 2013

SVM is a popular statistical learning method for either classification or regression. For classification, a linear classifier or a hyperplane, such as w'x + b = 0 with w as the weight vector and b as the bias, would label data into various categories. The geome...
Read more »

Posted in SAS | Comments Off
