Tag: data mining

Learn about new data mining and machine learning procedures in SAS Viya

Have you heard that SAS offers a collection of new, high-performance CAS procedures that are compatible with a multi-threaded approach? The free e-book Exploring SAS® Viya®: Data Mining and Machine Learning is a great resource to learn more about these procedures and the features of SAS® Visual Data Mining and […]


Foresight is 2020! New books to take your skills to the next level

Are you ready to get a jump start on the new year? If you’ve been wanting to brush up your SAS skills or learn something new, there’s no time like a new decade to start! SAS Press is releasing several new books in the upcoming months to help you stay […]


SAS introduces the blended classroom

We all have different learning styles. Some learn best by seeing and doing; others by listening to lectures in a traditional classroom; still others simply by diving in and asking questions along the way. Traditional face-to-face classroom instruction,…

Flexibility of SAS Enterprise Miner

Do you use an array of tools to perform predictive analytics on your data? Is your current tool not flexible enough to accommodate some of your requirements? SAS Enterprise Miner may be your solution. With a growing number of data mining applications, having a tool that can perform a variety of analyses […]


sklearn DecisionTree plot example needs pydotplus

In Python, sklearn (scikit-learn)’s DecisionTree example uses pydot for plotting the generated tree: @here. But under Python 3, pydot has some issues with the string from dot_data.getvalue(); for example, it will report “TypeError: startswith first arg mus…

Use recursion and gradient ascent to solve logistic regression in Python

This post was kindly contributed by DATA ANALYSIS – go there to comment and to read the full post. In his book Machine Learning in Action, Peter Harrington provides a solution for parameter estimation of logistic regression. I use pandas and ggplot to realize a recursive…

PROC PLS and multicollinearity

Multicollinearity and its consequences

Multicollinearity poses significant challenges to a regression model, whether the coefficients are estimated by the normal equation or by gradient descent.

1. Non-invertible SSCP for the normal equation

According to the normal equation, the coefficients are obtained by \hat{\beta} = (X'X)^{-1}X'y. If the SSCP matrix X'X is singular (and therefore non-invertible) due to multicollinearity, then the coefficients are theoretically not solvable.
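A minimal SAS/IML sketch (with made-up numbers, not from the baseball example below) shows the failure directly: when one regressor is an exact multiple of another, the determinant of X'X is zero and the inverse does not exist.

proc iml;
   x1 = {1, 2, 3, 4, 5};
   x2 = 2 * x1;                    /* x2 is perfectly collinear with x1 */
   X = j(5, 1, 1) || x1 || x2;     /* design matrix: intercept, x1, x2 */
   sscp = X` * X;                  /* the SSCP matrix X'X */
   print sscp;
   print (det(sscp))[label="determinant of SSCP"];  /* 0: not invertible */
quit;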

2. Unstable solution for gradient descent

The gradient descent algorithm uses iterative updates to minimize the residual sum of squares (RSS). When there is a strong relationship between two regressors, many combinations of \beta_1 and \beta_2 lie along a narrow valley of the RSS surface, all corresponding to nearly the minimal RSS. Thus \beta_1 can be negative, positive, or even zero, which makes it hard to obtain a stable model.
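To make this concrete, write out the standard gradient descent update on RSS(\beta) = (y - X\beta)'(y - X\beta), with learning rate \alpha:

\beta^{(t+1)} = \beta^{(t)} + 2\alpha X'(y - X\beta^{(t)})

Since the step is driven by X'X, near-collinear columns make X'X ill-conditioned: the gradient is nearly zero along the valley floor, so very different coefficient combinations fit almost equally well and the iterates can settle anywhere along the valley.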

Partial Least Squares vs. Principal Components Regression

The most direct way to deal with multicollinearity is to break down the regressors and construct new orthogonal variables. PLS and PCR are both dimension reduction methods that eliminate multicollinearity. The difference is that PLS also uses the response variable to construct the new components, which makes it particularly useful for problems with multiple response variables. The PLS procedure in SAS is a powerful and flexible tool that implements both PLS and PCR. One book, An Introduction to Statistical Learning, suggests that PLS offers no clear advantage over PCR:
“While the supervised dimension reduction of PLS can reduce bias, it also has the potential to increase variance, so that the overall benefit of PLS relative to PCR is a wash.”
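Concretely, the two methods differ in how each new direction w is chosen (this is the standard formulation, not specific to PROC PLS):

w_{PCR} = \arg\max_{\|w\|=1} \mathrm{Var}(Xw)
w_{PLS} = \arg\max_{\|w\|=1} \mathrm{Cov}(Xw, y)^2

So PCR is unsupervised in its choice of components, while PLS is supervised by y; subsequent directions are constrained to be orthogonal to the earlier ones.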
In the example below, which fits the baseball data set with 10-fold cross-validation, PLS chooses 9 components, while PCR picks 5.
/* Read the baseball data set directly from the URL */
filename myfile url 'https://svn.r-project.org/ESS/trunk/fontlock-test/baseball.sas';
%include myfile;

/* Capture the variable positions so the regressors can be selected by position */
proc contents data=baseball position;
   ods output position = pos;
run;

/* Build a macro variable holding the candidate regressor names */
proc sql;
   select variable into :regressors separated by ' '
   from pos
   where num between 5 and 20;
quit;
%put &regressors;

/* Log-transform the response */
data baseball_t;
   set baseball;
   logsalary = log10(salary);
run;

/* Partial least squares with 10-fold cross-validation */
proc pls data=baseball_t censcale nfac=10 cv=split(10);
   title 'partial least squares';
   model logsalary = &regressors;
run;

/* Principal components regression with the same settings */
proc pls data=baseball_t censcale method=pcr nfac=10 cv=split(10);
   title 'principal components regression';
   model logsalary = &regressors;
run;
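As a hypothetical follow-up (the NFAC=9 below simply assumes the PLS cross-validation result quoted above), the model could be refit with the selected number of factors, requesting the coefficient estimates:

/* Refit PLS with the cross-validated number of factors */
proc pls data=baseball_t censcale nfac=9 details;
   title 'partial least squares, refit with 9 factors';
   model logsalary = &regressors / solution;  /* SOLUTION prints the coefficients */
run;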