This post was kindly contributed by SAS Learning Post - go there to comment and to read the full post.
Hopefully if you’re viewing the eclipse today, you’ll be using one of the safe methods that won’t harm your eyes! … But if you’re somewhat of a trickster, here’s an optical illusion you can send your friends after the eclipse, that might make them wonder (just for a second) if […]
The post Can you see these spots (after viewing the eclipse)? appeared first on SAS Learning Post.
The post Code debugging and program history in SAS Enterprise Guide appeared first on The SAS Dummy.
This post was kindly contributed by The SAS Dummy - go there to comment and to read the full post.
SAS programmers have high expectations for their coding environment, and why shouldn’t they? Companies have a huge investment in their SAS code base, and it’s important to have tools that help you understand that code and track changes over time. Few things are more satisfying than a SAS program that works as designed and delivers perfect results. (Hyperbole, you say? I don’t think so.) But when your program isn’t working the way it should, there are two features that can help you get back on track: a code debugger and program revision history. Both of these capabilities are built into SAS Enterprise Guide. Program history was added in v7.1, and the debugger was added in v7.13.
I’ve written about the DATA step debugger before — both as a teaching tool and as a productivity tool. In this article, I’m sharing a demo of the debugger’s features, led by SAS developer Joe Flynn. Before joining the SAS Enterprise Guide development team, Joe worked in SAS Technical Support. He’s very familiar with “bugs,” and reported his share of them to SAS R&D. Now — like every programmer — Joe makes the bugs. But of course, he fixes most of them before they ever see the light of day. How does he do that? Debugging.
This video is only about 8 minutes long, but it’s packed with good information. In the debugger demo, you’ll learn how you can use standard debugging methods, such as breakpoints, step over and step through, watch variables, jump to, evaluate expression, and more. There is no better way to understand exactly what is causing your DATA step to misbehave.
In the program history demo (the second part of the video), you’ll learn how team members can collaborate using standard source management tools (such as Git). If you establish a good practice of storing code in a central place with solid source management techniques, SAS Enterprise Guide can help you see who changed what, and when. SAS Enterprise Guide also offers a built-in code version comparison tool, which enhances your ability to find the breaking changes. You can also use the code comparison technique on its own, outside of the program history feature.
Take a few minutes to watch the video, and then try out the features yourself. You don’t need a Git installation to play with program history at the project level, though it helps when you want to extend that feature to support team collaboration.
Robots – everyone has probably been fascinated by the idea of robots at one time or another. From the early science fiction robots (such as Klaatu’s robot Gort) to the mid-1980s movie robots (like Johnny 5), they have been portrayed in many different ways in fiction. These days, with the […]
The post Mapping out the next robot invasion! appeared first on SAS Learning Post.
This question has been asked repeatedly for decades by anyone facing a new system. That system might be a new product, a new piece of equipment, a new process, or anything really that is new and not well understood. Ultimately, you might need to change this system but first you […]
The post Which of my factors are important? appeared first on SAS Learning Post.
I frequently get asked about my favorite book on a particular topic, how to find free SAS learning materials online, how to get help with SAS issues, etc. This led me to compile a “Great Resources for SAS Programmers” document which I freely share with my students upon request. So […]
The post Jedi SAS Tips – Favorite Resources for SAS Programmers appeared first on SAS Learning Post.
The post Use a bar chart to visualize pairwise correlations appeared first on The DO Loop.
This post was kindly contributed by The DO Loop - go there to comment and to read the full post.
Visualizing the correlations between variables often provides insight into the relationships between variables. I’ve previously written about how to use a heat map to visualize a correlation matrix in SAS/IML, and Chris Hemedinger showed how to use Base SAS to visualize correlations between variables.
Recently a SAS programmer asked how to construct a bar chart that displays the pairwise correlations between variables. This visualization enables you to quickly identify pairs of variables that have large negative correlations, large positive correlations, and insignificant correlations.
In SAS, PROC CORR can compute the correlations between variables and store them in matrix form in an output data set. The following call to PROC CORR analyzes the correlations between all pairs of numeric variables in the Sashelp.Heart data set, which contains data for 5,209 patients in a medical study of heart disease. Because of missing values, some pairwise correlations use more observations than others.
ods exclude all;
proc corr data=sashelp.Heart;      /* pairwise correlation */
   var _NUMERIC_;
   ods output PearsonCorr = Corr;  /* write correlations, p-values, and sample sizes to data set */
run;
ods exclude none;
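To make the idea of pairwise deletion concrete, here is a minimal Python sketch (not SAS, and not the author's code). It assumes that `None` stands in for a SAS missing value and uses small, made-up data. Each pair of variables is correlated by using only the observations where both values are present, so different pairs can end up with different sample sizes.

```python
import math

def pearson(x, y):
    """Pearson product-moment correlation of two equal-length lists."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def pairwise_corr(x, y):
    """Correlate x and y by using only observations where both are present.

    None stands in for a missing value (an assumption of this sketch).
    Returns (r, n_used), where n_used is the number of complete pairs.
    """
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    xs = [a for a, _ in pairs]
    ys = [b for _, b in pairs]
    return pearson(xs, ys), len(pairs)

# Hypothetical data with scattered missing values
height = [60, 62, None, 66, 70]
weight = [110, 120, 130, None, 160]
age = [12, 13, 14, 15, 16]

for label, (a, b) in {"height vs. weight": (height, weight),
                      "height vs. age": (height, age),
                      "weight vs. age": (weight, age)}.items():
    r, n = pairwise_corr(a, b)
    print(f"{label}: r = {r:.3f}, n = {n}")
```

Here the height/weight correlation uses 3 observations, whereas the other two pairs each use 4.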
The CORR data set contains the correlation matrix, p-values, and sample sizes. The statistics are stored in “wide form,” with few rows and many columns. As I previously discussed, you can use the HEATMAPCONT subroutine in SAS/IML to quickly visualize the correlation matrix:
proc iml;
use Corr;
   read all var "Variable" into ColNames;  /* get names of variables */
   read all var (ColNames) into mCorr;     /* matrix of correlations */
   ProbNames = "P"+ColNames;               /* variables for p-values are named PX, PY, PZ, etc */
   read all var (ProbNames) into mProb;    /* matrix of p-values */
close Corr;
call HeatmapCont(mCorr) xvalues=ColNames yvalues=ColNames
     colorramp="ThreeColor" range={-1 1} title="Pairwise Correlation Matrix";
The heat map gives an overall impression of the correlations between variables, but it has some shortcomings. First, you can’t determine the magnitudes of the correlations with much precision. Second, it is difficult to compare the relative sizes of correlations. For example, which is stronger: the correlation between systolic and diastolic blood pressure or the correlation between weight and MRW? (MRW is a body-weight index.)
These shortcomings are resolved if you present the pairwise correlations as a bar chart.
To create a bar chart, it is necessary to convert the output into “long form.”
Each row in the new data set will represent a pairwise correlation. To identify the row, you should also create a new variable that identifies the two variables whose correlation is represented.
Because the correlation matrix is symmetric and has 1 on the diagonal, the long-form data set only needs the statistics for the lower-triangular portion of the correlation matrix.
Let’s extract the data in SAS/IML. The following statements construct a new ID variable that identifies each new row and extract the correlations and p-values for the lower-triangular elements. The statistics are written to a SAS data set called CorrPairs.
(In Base SAS, you can transform the lower-triangular statistics by using the DATA step and arrays, similar to the approach in this SAS note; feel free to post your Base SAS code in the comments.)
numCols = ncol(mCorr);                /* number of variables */
numPairs = numCols*(numCols-1) / 2;
length = 2*nleng(ColNames) + 5;       /* max length of new ID variable */
PairNames = j(NumPairs, 1, BlankStr(length));
i = 1;
do row = 2 to numCols;                /* construct the pairwise names */
   do col = 1 to row-1;
      PairNames[i] = strip(ColNames[col]) + " vs. " + strip(ColNames[row]);
      i = i + 1;
   end;
end;

lowerIdx = loc(row(mCorr) > col(mCorr));          /* indices of lower-triangular elements */
Corr = mCorr[ lowerIdx ];
Prob = mProb[ lowerIdx ];
Significant = choose(Prob > 0.05, "No ", "Yes");  /* use alpha=0.05 signif level */

create CorrPairs var {"PairNames" "Corr" "Prob" "Significant"};
append;
close;
QUIT;
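For readers who want to see the reshaping logic outside of SAS/IML, here is a minimal Python sketch of the same wide-to-long step. It is an illustration only: the function name `lower_triangle_pairs` and the 3x3 matrix are hypothetical. It walks the strictly lower triangle row by row, which matches the order of the PairNames loop and of the indices returned by `loc(row(mCorr) > col(mCorr))`, and then sorts the pairs by correlation in the same way as the later PROC SORT step.

```python
def lower_triangle_pairs(names, matrix):
    """Reshape a symmetric correlation matrix (wide form) into long form.

    Visits the strictly lower-triangular cells row by row, the same order
    as the PairNames loop (row = 2..k, col = 1..row-1 in 1-based terms).
    Returns a list of (label, value) tuples.
    """
    pairs = []
    for row in range(1, len(names)):
        for col in range(row):
            label = f"{names[col].strip()} vs. {names[row].strip()}"
            pairs.append((label, matrix[row][col]))
    return pairs

# Hypothetical 3x3 correlation matrix, for illustration only
names = ["A", "B", "C"]
m = [[1.0, 0.5, -0.2],
     [0.5, 1.0, 0.3],
     [-0.2, 0.3, 1.0]]

pairs = lower_triangle_pairs(names, m)
print(pairs)   # [('A vs. B', 0.5), ('A vs. C', -0.2), ('B vs. C', 0.3)]

# Sort by correlation, like PROC SORT ... BY Corr before plotting
for label, r in sorted(pairs, key=lambda p: p[1]):
    print(f"{label:>8}  {r:+.2f}")
```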
You can use the HBAR statement in PROC SGPLOT to construct the bar chart. This bar chart contains 45 rows, so you need to make the graph tall and use a small font to fit all the labels without overlapping. The call to PROC SORT and the DISCRETEORDER=DATA option on the YAXIS statement ensure that the categories are displayed in order of increasing correlation.
proc sort data=CorrPairs;  by Corr;  run;

ods graphics / width=600px height=800px;
title "Pairwise Correlations";
proc sgplot data=CorrPairs;
hbar PairNames / response=Corr group=Significant;
refline 0 / axis=x;
yaxis discreteorder=data display=(nolabel)
      valueattrs=(size=6pt) fitpolicy=none     /* small font for the category labels */
      offsetmin=0.012 offsetmax=0.012          /* half of 1/k, where k=number of categories */
      colorbands=even colorbandsattrs=(color=gray transparency=0.9);
xaxis grid display=(nolabel);
keylegend / position=topright location=inside across=1;
run;
The bar chart (click to enlarge) enables you to see which pairs of variables are highly correlated (positively and negatively) and which have correlations that are not significantly different from 0. You can use additional colors or reference lines if you want to visually emphasize other features, such as the correlations that are larger than 0.25 in absolute value.
The bar chart is not perfect. This example, which analyzes 10 variables, is very tall with 45 rows. Among k variables there are k(k-1)/2 correlations, so the number of pairwise correlations (rows) increases quadratically with the number of variables. In practice, this chart would be unreasonably tall when there are 14 or 15 variables (about 100 rows).
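A quick Python check of the k(k-1)/2 formula shows how fast the number of bars grows. This is a throwaway illustration, and `num_pairs` is a hypothetical helper name.

```python
def num_pairs(k):
    """Number of distinct pairwise correlations among k variables: k(k-1)/2."""
    return k * (k - 1) // 2

for k in (10, 14, 15):
    print(k, num_pairs(k))   # 10 45, then 14 91, then 15 105
```

Going from 10 to 15 variables adds only 5 variables but more than doubles the number of rows, from 45 to 105.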
Nevertheless, for 10 or fewer variables, a bar chart of the pairwise correlations provides an alternative visualization that has some advantages over a heat map of the correlation matrix. What do you think? Would this graph be useful in your work? Leave a comment.
This post was kindly contributed by platformadmin.com - go there to comment and to read the full post.
As someone who specialises in SAS® metadata security, I spend a lot of time using the Authorization tab in SAS Management Console. I also use Linux a great deal. When I run SAS Management Console on Linux, I’ve noticed that the check box background colours on the Authorization tab don’t render correctly (for me at … Continue reading “Java Look & Feel with SAS Management Console on Linux”
If you know me, you probably know that I spend a lot of time on the water. I like speed paddling (dragon boat, outrigger canoe, surfski, and racing SUP), and I also have a big pontoon boat at Jordan Lake where I try to go fishing at least once a […]
The post Tracking floods and droughts in Jordan Lake, NC appeared first on SAS Learning Post.
With North Korea’s growing missile capabilities in the news lately, I thought it would be interesting to create a map showing how far (or close) they are from other parts of the world. I first did a few searches on the Web, to see what maps are already out there. […]
The post How far are you from North Korea? appeared first on SAS Learning Post.
The post What is rank correlation? appeared first on The DO Loop.
When someone refers to the correlation between two variables, they are probably referring to the Pearson correlation, which is the standard statistic that is taught in elementary statistics courses.
Elementary courses do not usually mention that there are other measures of correlation.
Why would anyone want a different estimate of correlation?
Well, the Pearson correlation, which is also known as the product-moment correlation, uses empirical moments of the data (means and standard deviations) to estimate the linear association between two variables. However, means and standard deviations can be unduly influenced by outliers in the data, so the Pearson correlation is not a robust statistic.
A simple robust alternative to the Pearson correlation is called the Spearman rank correlation, which is defined as the Pearson correlation of the ranks of each variable. (If a variable contains tied values, replace those values by their average rank.)
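The definition can be sketched in a few lines of Python. This is a minimal, stdlib-only illustration, not the algorithm that PROC CORR uses internally, and `average_ranks` and `spearman` are hypothetical helper names. Tied values receive the mean of the ranks they occupy, and the Spearman correlation is then the ordinary Pearson correlation of the two rank vectors.

```python
import math

def average_ranks(values):
    """Rank values 1..n; tied values get the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over every position that ties with order[i]
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1          # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson product-moment correlation of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def spearman(x, y):
    """Spearman rank correlation = Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))

print(average_ranks([10, 20, 20, 30]))        # [1.0, 2.5, 2.5, 4.0]
print(spearman([1, 2, 3, 4], [1, 4, 9, 16]))  # 1.0 (monotone but nonlinear)
```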
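The Spearman rank correlation is simple to compute and conceptually easy to understand. Some advantages of the rank correlation are that it is robust to outliers and that it is invariant under monotonic increasing transformations of the data.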
PROC CORR in SAS supports several measures of correlation, including the Pearson and Spearman correlations.
For data without outliers, the two measures are often similar. For example, the following call to PROC CORR computes both the Pearson correlations and the Spearman rank correlations between three variables in the Sashelp.Class data set:
/* Compute PEARSON and SPEARMAN rank correlation by using PROC CORR in SAS */
proc corr data=sashelp.class noprob nosimple PEARSON SPEARMAN;
   var height weight age;
run;
According to both statistics, these variables are strongly positively correlated, with correlations in the range [0.7, 0.88]. Notice that the rank correlations (the lower table) are similar to the Pearson correlations for these data. However, if the data contain outliers, the rank correlation is less influenced by the magnitude of the outliers.
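A tiny made-up example in Python illustrates the point about outliers. The data follow the line y = 2x exactly, except that the last y value is replaced by an extreme value. The ranking helper below assumes there are no ties, which holds for these data. A single outlier pulls the Pearson correlation down from 1 to about 0.743, but the ranks, and therefore the rank correlation, do not change.

```python
import math

def pearson(x, y):
    """Pearson product-moment correlation of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def ranks_no_ties(values):
    """1-based ranks; valid only when the values contain no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0] * len(values)
    for position, i in enumerate(order, start=1):
        r[i] = position
    return r

x = [1, 2, 3, 4, 5]
clean = [2, 4, 6, 8, 10]     # exactly y = 2x
dirty = [2, 4, 6, 8, 100]    # same data with one outlier

print(pearson(x, clean))                                  # 1.0
print(round(pearson(x, dirty), 3))                        # 0.743
print(pearson(ranks_no_ties(x), ranks_no_ties(dirty)))    # 1.0
```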
As mentioned earlier, the Spearman rank correlation is conceptually easy to understand. It consists of two steps: compute the ranks of each variable and compute the Pearson correlation between the ranks.
It is instructive to reproduce each step in the Spearman computation.
You can use PROC RANK in SAS to compute the ranks of the variables, then use PROC CORR with the PEARSON option to compute the Pearson correlation of the ranks. If the data do not contain any missing values, then the following statements implement the two steps that compute the Spearman rank correlation:
/* Compute the Spearman rank correlation "manually" by explicitly computing ranks */
/* First compute ranks; use average rank for ties */
proc rank data=sashelp.class out=classRank ties=mean;
   var height weight age;
   ranks RankHeight RankWeight RankAge;
run;

/* Then compute Pearson correlation on the ranks */
proc corr data=classRank noprob nosimple PEARSON;
   var RankHeight RankWeight RankAge;
run;
The resulting table of correlations is the same as in the previous section and is not shown. Although PROC CORR can compute the rank correlation directly, it is comforting that these two steps produce the same answer. Furthermore, this two-step method can be useful if you decide to implement a rank-based statistic that is not produced by any SAS procedure. This two-step method is also the way to compute the Spearman correlation of character ordinal variables because PROC CORR does not analyze character variables. However, PROC RANK supports both character and numeric variables.
If you have missing values in your data, then make sure you delete the observations that contain missing values before you call PROC RANK. Equivalently, you can use a WHERE statement to omit those observations. For example, you could insert the following statement into the PROC RANK step:
where height^=. & weight^=. & age^=.;
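The effect of that WHERE clause is listwise deletion, which ensures that every variable is ranked over the same set of observations. Here is a minimal Python sketch of the idea. `complete_cases` is a hypothetical helper, `None` stands in for a SAS missing value, and the data are made up.

```python
def complete_cases(*columns):
    """Keep only observations where every column is non-missing (not None).

    Mirrors a WHERE clause that drops any observation with a missing value,
    so every variable is later ranked over the same observations.
    """
    keep = [i for i in range(len(columns[0]))
            if all(col[i] is not None for col in columns)]
    return [[col[i] for i in keep] for col in columns]

# Hypothetical data with missing values
height = [60, 62, None, 66]
weight = [110, None, 130, 150]
age = [12, 13, 14, 15]

h, w, a = complete_cases(height, weight, age)
print(h, w, a)   # [60, 66] [110, 150] [12, 15]
```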
In the SAS/IML language, the CORR function computes the Spearman rank correlation directly, as follows. The results are the same as the results from PROC CORR, and are not shown.
proc iml;
use sashelp.class;
   read all var {height weight age} into X;
close;
RankCorr = corr(X, "Spearman");   /* compute rank correlation */
If you ever need to compute a rank-based statistic manually, you can also use the RANKTIE function to compute the ranks of the elements in a numerical vector, such as
ranktie(X[ ,1], "Mean");
The Spearman rank correlation is a robust measure of the linear association between variables. It is related to the classical Pearson correlation because it is defined as the Pearson correlation between the ranks of the individual variables. It has some very nice properties, including being robust to outliers and being invariant under monotonic increasing transformations of the data. For other measures of correlation that are supported in SAS, see the PROC CORR documentation.