Top 10 tips and tricks about PROC SQL

This post was kindly contributed by SAS ANALYSIS - go there to comment and to read the full post.

Interestingly, I just found that the most searched keyword is PROC SQL, through the traffic analysis of my tiny blog. The reason possibly is: nowadays everybody knows SQL, more or less; then someone can do some parts of the SAS job by PROC SQL without using any procedure or DATA step.

As a dialect of SQL, PROC SQL can really play a lot of tricks in SAS, which are mostly carried out by DATA step or PROCs. Here I summarize the 10 interesting ones among them. Let’s start with the free SAS help dataset SASHELP.CLASS. This dataset contains the weight, height, sex and other information of 19 fake teenagers. In this demo, I primarily paly with the WEIGHT variable from it.


data class;
   set sashelp.class;
   obs = _n_;
run;

1. Calculate the median of a variable
This is the task usually done by PROC MEANS. With the aggregating HAVING clause, PROC SQL can fulfill this purpose as well.

proc sql;
   select avg(weight)
   from (select e.weight
   from class e, class d
   group by e.weight
   having sum(case when e.weight = d.weight then 1 else 0 end)
      ge abs(sum(sign(e.weight - d.weight))));
quit;

2. Draw a horizontal bar chart
It is a substitute in case that we want to have a look at the distribution of a variable but don’t know PROC GPLOT or other plotting procedures.

proc sql;
   select age,
   repeat('*',count(*)*4) as hbar
   from class
   group by age
   order by age;
quit;

3. Return the running total for a variable
DATA step and PROC IML both can get this job done, like what I have shown at a previous post. PROC SQL is the 3rd choice, just with the little help of a subquery at the SELECT clause.

proc sql;
   select name, weight, 
      (select sum(a.weight) from class as a where a.obs <= b.obs) as running_total
   from class as b;
quit;

4. Report data with subtotal
The first thing in my mind is the powerful REPORT procedure. However, PROC SQL is an alternative with the set operator.

proc sql;
   select name, weight
   from class
   union all
   select 'Total', sum(weight) 
   from class;
quit;

5. Find the column information from metadata
SAS stores the metadata at its DICTIONARY datasets. PROC SQL can visit the column information easily, without using the CONTENTS procedure.

proc sql;
   select name, type, varnum
   from sashelp.vcolumn
   where libname = 'WORK' and memname = 'CLASS';
quit;

6. Rank a variable
PROC RANK is a really handy tool for this functionality. Besides it, PROC SQL can do some simple ranking as well.

proc sql;
   select name, a.weight, (select count(distinct b.weight)
   from class b
   where b.weight <= a.weight) as rank
   from class a;
quit;

7. Random sampling
PROC SURVEYSELECT involves with too much statistics. If we just need a simple random sampling, PROC SQL is a more popular option. For example, if I want to pick out 8 observations, I can use PROC SQL like:

proc sql outobs = 8;
   select *
   from class
   order by ranuni(1234);
quit;

8. Replicate a data set without data
In PROC SQL, it is a fairly simple one-sentence statement to create an empty data set. Not sure DATA step can do it.

proc sql;
   create table class2 like class;
quit;

9. Transpose data
DATA step ARRAY is a big headache for most people. Suppose that we want to list the names of the teenagers by their genders, PROC SQL has a way to do it, although it may require some complicated queries/subqueries.

proc sql;
   select max(case when sex='F'
      then name else ' ' end) as Female,
      max(case when sex='M'
      then name else ' ' end) as Male
   from (select e.sex,
      e.name,
      (select count(*) from class d
      where e.sex=d.sex and e.obs < d.obs) as level
      from class e) 
   group by level;
quit;

10. Count the missing values
The great thing in PROC SQL is that the NMISS and N functions under it work for both numeric and character variables. We can also write a macro based on them to search the missing values for all variables.

proc sql
   select count(*), nmiss(weight), n(weight)
   from class;
quit;

This post was kindly contributed by SAS ANALYSIS - go there to comment and to read the full post.