This post was kindly contributed by SAS Users - go there to comment and to read the full post. |
DATA Step BY Statements
DATA Step is a very powerful language that SAS and Open Source programmers leverage to stage data for the analytical life cycle. A popular technique is to use the DESCENDING option on BY variables to identify the largest value. Let’s review the example in Figure 1:
- On line 74 we are using the descending option on the BY statement for the numeric variable MSRP. The reason we are doing this is so we can identify the most expensive car for each make of car in our data set.
- On line 79 we group our data by MAKE of car.
- On line 80 we leverage the FIRST. statement on the subsetting IF statement to output the first record for each MAKE. In Figure 2 we can review the results.
Figure 1. Descending BY Statement
Figure 2. Listing of Most Expensive Cars by MAKE
What is CAS?
CAS is SAS Viya’s in-memory engine that processes data and logic in a distributed computing paradigm. When working with CAS tables we can simulate the DESCENDING BY statement by creating a CAS View which will then become the source table to our DATA Step. Let’s review Figure 3:
- On line 79 we will leverage the CASL (SAS® Cloud Analytic Services Language) action set TABLE with the action VIEW to create the CAS View that will be used as our source table in the DATA Step.
- On lines 80 and 81 we will store our CAS View in the CASUSER CASLIB with the name of DESCENDING.
- On line 82 and 83 we use the TABLES statement to specify the input CAS table to our CAS View.
- On line 84 we use the VARLIST statement to identify the columns from the input table we want in our CAS View.
- On lines 85 we create a new variable for our CAS View using the computedVars statement,
- On line 86 we provide the math for our new variable N_MSRP. N_MSRP is the negated value of the input CAS table variable MSRP. Note: This simulation only works for numeric variables. For character data I suggest using LAST. processing which you can review in this blog post.
Figure 3. Simulating DESCENDING BY Statement for Numeric Variables
Now that we have our CAS View with its new variable N_MSRP, we can move on to the DATA Step code in Figure 3.
- On line 92 the SET statement specifies the source to our DATA Step; CAS View CASUSER.DESCENDING
- On line 83 we leverage the BY Statement to group our data in ascending order for the CAS View variables MAKE and N_MSRP. Because N_MSRP is in ascending order our original variable MSRP is in DESCENDING order.
- On line 94 we use a subsetting IF statement to output the first occurrence of each MAKE.
Figure 4 is a listing of our new CAS table CASUSER.DESCENDING2 and displays the most expensive car for each make of car.
Figure 4. Listing of Most Expensive Cars by MAKE
Template for Creating a CAS View
/* Create a CAS view */ /* For each DESCENDING numeric create a new variable(s) */ /* The value of the new variable(s) is the negated value */ /* of the original DESCENDING BY numeric variable(s) */ proc cas; table.view / replace = true caslib='casuser' name='descending' tables={{ name='cars' varlist={'msrp' 'make'}, computedVars={{name='n_msrp'}}, computedVarsProgram='n_msrp = -(msrp)' }}; run; quit; data casuser.descending2; set casuser.descending; by make n_msrp ; if first.make ; run; proc print data=casuser.descending2; title "Most Expensive Cars"; run; |
Conclusion
It is a very common coding technique to process data with a DESCENDING BY statement using DATA Step. With Viya 3.5 the DESCENDING BY statements is supported, for numeric and character data in DATA Step, with this caveat: DESCENDING works on all but the first BY variable on the BY statement. For earlier versions of SAS Viya this simulation is the best practices for numeric data that you want in DESCENDING order.
How to Simulate DATA Step DESCENDING BY Statements in SAS® Cloud Analytic Services (CAS) was published on SAS Users.
This post was kindly contributed by SAS Users - go there to comment and to read the full post. |