How to refactor SAS code to leverage SAS Viya

This post was kindly contributed by SAS Users - go there to comment and to read the full post.

If you’re a SAS programmer who now uses SAS Viya and CAS, it’s worth your time to optimize your existing programs to take advantage of the new environment. This post is a continuation of my SAS Global Forum 2020 paper Best Practices for Converting SAS® Code to Leverage SAS® Cloud Analytic Services and my SGF 2020 Super Demo.

The best approach for refactoring SAS code for SAS Viya has a few steps:

  • First, “lift and shift” your existing code to run successfully in the compute server for SAS Viya.
  • Next, create CASLIB statements to all of your data sources: i.e. sas7bdat, CSV files, parquet files, RDBMs, cloud data sources, etc.
  • Finally, identify the longest running steps so you know where you have the biggest opportunities. For example, look at steps where the “real time” is 30 minutes or longer, as well as steps that are CPU bound. CPU-bound steps are steps where the CPU time is equal to or greater than the real time for that step.

To help us identify those steps we can leverage a new utility to analyze SAS logs and create reports to help us understand the Real Time and CPU Time for each step. Read on to learn more about this final step in the code refactoring process.

Utility: SAS Log Parser

To generate these reports, I created a SAS program that will read all SAS log files in a directory and create one report per SAS log as well as a descending Real Time (Clock Time) and CPU Time reports. Figure 1 is an example of the report that is generated for each SAS log. In this report we see each step’s procedure or DATA Step’s Real Time and CPU Time. It’s derived by picking up on SAS log entries like this:

NOTE: PROCEDURE SGPLOT used (Total process time):
      real time           2.79 seconds
      cpu time            0.08 seconds

NOTE: The SAS System used:
      real time           1:08.86
      cpu time            1:18.18

Note, the Total Time and Total CPU Time are fields that are populated when the SAS log note “NOTE: The SAS System used:” is encountered. SAS programs that are ran in batch or using an RSUBMIT process via MP CONNECT code will generate this note. MP CONNECT or SAS Grid Computing enables you to execute RSUBMIT statements asynchronously and requires SAS/CONNECT.

Figure 1. sampleETS.log report

Descending Real Time Report

Figure 2 contains an example of the descending real time report. In this report we observe in the Step column that the longest running step is a PROC LOGISTIC that takes over 14 hours (Real Time column) from the SAS log called Sample3.log (File Name column). The best way to use this report is to focus on steps that take longer than 30 minutes. In our case we have 9 steps from 3 SAS logs. Now that we know that we can review the details of each step and then benchmark if that step would run faster by leveraging SAS® Cloud Analytic Services (CAS). Note, for CAS to process data, all data must be in CAS tables and the step must be coded using CAS-enabled steps.

Figure 2. Descending Real Time Report

Descending CPU Time Report

Figure 3 contains an example of the descending CPU times report where we observe that the most CPU intensive step takes over 13 hours (CPU Time column) and is from the Samples3.log (File Name column). Note, if you review the Real Time and CPU Time columns you should notice that only observation 11 (Obs column) has a CPU Time that exceeds the Real Time making it CPU bound.  However, we would not focus on this step since the Real Time is less than 30 minutes.

Figure 3. Descending CPU Time Report

Source code for the SAS Log Parser

I’ve bundled my SAS code for these steps in my GitHub repository for SAS Global Forum 2020. You’ll find these programs along with the other code that supports my previous topics of adapting SAS 9 code for SAS Viya.

sasLogParserMacros.sas contains three macros that drive the process. The macro program %LIST_FILES lists all files with a given extension, %CHECK checks for the existence of a file and deletes it if found, and %SASLOG parses a SAS log and provides the values found in the reports. When you save this file ensure you name it sasLogParserMacros.sas and in the same directory that you save sasLogParser.sas.

sasLogParser.sas is the program we submit to produce the reports. This code includes the code sasLogParserMacros.sas and then generate the repots. The only two statements we need to modify are the first two %LET statements in this program. The first %LET statement points to the location of the two SAS programs sasLogParserMacros.sas and sasLogParser.sas. The second %LET statement points to the directory containing all the SAS logs we want to parse. Note, outside of the two %let statements in sasLogParser.sas do not change any other statement in either program.

Conclusion

In order to understand which steps are good candidates for leveraging the in-memory engine CAS, we must first understand the real time and CPU time of each step. Then we can benchmark which engine in SAS Viya is appropriate for that step — the compute server or CAS.  The code that I’ve shared for benchmarking can run within SAS 9 or SAS Viya, on both Windows and Linux.

This post was kindly contributed by SAS Users - go there to comment and to read the full post.