This post was kindly contributed by SAS Users - go there to comment and to read the full post. |
Welcome back to my SAS Users blog series CAS Action! – a series on fundamentals. In my previous post CAS-Action! Simple Frequency Tables – Part 1, I reviewed how to use the simple.freq CAS action to generate frequency distributions for one or more columns using the distributed CAS server. In this post I will show you how to save the results of the freq action as a SAS data set or a distributed CAS table.
In this example, I will use the CAS language (CASL) to execute the freq CAS action. Be aware, instead of using CASL, I could execute the same action with Python, R and more with some slight changes to the syntax for the specific language. Refer to the documentation for syntax in other languages.
Load the demonstration data into memory
I’ll start by executing the loadTable action to load the WARRANTY_CLAIMS_0117.sashdat file from the Samples caslib into memory. By default the Samples caslib should be available in your SAS Viya environment. I’ll load the table to the Casuser caslib and then I’ll clean up the CAS table by renaming and dropping columns to make the table easier to use. For more information how to rename columns check out my previous post. Lastly I’ll execute the fetch action to preview 5 rows.
proc cas; * Specify the input/output CAS table *; casTbl = {name = "WARRANTY_CLAIMS", caslib = "casuser"}; * Load the CAS table into memory *; table.loadtable / path = "WARRANTY_CLAIMS_0117.sashdat", caslib = "samples", casOut = casTbl + {replace=TRUE}; * Rename columns with the labels. Spaces replaced with underscores *; *Store the results of the columnInfo action in a dictionary *; table.columnInfo result=cr / table = casTbl; * Loop over the columnInfo result table and create a list of dictionaries *; listElementCounter = 0; do columnMetadata over cr.ColumnInfo; listElementCounter = listElementCounter + 1; convertColLabel = tranwrd(columnMetadata['Label'],' ','_'); renameColumns[listElementCounter] = {name = columnMetadata['Column'], rename = convertColLabel, label=""}; end; * Rename columns *; keepColumns = {'Campaign_Type', 'Platform','Trim_Level','Make','Model_Year','Engine_Model', 'Vehicle_Assembly_Plant','Claim_Repair_Start_Date', 'Claim_Repair_End_Date'}; table.alterTable / name = casTbl['Name'], caslib = casTbl['caslib'], columns=renameColumns, keep = keepColumns; * Preview CAS table *; table.fetch / table = casTbl, to = 5; quit; |
The results above show a preview of the warranty_claims CAS table.
One Way Frequency for Multiple Columns
Next, I’ll execute the freq action to generate a frequency distribution for multiple columns.
proc cas; casTbl = {name = "WARRANTY_CLAIMS", caslib = "casuser"}; colNames = {'Model_Year', 'Vehicle_Assembly_Plant', {name = 'Claim_Repair_Start_Date', format = 'yyq.'} }; simple.freq / table= casTbl, inputs = colNames; quit; |
The freq CAS action returns the frequency distribution of each column in a single result. While this is great, what if you want to create a visualization with the data? Or continue processing the summarized data? How do you save this as a table? Well, you have a few options.
Save the results as a SAS data set
First, you can save the results of a CAS action as a SAS data set. The idea here is the CAS action will process the data in the distributed CAS server, and then the CAS server returns smaller, summarized results to the client (SAS Studio). The summarized results can then be saved as a SAS data set.
To save the results of a CAS action simply add the result option after the action with a variable name. The results of an action return a dictionary to the client and store it in the specified variable. For example, to save the results of the freq action as a SAS data set complete the following steps:
- Execute the same CASL code from above, but this time specify the result option with a variable name to store the results of the freq action. Here i’ll save the results in the variable freq_cr.
- Use the DESCRIBE statement to view the structure and data type of the CASL variable freq_cr in the log (not required).
- Use the SAVERESULT statement to save the CAS action result table from the dictionary freq_cr as a SAS data set named warranty_freq. To do this specify the key Frequency that is stored in the dictionary freq_cr to obtain the result table.
proc cas; * Reference the CAS table *; casTbl = {name = "WARRANTY_CLAIMS", caslib = "casuser"}; * Specify the columns to analyze *; colNames = {'Model_Year', 'Vehicle_Assembly_Plant', {name = 'Claim_Repair_Start_Date', format = 'yyq.'} }; * 1. Analyze the CAS table and store the results *; simple.freq result = freq_cr / table= casTbl, inputs = colNames; * 2. View the dictionary in the log *; describe freq_cr; * 3. Save the result table as a SAS data set *; saveresult freq_cr['Frequency'] dataout=work.warranty_freq; quit; |
In the log, the results of the DESCRIBE statement shows the variable freq_cr is a dictionary with one entry. It contains the key Frequency and the value is a result table. The table contains 22 rows and 6 columns. The NOTE in the log shows the SAVERESULT statement saved the result table from the dictionary as a SAS data set named warranty_freq in the work library.
Once the summarized results are stored in a SAS library, use your traditional SAS programming knowledge to process the SAS table. For example, now I can visualize the summarized data using the SGPLOT procedure.
* Plot the SAS data set *; title justify=left height=16pt "Total Warranty Claims by Year"; proc sgplot data=work.warranty_freq noborder; where Column = 'Model_Year'; vbar Charvar / response = Frequency nooutline; xaxis display=(nolabel); label Frequency = 'Total Claims'; format Frequency comma16.; quit; |
Save the Results as a CAS Table
Instead of saving the summarized results as a SAS data set, you can create a new CAS table on the CAS server. To do that all you need is to add the casOut parameter in the action. Here I’ll save the results of the freq CAS action to a CAS table named warranty_freq in the Casuser caslib, and I will give the table a descriptive label.
proc cas; * Reference the CAS table *; casTbl = {name = "WARRANTY_CLAIMS", caslib = "casuser"}; * Specify the columns to analyze *; colNames = {'Model_Year', 'Vehicle_Assembly_Plant', {name = 'Claim_Repair_Start_Date', format = 'yyq.'} }; * Analyze the CAS table and create a new CAS table *; simple.freq / table= casTbl, inputs = colNames, casOut = { name = 'warranty_freq', caslib = 'casuser', label = 'Frequency analysis by year, assembly plant and repair date by quarter' }; quit; |
The results above show the freq action returned information about the newly created CAS table. Once you have a CAS table in the distributed CAS server you can continue working with it using CAS, or you can visualize the data like we did before using SGPLOT. The key concept here is the SGPLOT procedure does not visualize data on the CAS server. The SGPLOT procedure returns the entire CAS table back to SAS (compute server) as a SAS data set, then the visualization occurs on the client. This means if the CAS table is large, an error or slow processing might occur. However, in our scenario we created a smaller summarized CAS table, so sending 22 rows back to the client (compute server) isn’t going to be an issue.
* Make a library reference to a Caslib *; libname casuser cas caslib='casuser'; * Plot the SAS data set *; title justify=left height=16pt "Total Warranty Claims by Year"; proc sgplot data=casuser.warranty_freq noborder; where _Column_ = 'Model_Year'; vbar _Charvar_ / response = _Frequency_ nooutline; xaxis display=(nolabel); label _Frequency_ = 'Total Claims'; format _Frequency_ comma16.; quit; |
Summary
Using the freq CAS action enables you to generate a frequency distribution for one or more columns and enables you to save the results as a SAS data set or a CAS table. They keys to this process are:
- CAS actions execute on the distributed CAS server and return summarized results back to the client as a dictionary. You can store the dictionary using the result option.
- Using dictionary manipulation techniques and the SAVERESULT statement you can save the summarized result table from the dictionary as a SAS data set. Once you have the SAS data set you can use all of your familiar SAS programming knowledge on the traditional compute server.
- Using the casOut parameter in a CAS action enables you to save the summarized results in the distributed CAS server.
- The SGPLOT procedure does not execute in CAS. If you specify a CAS table in the SGPLOT procedure, the entire CAS table will be sent back to SAS compute server for processing. This can cause an error or slow processing on large tables.
- Best practice is to summarize large data in the CAS server, and then work with the summarized results on the compute server.
Additional resources
freq action
DESCRIBE statement
SAVERESULT statement
Plotting a Cloud Analytic Services (CAS) In-Memory Table
SAS® Cloud Analytic Services: CASL Programmer’s Guide
SAS® Cloud Analytic Services: Fundamentals
CAS Action! – a series on fundamentals
Getting Started with Python Integration to SAS® Viya® – Index
CAS-Action! Saving Frequency Tables – Part 2 was published on SAS Users.
This post was kindly contributed by SAS Users - go there to comment and to read the full post. |