CAS-Action! Filtering Rows in CAS Tables

This post was kindly contributed by SAS Users - go there to comment and to read the full post.

Welcome back to my SAS Users blog series CAS Action! – a series on fundamentals. I’ve broken the series into logical, consumable parts. If you’d like to start by learning a little more about what CAS Actions are, please see CAS Actions and Action Sets – a brief intro. Or if you’d like to see other topics in the series, see the overview page. Otherwise, let’s learn how to filter rows in CAS tables.

In this example, I will use the CAS procedure to filter rows in CAS tables for use in CAS actions. I could execute similar code in Python, R and more with some slight changes to the syntax for the specific language. The other languages also include specific methods to accomplish the same objective.

I’ll start by executing the following code to create a CAS table named PRODUCTS in the Casuser caslib with 80 million rows. Then, I’ll create a variable named productsTbl to reference the PRODUCTS CAS table, and use that variable as the value to the table parameter in the numRows and fetch CAS actions.

proc cas;
    productsTbl = {name = 'products', caslib = 'casuser'};
 
    simple.numRows / table = productsTbl;
    table.fetch / table = productsTbl;
quit;

And the log and results:

The numRows action shows the PRODUCTS table has 80 million rows, and the fetch action shows the first 20 rows of the table.

Create a Simple Filter

To filter a CAS table you add the where sub parameter to the castable reference with the where expression as a string. The string is similar to using the WHERE statement in SAS.

Here, I’ll add the where key to the productsTbl dictionary that references the CAS table, and filter for all rows where the Product column equals Hats. The where expression is only applied when an action is executed on the productsTbl variable (the CAS table reference). The underlying CAS table is not modified.

proc cas;
    productsTbl = {name = 'products', 
                   caslib = 'casuser',
                   where = 'Product = "Hats"'};
 
    simple.numRows / table = productsTbl;
    table.fetch / table = productsTbl;
quit;

And the log and results:
 

After executing the numRows and fetch actions, the results show that there are 8,001,336 rows that contain the product Hats.

Use a SAS Function to Filter a CAS Table

Within the where expression you can also use most SAS Functions. Here, I’ll use the UPCASE function to upper case all values in the Product column to standardize the case, then filter for rows with the value HATS. The results will be the same as the previous section since there are no underlying case issues in the CAS table. Where you would see a difference would be if the table contained mixed case versions of Hats, such as hats, HAts, hATs, etc. Without the function, the query is case sensitive and would only pick up Hats. All iterations of Hats would be picked up with using the UPCASE function.

proc cas;
    productsTbl = {name = 'products', 
                   caslib = 'casuser',
                   where = 'upcase(Product) = "HATS"'};
 
    simple.numRows / table = productsTbl;
    table.fetch / table = productsTbl;
quit;

And the log and results:
 

Multiple WHERE Expressions

Just like the WHERE statement in SAS, you can specify two or more where expressions with a logical operator. Here I’ll filter the CAS table for all rows where the Product column contains the value HATS, and a StoreID less than 15. The entire where expression is stored as a string.

proc cas;
    productsTbl = {name = 'products', 
                   caslib = 'casuser',
                   where = 'upcase(Product) = "HATS" and StoreID < 15'};
 
    simple.numRows / table = productsTbl;
    table.fetch / table = productsTbl;
quit;

And the log and results:

Notice that the results show that 1,789,106 rows have the product hats with a store ID less than 15.

Create a Calculated Column as a Filter

You can create a calculated expression to use as a filter. The calculated expression is not added as a column to the table output, instead it is used to filter the CAS table during the execution of the actions. Here I’ll multiple the Price by Quantity, and filter for all rows greater than 40.

proc cas;
    productsTbl = {name = 'products', 
                   caslib = 'casuser',
                   where = 'upcase(Product) = "HATS" and 
                                    StoreID < 15 and 
                            Price * Quantity > 40'};
 
    simple.numRows / table = productsTbl;
    table.fetch / table = productsTbl;
quit;

And the log and results:

Notice in the results the calculated column is not output, it was only used to filter the rows. With all three where expressions, we now have a total of 436,728 rows.

Storing the Where Expressions in a Variable

Lastly, I want to clean up my code. Instead of adding the where expressions as a single string, I can use a SOURCE block in the CAS language. The SOURCE block enables embedding text in the program and assigning it to a given variable. It avoids any possible quoting issues and helps compartmentalize your code. Here, I’ll name my SOURCE block filter and add the where expressions. Then, I’ll use the filter variable as the value to the where key. The results are the same as the previous code.

proc cas;
    source filter;
        upcase(Product) = "HATS" and 
        StoreID < 15 and 
        Price * Quantity > 40;
    endsource;
 
    productsTbl = {name = 'products', 
                   caslib = 'casuser',
                   where = filter};
 
    simple.numRows / table = productsTbl;
    table.fetch / table = productsTbl;
quit;

And the log and results:
 

Summary

In summary, using the where sub-parameter allows you to easily filter your tables for use in CAS actions. A few key points:

  • The where sub parameter takes a string as its value and is similar to using the WHERE expression in SAS.
  • Using the where sub-parameter does not modify the original CAS table. It is simply applied when executing an action on the CAS table reference.
  • Within the where expression you can use most SAS functions.
  • If you create a dictionary variable with the CAS table reference, you can then use that variable in a variety of actions within the table parameter.
  • Using a SOURCE block enables you to embed your where expressions in a variable without using a string.

Additional resources

fetch Action
SAS® Cloud Analytic Services: Fundamentals
Code

CAS-Action! Filtering Rows in CAS Tables was published on SAS Users.

This post was kindly contributed by SAS Users - go there to comment and to read the full post.