This post was kindly contributed by SAS Users - go there to comment and to read the full post. |
Welcome to the continuation of my series Getting Started with Python Integration to SAS Viya. In previous posts, I discussed how to connect to the CAS server, how to execute CAS actions, and how to filter CAS tables. Now it’s time to focus on creating calculated columns on a CAS table.
Load and explore data
First, I’ll load the cars.csv file into memory using my CAS connection object conn, followed by the upload_file method from the SWATpackage. The upload_file method loads a client-side file into memory on the CAS server and returns a reference to the CAS table. I’ll confirm this by using the display function to display the value of the tbl object. Remember, the data in this example is small for training purposes. Processing data in the CAS server’s massively parallel processing environment is typically reserved for larger data.
tbl = conn.upload_file('https://raw.githubusercontent.com/sassoftware/sas-viya-programming/master/data/cars.csv', casout={'name':'cars', 'caslib':'casuser'}) display(type) # and the command results: NOTE: Cloud Analytic Services made the uploaded file available as table CARS in caslib CASUSER(Peter). NOTE: The table CARS has been created in caslib CASUSER(Peter) from binary data uploaded to Cloud Analytic Services. CASTable('CARS', caslib='CASUSER(Peter)') |
The results show the cars.csv file was successfully loaded to the CAS server and tbl is a CASTable object referencing the CAS table CARS.
Next, let’s preview the CARS CAS table using the SWAT package head method on the tbl object.
tbl.head() |
The CAS server returns 5 rows from the CAS table to the Python client as a DataFrame, as expected.
Create Calculated Columns
Now that we have created the table in the caslib and are familiar with the data, let’s learn how to create calculated columns in CAS tables!
Python Technique
First, I’ll begin with the Pandas bracket notation to create new columns. Since the SWAT package tries to blend the world of Pandas and SAS, this is a good starting point. Here, I’ll create two columns. The first statement creates a column with upper case values from the Origin column, the second calculated column will find the percent increase of the Invoice price to the MSRP price.
tbl['NewCol1'] = tbl.Origin.str.upper() tbl['NewCol2'] = ((tbl.MSRP - tbl.Invoice) / tbl.Invoice).mul(100).round() tbl.head() |
The results show the creation of two new columns, NewCol1 and NewCol2. Be aware, behind the scenes the SWAT package adds a variety of parameters to the CASTable object to create the columns. You can view the parameters by displaying the tbl object. These parameters are stored on the CASTable object and are applied to the table when an action or method is executed on the object.
display(tbl) # and the command results: {'name': 'CARS', 'caslib': 'CASUSER(Peter)', 'computedvars': ['NewCol1', 'NewCol2'], 'computedvarsprogram': 'length _upper_0_ varchar(*); _upper_0_ = upcase(Origin); NewCol1 = .... |
eval Method
You can also use the eval method from the SWAT package to create calculated columns. This works similarly to the Pandas eval method.
Below I’ll create the same two columns, but before I do, I’ll reset the CASTable object to reference the CARS CAS table, removing the parameters that were stored on the object from above. Then I’ll use the eval method to create the columns. The eval method requires SAS assignment statements and functions.
tbl = conn.CASTable('cars', caslib='casuser') tbl.eval('NewCol1 = upcase(Origin)') tbl.eval('NewCol2 = round(((MSRP - Invoice) / Invoice) * 100)') tbl.head() |
The results show the two new columns NewCol1 and NewCol2 were created successfully.
CAS Table COMPUTEDVARSPROGRAM Parameter
Lastly, I’ll use the computedVarsProgram parameter to create calculated columns. This method is specific to SAS Viya.
I’ll begin by storing the SAS expressions as string variables. Notice the required semicolon at the end of each expression. Then, I’ll add the computedVarsProgram parameter to the CASTable object tbl, and set the value equal to the two string expressions.
tbl = conn.CASTable('cars', caslib='casuser') createNewCol1 = 'NewCol1 = upcase(Origin);' createNewCol2 = 'NewCol2 = round(((MSRP - Invoice) / Invoice) * 100);' tbl.computedVarsProgram = createNewCol1 + createNewCol2 tbl.head() |
The results show the two new columns NewCol1 and NewCol2 were created successfully.
Summary
SAS Viya offers various ways to create new columns in CAS tables. The SWAT package blends the world of Pandas and SAS, making available many familiar Pandas API techniques. You also have specific CAS techniques handy when working with the CASTable object directly. Choosing the correct method may depend on your greater code structure, but you now have the right tools to make that decision.
Additional and related resources
- Getting Started with Python Integration to SAS® Viya® – Index
- SWAT API Reference
- SAS® Cloud Analytic Services: Fundamentals
- SAS Scripting Wrapper for Analytics Transfer (SWAT)
- CAS Action Documentation
- CAS Action! – a series on fundamentals
Getting Started with Python Integration to SAS® Viya® – Part 8 – Creating Calculated Columns was published on SAS Users.
This post was kindly contributed by SAS Users - go there to comment and to read the full post. |