Extending Excel with Python and SAS Viya

This post was kindly contributed by SAS Users - go there to comment and to read the full post.

Whether you like it or not, Microsoft Excel is still a big hit in the data analysis world. From small to big customers, we still see fit for daily routines such as filtering, generating plots, calculating items on ad-hoc analysis or even running statistical models. Whenever I talk to customers, there is always someone who will either ask: Can this be exported to excel or can we import data from excel?. Recently, other questions started to come up more often: Can we run Python within SAS? How do I allow my team to choose their language of preference? How do I provide an interface that looks like Microsoft Excel, but has SAS functionalities?.

Well… good news is: we can answer YES to all of these questions. With the increase in number of users performing analytics and the number of analytical tools available, for me it was clear that we would end up having lots of disparate processes. For a while this was a problem, but naturally, companies started developing ways to integrate these siloed teams.

In the beginning of last decade, SAS developed SAS Add-in for Microsoft Office. The tool allows customers to run/embed SAS analytic capabilities inside Microsoft Office applications. More recently, SAS released a new version of PROC FCMP allowing users to write Python code and call, if as a function, inside SAS programs.

These advancements provide users the ability to run Python inside Excel. When I say inside, I really mean from within Excel’s interface.

Before we jump to how we can do it, you may ask yourself: Why is this relevant to me? If I know SAS, I import the dataset and work with the data in SAS; If I know Python, I open a Jupyter notebook, import the data set and do my thing. Well… you are kind of right, but let me tell you a story.

The use case

Recently I worked with a customer and his business process was like this: I have a team of data scientists that is highly technical and knowledgeable in Python and SAS. Additionally, I have a team of analysts with little Python knowledge, but are always working with Excel to summarize data, create filters, graphs, etc. My teams need to communicate and collaborate. The normal chain of events follows:

  1. the Python team works on the data, and exports the results to Excel
  2. the analytics team picks up the data set, and runs SAS scripts and excel formulas

This is a problem of inefficiency for the customer. Why can’t the data scientist pass his or her code to the analyst to execute it on the same project without having to wait on the Python specialist to run the code?

I know this sounds overly complicated, but as my SAS colleague Mike Zizzi concludes in his post SAS or Python? Why not use both? Using Python functions inside SAS programs, at the end of the day what matters is that you get your work done. No matter which language, software or IDE you are using. I highly recommend Mike’s article if you want a deep dive on what PROC FCMP has to offer.

The process

Let’s walk through a data scoring scenario similar to my customer’s story. Imagine I am a SAS programmer using Excel to explore data. I am also part of a team that uses Python, creating scoring data code using analytical models developed in Python. My job is to score and analyze the data on Excel and pass the results to the service representative, so they can forward the response to the customer.

Importing data

The data set we’ll work with in this example will help me analyze which customers are more likely to default on a loan. The data and all code used in this article are in the associated GitHub repository. The data dictionary for the data set is located here. First, I open the data set as seen on Sheet1 below in Excel.

Upload data to SAS

Before we jump to the coding part with SAS and Python, I need to send the data to SAS. We’ll use the SAS add-in, in Excel to send data to the local server. I cover the steps in detail below.

I start by selecting the cells I want to upload to the library.

Next, I move to the SAS tab and select the Copy to SAS Server task.

A popup shows up where I confirm the selected cells.

After I click OK, I configure column, table, naming and location options.

SAS uploads the table to the requested library. Additionally, a new worksheet with the library.table name displays the results. As you can see on the image below, the sheet created follows the name WORK.IMPORTED_DATA we setup on the previous step. This represents the table in the SAS library memory. Notice, however, we are still working in Excel.

The next step is to incorporate the code sent from my teammate.

The Python code

The code our colleague sent is pure Python. I don’t necessarily have to understand the code details, just what it does. The Python code below imports and scores a model and returns a score. Note: if you’re attempting this in your own environment, make sure to update the hmeq_model.sav file location in the # Import model pickle file section.

def score_predictions(CLAGE, CLNO, DEBTINC,DELINQ, DEROG, LOAN, MORTDUE, NINQ,VALUE, YOJ):
	"Output: scored"
	# Imporing libraries
	import pandas as pd
	from sklearn.preprocessing import OneHotEncoder
	from sklearn.compose import ColumnTransformer
	from sklearn.externals import joblib
 
	# Create pandas dataframe with input vars
	dataset = pd.DataFrame({'CLAGE':CLAGE, 'CLNO':CLNO, 'DEBTINC':DEBTINC, 'DELINQ':DELINQ, 'DEROG':DEROG, 'LOAN':LOAN, 'MORTDUE':MORTDUE, 'NINQ':NINQ, 'VALUE':VALUE, 'YOJ':YOJ}, index=[0])
 
	X = dataset.values
 
	# Import model pickle file
	loaded_model = joblib.load("C://assets/hmeq_model.sav")
 
	# Score the input dataframe and get 0 or 1 
	scored = int(loaded_model.predict_proba(X)[0,1])
 
	# Return scored dataframe
	return scored

My SAS code calls this Python code from a SAS function defined in the next section.

The SAS code

Turning back to Excel, in the SAS Add-in side of the screen, I click on Programs. This displays a code editor, and as explained on this video, is like any other SAS code editor.

We will use this code editor to write, run and view results from our code.

The code below defines a FCMP function called Score_Python, that imports the Python script from my colleague and calls it from a SAS datastep. The output table, HMEQ_SCORED, is saved on the WORK library in SAS. Note: if you’re attempting this in your own environment, make sure to update the script.py file location in the /* Getting Python file */ section.

proc fcmp outlib=work.fcmp.pyfuncs;
 
/* Defining FCMP function */
proc fcmp outlib=work.fcmp.pyfuncs;
	/* Defining name and arguments of the Python function to be called */
 
	function Score_Python(CLAGE, CLNO, DEBTINC, DELINQ, DEROG, LOAN, MORTDUE, NINQ, VALUE, YOJ);
		/* Python object */
		declare object py(python);
 
		/* Getting Python file  */
		rc = py.infile("C:\assets\script.py");
 
		/* Send code to Python interpreter */
		rc = py.publish();
 
		/* Call python function with arguments */
		rc = py.call("score_predictions",CLAGE, CLNO, DEBTINC, DELINQ, DEROG, LOAN, MORTDUE, NINQ, VALUE, YOJ);
 
		/* Pass Python results to SAS variable */
		MyFCMPResult = py.results["scored"];
 
		return(MyFCMPResult);
	endsub;
run;
 
options cmplib=work.fcmp;
 
/* Calling FCMP function from data step */
data work.hmeq_scored;
	set work._excelexport;
	scored_bad = Score_Python(CLAGE, CLNO, DEBTINC, DELINQ, DEROG, LOAN, MORTDUE, NINQ, VALUE, YOJ);
	put scored_bad=;
run;

 

I place my code in the editor.

We’re now ready to run the code. Select the ‘Running man’ icon in the editor.

The output

The result represents the outcome of the Python model scoring code. Once the code completes the WORK.HMEQ_SCORED worksheet updates with a new column, scored_bad.

The binary value represents if the customer is likely (a ‘1’) or unlikely (a ‘0’) to default on his or her loan. I could now use any built in Excel features to filter or further analyze the data. For instance, I could filter all the customers likely to default on their loans and pass a report on to the customer management team.

Final thoughts

In this article we’ve explored how collaboration between teams with different skills can streamline processes for efficient data analysis. Each team focuses on what they’re good at and all of the work is organized and completed in one place. It is a win-win for everyone.

Related resources

Extending Excel with Python and SAS Viya was published on SAS Users.

This post was kindly contributed by SAS Users - go there to comment and to read the full post.