Using built-in Git operations in SAS

This post was kindly contributed by The SAS Dummy - go there to comment and to read the full post.

It seems that everyone knows about GitHub, the service that hosts many popular open source code projects. GitHub is built on Git, which is itself an open source implementation of a source control system. Git was originally built to help developers collaborate on Linux (yet another famous open source project), but now we all use it for all types of projects.

There are other free and for-pay services that use Git, like Bitbucket and GitLab. And there are countless products that embed Git for its versioning and collaboration features. In 2014, SAS developers added built-in Git support for SAS Enterprise Guide.

Since then, Git (and GitHub) have grown to play an even larger role in data science operations and DevOps in general. Automation is a key component for production work — including check-in, check-out, commit, and rollback. In response, SAS has added Git integration to more SAS products, including:

  • the Base SAS programming language, via a collection of SAS functions
  • SAS Data Integration Studio, via a new source control plugin
  • SAS Studio (experimental in v3.8)

You can use this Git integration with any service that supports Git (GitHub, GitLab, etc.), or with your own private Git servers and even just local Git repositories.

SAS functions for Git

Git infrastructure and functions were added to SAS 9.4 Maintenance 6. The new SAS functions all have the helpful prefix of “GITFN_” (signifying “Git fun!”, I assume). Here’s a partial list:

GITFN_CLONE       Clones a Git repository (for example, from GitHub) into a directory on the SAS server.
GITFN_COMMIT      Commits staged files to the local repository.
GITFN_DIFF        Returns the number of diffs between two commits in the local repository and creates a diff record object for the local repository.
GITFN_PUSH        Pushes the committed files in the local repository to the remote repository.
GITFN_NEW_BRANCH  Creates a Git branch.

The function names make sense if you’re familiar with Git lingo. If you’re new to Git, you’ll need to learn the terms that go with the commands: clone, repo, commit, stage, blame, and more. This handbook provided by GitHub is friendly and easy to read. (Or you can start with this xkcd comic.)

You can learn about the SAS functions from the SAS documentation — including important details about how to connect SAS to Git.

Here’s an example program that clones (that is, copies into a local space) a repository that contains code samples from my blog:

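data _null_;
 /* Report the version of the Git library that this SAS session uses */
 version = gitfn_version();
 put version=;

 /* Clone the remote repository into a folder on the SAS server */
 rc = gitfn_clone("https://github.com/sascommunities/sas-dummy-blog/",
   "c:\Projects\sas-dummy-blog");
 /* Write the return code to the log */
 put rc=;
run;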

In one line, this function fetches an entire collection of code files from your source control system. Here’s a more concrete example that fetches the code to a work space, then runs a program from that repository. (This is safe for you to try — here’s the code that will be pulled/run. It even works from SAS University Edition.)

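/* DLCREATEDIR lets the LIBNAME statement create the folder if it is missing */
options dlcreatedir;
%let repoPath = %sysfunc(getoption(WORK))/sas-dummy-blog;
/* Assign and then clear a libref, just to create the target folder */
libname repo "&repoPath.";
libname repo clear;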
 
/* Fetch latest code from GitHub */
data _null_;
 rc = gitfn_clone("https://github.com/sascommunities/sas-dummy-blog/",
   "&repoPath.");
 put rc=;
run;
 
/* run the code in this session */
%include "&repoPath./rng_example_thanos.sas";

You could use the other GITFN functions to stage and commit the output from your SAS jobs, including log files, data sets, ODS results — whatever you need to keep and version.
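As a rough sketch of that stage-and-commit idea, the program below writes a small text file into a local repository, stages it, commits it, and pushes it. Everything specific here is a placeholder or an assumption: the repository path, the file name, the author details, and the credentials are hypothetical, and the argument orders shown for GITFN_IDX_ADD, GITFN_COMMIT, and GITFN_PUSH are my best reading of the SAS documentation, so verify them against the documentation for your release before relying on this. It also assumes a repository that you are allowed to push to.

```sas
/* Hypothetical location of a local repository that you own */
%let myRepo = /path/to/your/local-repo;

/* Write a small output file into the repository folder */
data _null_;
 file "&myRepo./job_summary.txt";
 put "Job completed on &sysdate9.";
run;

data _null_;
 /* Stage the new file (assumed arguments: directory, path, status) */
 rc = gitfn_idx_add("&myRepo.", "job_summary.txt", "new");
 put "stage: " rc=;

 /* Commit what is staged (assumed arguments: directory, reference,
    author name, author email, commit message) */
 rc = gitfn_commit("&myRepo.", "HEAD", "Your Name", "you@example.com",
   "Add job summary from scheduled run");
 put "commit: " rc=;

 /* Push to the remote (assumed arguments: directory, user, password) */
 rc = gitfn_push("&myRepo.", "your-user", "your-password-or-token");
 put "push: " rc=;
run;
```

Check each rc value in the log before trusting that a step worked, and avoid hard-coding real credentials in a program that you then commit.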

Using Git in SAS Data Integration Studio

SAS Data Integration Studio has supported source control integration for many years, but only for CVS and Subversion (still in wide use, but they aren’t media darlings like GitHub). By popular request, the latest version of SAS Data Integration Studio adds a Git plug-in.

[Figure: Example of Git in SAS DI Studio]

See the documentation for details: How to use the Git plug-in for SAS Data Integration Studio

Using Git in SAS Studio

Beginning as an experimental feature in SAS Studio 3.8, you can manage your SAS programs in a Git repository. This integration requires a bit of setup to allow SAS Studio to connect to your repository “as you” using the standard mechanism of SSH public/private keys. Once configured, you can add repositories to your SAS Studio session, fetch the latest versions of files, stage new files, commit files, and see history. You’ll see the Git content set apart with a special icon, indicating that it’s managed in Git.

Read more about setup and use in the SAS Studio documentation.

Add SAS Studio custom tasks from Git

Did you know that you can add custom tasks to SAS Studio? And that you can share these tasks in a central location using Git? This feature has been available for several releases. You can configure this in the Task Repositories pane of the Preferences window.

You can try this with a collection of SAS-supplied custom tasks, available here as part of our “Custom Tasks Tuesday” series.

Using Git in SAS Enterprise Guide

This isn’t new, but I’ll include it for completeness. SAS Enterprise Guide provides built-in Git repository support for SAS programs that are stored in your project file. You can use this feature without having to set up any external Git servers or repositories. Also, SAS Enterprise Guide can recognize when you reference programs that are managed in an external Git repository. This integration enables features like program history, compare differences, commit, and more. Read more and see a demo of this in action here.

[Figure: Program history]

If you use SAS Enterprise Guide to edit and run SAS programs that are managed in an external Git repository, here’s an important tip. Change your project file properties to “Use paths relative to the project for programs and importable files.” You’ll find this checkbox in File->Project Properties.

With this enabled, you can store the project file (EGP) and any SAS programs together in Git, organized into subfolders if you want. As long as these are cloned into a similar structure on any system you use, the file paths will resolve automatically.

The post Using built-in Git operations in SAS appeared first on The SAS Dummy.
