Native scheduler, new types of workloads, and more: introducing the new SAS Grid Manager

This post was kindly contributed by SAS Users - go there to comment and to read the full post.

You’ll notice several changes in SAS Grid Manager with the release of SAS 9.4M6.
 
For the first time, you can get a grid solution entirely written by SAS, with no dependence on any external or third-party grid provider.
 
This post gives a brief architectural description of the new SAS grid provider, including all major components and their role. The “traditional” SAS Grid Manager for Platform has seen some architectural changes too; they are detailed at the bottom.

A new kid in town

SAS Grid Manager is a complex offering, composed of different layers of software. The following picture shows a very simple, high-level view. SAS Infrastructure here represents the SAS Platform, for example the SAS Metadata Server, SAS Middle Tier, etc. They service the execution of computing tasks, whether a batch process, a SAS Workspace server, and so on. In a grid environment these computing tasks are distributed on multiple hosts, and orchestrated/managed/coordinated by a layer of software that we can generically call Grid Infrastructure or Grid Middleware. That’s basically a set of lower-level components that sit between computing processes and the operating system.

Since its initial design more than a decade ago, the SAS Grid Manager offering has always been able to leverage different grid infrastructure providers, thanks to an abstraction layer that makes them transparent to end-user client software.
 
Our strategic grid middleware has been, since the beginning, Platform Suite for SAS, provided by Platform Computing (now part of IBM).
 
A few years ago, with the release of SAS 9.4M3, SAS started delivering an additional grid provider, SAS Grid Manager for Hadoop, tailored to grid environments co-located with Hadoop.
 
The latest version, SAS 9.4M6, opens up choices with the introduction of a new, totally SAS-developed grid provider. What’s its name? Well, since it’s SAS’s grid provider, we use the simplest one: SAS Grid Manager. To avoid confusion, what we used to call SAS Grid Manager has been renamed SAS Grid Manager for Platform.

The reasons for a choice

The SAS-developed provider for SAS Grid Manager:
 
• Is streamlined specifically for SAS workloads.
 
• Is easier to install (simply use the SAS Deployment Wizard (SDW) and administer.
 
• Extends workload management and scheduling capabilities into other technologies, such as
 
    o Third-party compute workloads like open source.
 
    o SAS Viya (in a future release).
 
• Reduces dependence of SAS Grid Manager on third party technologies.

So what are the new components?

The SAS-developed provider for SAS Grid Manager includes:
 
• SAS Workload Orchestrator
 
• SAS Job Flow Scheduler
 
• SAS Workload Orchestrator Web Interface
 
• SAS Workload Orchestrator Administration Utility

 
These are the new components, delivered together with others also available in previous releases and with other providers, such as the Grid Manager Thin Client Utility (a.k.a. SASGSUB), the SAS Grid Manager Agent Plug-in, etc. Let’s see these new components in more detail.

SAS Workload Orchestrator

The SAS Workload Orchestrator is your grid controller – just like Platform’s LSF is with SAS Grid Manager, it:
 
• Dispatches jobs.
 
• Monitors hosts and spreads the load.
 
• Is installed and runs on all machines in the cluster (but is not required on dedicated Metadata Server or Middle-tier hosts).
 
A notable difference, when compared to LSF, is that the SAS Workload Orchestrator is a single daemon, with its configuration stored in a single text file in json format.
 
Redeveloped for modern workloads, the new grid provider can schedule more types of jobs, beyond just SAS jobs. In fact, you can use it to schedule ANY job, including open source code running in Python, R or any other language.

SAS Job Flow Scheduler

SAS Job Flow Scheduler is the flow scheduler for the grid (just as Platform Process Manager is with SAS Grid Manager for Platform):
 
• It passes commands to the SAS Workload Orchestrator at certain times or events.
 
• Flows can be used to run many tasks in parallel on the grid.
 
• Flows can also be used to determine the sequence of events for multiple related jobs.
 
• It only determines when jobs are submitted to the grid, but they may not run immediately if the right conditions are not met (hosts too busy, closed, etc.)
 
The SAS Job Flow Scheduler provides flow orchestration of batch jobs. It uses operating system services to trigger the flow to handle impersonation of the user when it is time for the flow to start execution.
 
A flow can be built using the SAS Management Console or other SAS products such as SAS Data Integration Studio.
 
SAS Job Flow Scheduler includes the ability to run a flow immediately (a.k.a. “Run Now”), or to schedule the flow for some future time/recurrence.
 
SAS Job Flow Scheduler consists of different components that cooperate to execute flows:
 
SASJFS service is the main running service that handles the requests to schedule a flow. It runs on the middle tier as a dedicated thread in the Web Infrastructure Platform, deployed inside sasserver1. It uses services provided by the data store (SAS Content Server) and Metadata Server to read/write the configuration options of the scheduler, the content of the scheduled flows and the history records of executed flows.
 
Launcher acts as a gateway between SASJFS and OS Trigger. It is a daemon that accepts HTTP connections using basic authentication (username/password) to start the OS Trigger program as the scheduled user. This avoids the requirement to save end-users’ passwords in the grid provider, for both Windows and Unix.
 
OS Trigger is a stand-alone Java program that uses the services of the operating system to handle the triggering of the scheduled flow by providing a call to the Job Flow Orchestrator. On Windows, it uses the Windows Task Scheduler; on UNIX, it uses cron or crontab.
 
Job Flow Orchestrator is a stand-alone program that manages the flow orchestration. It is invoked by the OS scheduler (as configured by the OS Trigger) with the id of the flow to execute, then it connects to the SASJFS service to read the flow information, the job execution configuration and the credentials to connect to the grid. With that information, it sends jobs for execution to the SAS Workload Orchestrator. Finally, it is responsible for providing the history record for the flow back to SASJFS service.

Additional components

SAS Grid Manager provides additional components to administer the SAS Workload Orchestrator:
 
• SAS Workload Orchestrator Web Interface
 
• SAS Workload Orchestrator Administration Utility
 
Both can monitor jobs, queues, hosts, services, and logs, and configure hosts, queues, services, user groups, and user resources.
 
The SAS Workload Orchestrator Web Interface is a web application hosted by the SAS Workload Orchestrator process on the grid master host; it can be proxied by the SAS Web Server to always point to the current master in case of failover.

The SAS Workload Orchestrator Administration Utility is an administration command-line interface; it has a similar syntax to SAS Viya CLIs and is located in the directory /Lev1/Applications/GridAdminUtility. A sample invocation to list all running jobs is:
sas-grid-cli show-jobs –state RUNNING

What has not changed

Describing what has not changed with the new grid provider is an easy task: everything else.
 
Obviously, this is a very generic statement, so let’s call out a few noteworthy items that have not changed:
 
• User experience is unchanged. SAS programming interfaces to grid have not changed, apart from the lower-level libraries to connect to the new provider. As such, you still have the traditional SAS grid control server, SAS grid nodes, SAS thin client (aka SASGSUB) and the full SAS client (SAS Display Manager). Users can submit jobs or start grid-launched sessions from SAS Enterprise Guide, SAS Studio, SAS Enterprise Miner, etc.
 
• A directory shared among all grid hosts is still required to share the grid configuration files.
 
• A high-performance, clustered file system for the SASWORK area and for data libraries is mandatory to guarantee satisfactory performance.

What about SAS Grid Manager for Platform?

The traditional grid provider, now rebranded as SAS Grid Manager for Platform, has seen some changes as well with SAS 9.4M6:
 
• The existing management interface, SAS Grid Manager for Platform Module for SAS Environment Manager, has been completely re-designed. The user interface has completely changed, although the functions provided remain the same.
 
• Grid Management Services (GMS) is not updated to work with the latest release of LSF. Therefore, the SAS Grid Manager plug-in for SAS Management Console is no longer supported. However, the plug-in is included with SAS 9.4M6 if you want to upgrade to SAS 9.4M6 without also upgrading Platform Suite for SAS.

You can find more comprehensive information in these doc pages:
 
What’s New in SAS Grid Manager 9.4
 
• Grid Computing for SAS Using SAS Grid Manager (Part 2) section of Grid Computing in SAS 9.4

Native scheduler, new types of workloads, and more: introducing the new SAS Grid Manager was published on SAS Users.

This post was kindly contributed by SAS Users - go there to comment and to read the full post.