Storage in the Cloud – SAS and Azure

This post was kindly contributed by SAS Users - go there to comment and to read the full post.

Editor’s note: This is the third article in a series by Conor Hogan, a Solutions Architect at SAS, on SAS and database and storage options on cloud technologies. This article covers the SAS offerings available to connect to and interact with the various storage options available in Microsoft Azure. Access all the articles in the series here.

In this edition of the series on SAS and cloud integration, I cover the various storage options available on Microsoft Azure and how to connect to and interact with them. I focus on three key storage services: object storage, block storage, and file storage. In previous articles, I covered database as a service (DBaaS) and storage offerings from Amazon Web Services (AWS), as well as DBaaS on Azure.

Object Storage

Azure Blob Storage is a low-cost, scalable cloud object storage service for any type of data. Objects are a great way to store large amounts of unstructured data in their native formats. An individual Azure blob can be up to 4.75 terabytes (TB) in size. Azure organizes these objects into storage accounts. A storage account supplies a globally unique namespace for your data, so no two storage accounts can have the same name, and it is accessible from anywhere in the world over HTTP or HTTPS.
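
Because a storage account name becomes part of a public DNS name, Azure restricts the characters it may contain. The Python sketch below checks the local naming rules (3 to 24 characters, lowercase letters and digits only) and builds the account's Blob service endpoint. Global uniqueness can only be confirmed by Azure when you create the account, and the account name used here is hypothetical.

```python
import re

# Local naming rules for an Azure storage account: 3 to 24 characters,
# lowercase letters and digits only. Global uniqueness can only be checked
# by Azure itself when the account is created.
ACCOUNT_NAME_RULE = re.compile(r"[a-z0-9]{3,24}")


def is_valid_account_name(name: str) -> bool:
    """Return True if `name` passes the local naming rules."""
    return ACCOUNT_NAME_RULE.fullmatch(name) is not None


def blob_endpoint(account: str) -> str:
    """Return the HTTPS Blob service endpoint for a storage account."""
    if not is_valid_account_name(account):
        raise ValueError(f"invalid storage account name: {account!r}")
    return f"https://{account}.blob.core.windows.net"


# "sasdemodata" is a hypothetical account name.
print(blob_endpoint("sasdemodata"))       # https://sasdemodata.blob.core.windows.net
print(is_valid_account_name("SAS-Data"))  # False
```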

A container organizes a set of blobs, similar to a directory in a traditional file system. You access Azure blobs directly through an API from anywhere in the world. For security reasons, it is vital to grant only the least access that each user needs to a blob.
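
To see how the account, container, and blob hierarchy maps onto HTTP access, here is a minimal Python sketch that builds a blob's URL. The account, container, and blob names are hypothetical, and a real request would also need authorization (for example, a shared access signature or Azure AD credentials), which is omitted here.

```python
from urllib.parse import quote


def blob_url(account: str, container: str, blob_name: str) -> str:
    """Build the URL for a blob: account, then container, then blob name.

    Blob Storage has a flat namespace; slashes in a blob name only simulate
    directories, so they are kept as-is while other characters are encoded.
    """
    return (
        f"https://{account}.blob.core.windows.net/"
        f"{container}/{quote(blob_name, safe='/')}"
    )


# Hypothetical account, container, and blob names.
print(blob_url("sasdemodata", "sales", "2020/q1/orders file.csv"))
# https://sasdemodata.blob.core.windows.net/sales/2020/q1/orders%20file.csv
```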

Make sure you are intentional about opening objects up and that you are not exposing any sensitive data. Security controls are available on individual blobs and on the containers that organize them. By default, containers and blobs are created with no public read access; you then grant permissions to individual users and groups.

The total cost of blob storage depends on the volume of data stored, the type of operations performed, data transfer costs, and data redundancy choices. You can reduce the cost of your object storage by keeping fewer replicas or by moving data to one of the cooler access tiers. Storage cost is calculated from the terabytes of storage used per month, and you incur added costs for data requests and for transfers over the network. Data movement is an unpredictable expense for many users.
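
The cost components listed above add up in a straightforward way. The following Python sketch estimates one month of blob storage cost as storage plus operations plus data transfer. All unit prices are hypothetical placeholders, not Azure's published rates, which vary by region, access tier, and redundancy option.

```python
from dataclasses import dataclass


@dataclass
class BlobPricing:
    """Unit prices; the values used below are hypothetical placeholders."""
    per_gb_month: float    # storage, per GB per month (varies by tier and redundancy)
    per_10k_writes: float  # write operations, per 10,000 requests
    per_10k_reads: float   # read operations, per 10,000 requests
    per_gb_egress: float   # outbound data transfer, per GB


def monthly_cost(p: BlobPricing, stored_gb: float, writes: int, reads: int,
                 egress_gb: float) -> float:
    """Estimate one month of cost as storage + operations + data transfer."""
    storage = stored_gb * p.per_gb_month
    operations = (writes / 10_000) * p.per_10k_writes + (reads / 10_000) * p.per_10k_reads
    transfer = egress_gb * p.per_gb_egress
    return round(storage + operations + transfer, 2)


pricing = BlobPricing(per_gb_month=0.02, per_10k_writes=0.05,
                      per_10k_reads=0.004, per_gb_egress=0.08)
# 1,024 GB stored, 100,000 writes, 1,000,000 reads, 50 GB downloaded.
print(monthly_cost(pricing, 1024, 100_000, 1_000_000, 50))  # 25.38
```

In this made-up example, storage dominates, but the transfer term grows linearly with downloads, which is why data movement is the hardest part of the bill to predict.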

Azure Blob Storage Tiers
Hot: frequently accessed data
Cool: infrequently accessed data, stored for at least 30 days
Archive: rarely accessed data, stored for at least 180 days

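The minimum durations in the tier table matter because Azure charges an early deletion fee, prorated for the remaining days, when a Cool or Archive blob is deleted or moved to another tier before its minimum period ends. Archive blobs are also offline and must be rehydrated before they can be read. The Python sketch below encodes a simple rule of thumb based on those constraints; the three access categories and the decision rule are a simplification for illustration, not an Azure API.

```python
# Minimum storage duration for each tier, in days.
MIN_DAYS = {"Hot": 0, "Cool": 30, "Archive": 180}


def early_deletion_days(tier: str, days_stored: int) -> int:
    """Days still billed if a blob is deleted or moved out of `tier` early."""
    return max(0, MIN_DAYS[tier] - days_stored)


def suggest_tier(access: str, expected_days: int) -> str:
    """Pick a tier from an access pattern and how long the data will stay.

    `access` is "frequent", "infrequent", or "rare", mirroring the tier table.
    """
    if access == "rare" and expected_days >= MIN_DAYS["Archive"]:
        return "Archive"
    if access in ("infrequent", "rare") and expected_days >= MIN_DAYS["Cool"]:
        return "Cool"
    return "Hot"


print(suggest_tier("rare", 365))           # Archive
print(suggest_tier("infrequent", 10))      # Hot
print(early_deletion_days("Archive", 60))  # 120
```
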
In SAS Viya 3.5, direct support is available for objects stored in Azure Data Lake Storage Gen2. Azure Data Lake Storage Gen2 extends the capabilities of Azure Blob Storage and optimizes it for analytics workloads. If you want to read SAS data sets, CSV files, or ORC files from Azure Blob Storage, you can read them directly with a CASLIB statement that points to Azure Data Lake Storage (ADLS). If your files are in a different format, you can copy them to a local file system that is accessible to the CAS controller and then use CAS actions to load the tables into memory. To automate the download, you can make HTTP requests directly from your SAS code with PROC HTTP. Remember that object storage places no restrictions on the file types you store, so reading some of these files from the local file system may require a SAS Data Connector.
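
The decision described above, reading a file directly through an ADLS caslib or copying it to the CAS controller first, can be sketched in Python as follows. This is not SAS code and does not call any SAS API; it only classifies an object by file extension. Treating .sas7bdat and .sashdat as the "SAS data set" formats is an assumption, and the step descriptions paraphrase this article.

```python
from pathlib import PurePosixPath

# Assumption: "SAS data sets" means .sas7bdat or .sashdat files. CSV and ORC
# are the other formats this article lists as readable through an ADLS caslib.
DIRECT_SUFFIXES = {".sas7bdat", ".sashdat", ".csv", ".orc"}


def load_plan(blob_name: str) -> list[str]:
    """Return the steps for getting one object from ADLS into CAS memory."""
    suffix = PurePosixPath(blob_name).suffix.lower()
    if suffix in DIRECT_SUFFIXES:
        return ["read through the ADLS caslib", "load into memory with CAS actions"]
    return [
        "download to a file system visible to the CAS controller (e.g. PROC HTTP)",
        "read with the matching SAS Data Connector",
        "load into memory with CAS actions",
    ]


print(load_plan("sales/2020/orders.CSV"))
print(load_plan("exports/report.xlsx"))
```

The sketch requires Python 3.9 or later for the `list[str]` annotation.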

Block Storage

Azure Disks is the block storage service designed for use with Azure Virtual Machines. You can access block storage only when it is attached to an operating system. When thinking about Azure Disks, treat each storage volume as an independent disk drive controlled by the server operating system: you mount an Azure Disk to an operating system as if it were a physical disk. Azure Disks are valuable because they provide storage that persists when you terminate your compute instance. You can choose from four disk types, each offering a different level of performance at a corresponding cost.

Azure offers a choice of HDD or three performance classes of SSD: Standard, Premium, and Ultra. Use Ultra Disk if you need the lowest latency and scalable performance. Standard SSD is the most cost-effective SSD option, while Premium SSD is the high-performance offering. The table below summarizes the four offerings.

Azure Disk Storage Types
Standard HDD: backups and non-critical workloads
Standard SSD: development or test environments and lightly used workloads
Premium SSD: production environments and time-sensitive workloads
Ultra Disk: high throughput and IOPS for transaction-heavy workloads

Azure Disks are the permanent storage for SAS data, persisting through a restart of your SAS environment. The Azure Disk type you select has a direct impact on the performance you get from SAS. A best practice is to use compute instances with enhanced Azure Disk performance or with dedicated solid-state drive (SSD) instance storage.
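
A simple way to reason about disk performance is that each Azure disk has both an IOPS limit and a throughput limit, and whichever limit you reach first bounds what you actually get. The Python sketch below computes effective throughput as IOPS times I/O size, capped by the throughput limit, and then estimates how long a sequential scan takes. The disk limits are hypothetical, and the model ignores VM-level caps, host caching, and striping across multiple disks.

```python
def effective_throughput_mib_s(iops_limit: int, throughput_limit_mib_s: float,
                               io_size_kib: float) -> float:
    """Sustained throughput for one I/O size: IOPS x I/O size, capped by the MiB/s limit."""
    return min(iops_limit * io_size_kib / 1024, throughput_limit_mib_s)


def scan_seconds(data_gib: float, throughput_mib_s: float) -> float:
    """Time to read `data_gib` of data sequentially at a given throughput."""
    return data_gib * 1024 / throughput_mib_s


# Hypothetical disk limits: 5,000 IOPS and 200 MiB/s.
small = effective_throughput_mib_s(5000, 200, 8)    # 39.0625 (IOPS-bound)
large = effective_throughput_mib_s(5000, 200, 256)  # 200 (throughput-bound)
print(small, large)
print(scan_seconds(100, large))  # 512.0
```

With small 8 KiB I/O the disk is limited by IOPS, while the large sequential reads that SAS often performs hit the throughput cap instead, so that cap is usually the number to compare across disk types.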

File Storage

Azure Files provides access to data through a shared file system. The elastic network file system grows and shrinks as you add or remove files, so you only pay for the storage you consume. Users create, delete, modify, read, and write files organized logically in a directory structure for intuitive access. The service allows multiple users to access a common set of files simultaneously, with access managed by user and group permissions.

Azure Files is a powerful tool, especially if you use a SAS Grid architecture. If your SAS architecture requires a shared location that any node in a group can read from and write to, Azure Files could meet that requirement. To access the data stored in your network file system, you must mount the file system to your operating system. You can mount Azure Files to any Azure Virtual Machine, or even to an on-premises server connected to your Azure Virtual Network. Azure Files is a fantastic way to set up a shared file system not only for your data but also for sharing projects and code between users.
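
Mounting an Azure file share on a Linux VM is typically done with the CIFS client over SMB 3.0. The Python sketch below only builds the mount command rather than running it; the account, share, mount point, and credentials file path are hypothetical. The credentials file holds the storage account name and key, and SMB requires outbound TCP port 445 to be open.

```python
import shlex


def smb_mount_command(account: str, share: str, mount_point: str,
                      credentials_file: str) -> list[str]:
    """Build a Linux `mount -t cifs` command for an Azure file share over SMB 3.0.

    `credentials_file` should hold `username=` (the storage account name) and
    `password=` (a storage account key) lines, readable only by root.
    """
    source = f"//{account}.file.core.windows.net/{share}"
    options = f"vers=3.0,credentials={credentials_file}"
    return ["sudo", "mount", "-t", "cifs", source, mount_point, "-o", options]


# Hypothetical names; run the same command on every grid node so that all
# nodes see the share at the same path.
cmd = smb_mount_command("sasdemodata", "sasgrid", "/mnt/sasgrid",
                        "/etc/smbcredentials/sasdemodata.cred")
print(shlex.join(cmd))
```

Building the command as a list keeps each argument separate, so it can be passed to `subprocess.run` without shell quoting issues. The sketch requires Python 3.9 or later.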

Finally

Storage is a key component of cloud computing because it enables users to stop their compute instances while their most important data remains in place. Storage services make it much easier to manage and scale your data. For example, Blob storage is a great place to store files that you want to make available to anyone, anywhere.

Block storage drives the performance of your environment. Abundant and performant block storage is essential to making your application run against the massive scale of data that SAS is eager to consume. Block storage is where your operating system and software ultimately are installed.

File storage is a great way to attach shared file systems to your compute instances. It is a great place to collaborate, or to migrate file system data from one compute instance to another.

SAS is a compute engine running on data. Without a robust set of storage tools to persist that data, you may not get the performance that you desire, or the progress you make may be lost when you shut down your compute instances.

Storage in the Cloud – SAS and Azure was published on SAS Users.
