Accessing Google Cloud Storage (GCS) with SAS Viya

November 18, 2020

This post was kindly contributed by SAS Users - go there to comment and to read the full post.

SAS loves data. It’s our raison d’être. We’ve been dealing with Big Data since long before the term was first used in 2005. A brief history of Big Data*:

  • In 1887, Herman Hollerith invented punch cards and a reader to organize census data.
  • In 1937, the US government had a punch-card reading machine created to keep track of 26 M Americans and 3 M employers as a result of the Social Security Act.
  • In 1943, Colossus was created to decipher Nazi codes during World War II.
  • In 1952, the National Security Agency was created to confront decrypting intelligence signals during the Cold War.
  • In 1965, the US Government built the first data center to store 742 M tax returns and 175 M sets of fingerprints.
  • In 1989, British computer scientist Tim Berners-Lee coined the phrase “World Wide Web” combining hypertext with the Internet.
  • In 1995, the first super-computer was built.
  • In 2005, Roger Mougalas from O’Reilly Media coined the term Big Data.
  • In 2006, Hadoop was created.

The story goes on: 90 percent of the data available today was created in the last two years!

As SAS (and the computing world) moves to the cloud, one question is at the forefront for many organizations: “How do I deal with my data (Big and otherwise), which used to be on-prem, in the cloud?” I ran across a series of relevant articles on the SAS Support Communities by my colleague, Nicolas Robert, covering how SAS accesses and stores data on Google Cloud Storage (GCS). This post organizes the articles so you can quickly get an overview of the various options for SAS to access data in GCS.

Accessing Google Cloud Storage (GCS) with SAS Viya 3.5 – An overview

As the title suggests, this is an overview of the series. Some basic SAS terminology and capabilities are discussed, followed by an overview of GCS data options for SAS. Options include:

  • gsutil – the “indirect” way
  • REST API – the “web” way
  • gcsfuse – the “dark” way
  • BigQuery – the “smart” way

In the overview Nicolas provides the pros and cons of each offering to help you decide which option works best for your situation. Below is a list of subsequent articles providing technical details, specific steps for usage, and sample code for each option.

Accessing files on Google Cloud Storage (GCS) using REST

The Google Cloud Platform (GCP) provides an API for manipulating objects in Google Cloud Storage. In this article, Nicolas provides step-by-step instructions on using this API to access GCS files from SAS.
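
To make the idea concrete, here is a minimal sketch of what a download request against the GCS JSON API looks like. It is written in Python purely for illustration and is not the SAS code from Nicolas's article. The bucket name, object name, and token are placeholders, and the helper names (object_media_url, build_download_request) are hypothetical. The sketch only builds the request; obtaining an OAuth 2.0 access token and sending the request are left out.

```python
from urllib.parse import quote
from urllib.request import Request

# Base URL of the GCS JSON API (v1).
GCS_API_ROOT = "https://storage.googleapis.com/storage/v1"


def object_media_url(bucket: str, object_name: str) -> str:
    """Return the URL that downloads an object's contents.

    The object name is percent-encoded as a single path segment,
    so "/" inside the name becomes "%2F".
    """
    encoded_bucket = quote(bucket, safe="")
    encoded_object = quote(object_name, safe="")
    # alt=media asks for the object's data rather than its metadata.
    return f"{GCS_API_ROOT}/b/{encoded_bucket}/o/{encoded_object}?alt=media"


def build_download_request(bucket: str, object_name: str, access_token: str) -> Request:
    """Build (but do not send) an authenticated GET request.

    access_token is assumed to be a valid OAuth 2.0 bearer token
    obtained elsewhere; this sketch does not fetch one.
    """
    return Request(
        object_media_url(bucket, object_name),
        headers={"Authorization": f"Bearer {access_token}"},
        method="GET",
    )


if __name__ == "__main__":
    # Placeholder values, not taken from the original articles.
    request = build_download_request("my-bucket", "data/cars.csv", "PLACEHOLDER_TOKEN")
    print(request.full_url)
```

Any HTTP client follows the same pattern: an encoded object URL plus a bearer token in the Authorization header.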

Accessing files on Google Cloud Storage (GCS) using SAS Viya 3.5 and Cloud Storage FUSE (gcsfuse)

Cloud Storage FUSE provides a command-line utility, named “gcsfuse”, which helps you mount a GCS bucket to a local directory so the bucket’s contents are visible and accessible locally like any other file. In this article, Nicolas presents rules for CLI usage, options for mounting a GCS bucket to a local directory, and SAS code for accessing the data.
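
As a rough illustration of "accessible locally like any other file", the sketch below shows the basic shape of a gcsfuse mount command and then reads a file through the mount point with ordinary file I/O. It is written in Python for illustration only and is not the SAS code from the article. The bucket name, mount point, file path, and helper names are all hypothetical, and the mount options Nicolas discusses are omitted.

```python
from itertools import islice
from pathlib import Path
from typing import List


def gcsfuse_mount_command(bucket: str, mount_point: str) -> List[str]:
    """Return the basic gcsfuse invocation: gcsfuse <bucket> <mount point>.

    Only the two positional arguments are shown; any mount options
    are left out of this sketch.
    """
    return ["gcsfuse", bucket, mount_point]


def read_head(mount_point: str, relative_path: str, max_lines: int = 5) -> List[str]:
    """Read the first lines of a file under the mount point.

    Nothing here is GCS-specific: once the bucket is mounted, the
    object is opened like any local file.
    """
    path = Path(mount_point) / relative_path
    with path.open("r", encoding="utf-8") as handle:
        return [line.rstrip("\n") for line in islice(handle, max_lines)]


if __name__ == "__main__":
    # Placeholder bucket and directory, not taken from the original articles.
    print(" ".join(gcsfuse_mount_command("my-bucket", "/mnt/gcs")))
    # After mounting, a call such as read_head("/mnt/gcs", "data/cars.csv")
    # would return the first lines of that object.
```

The same holds for SAS: once the bucket is mounted, the mount point can be referenced like any other local directory.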

SAS Viya 3.5 and Google Cloud Storage (GCS) Performance Feedback

In this article, Nicolas provides the results of a performance test of GCS integrated with SAS when accessed from cloud instances. New releases of SAS will only help facilitate integration and improve performance.

Accessing files on Google Cloud Storage (GCS) through Google BigQuery

Google BigQuery naturally interacts with Google Cloud Storage using popular big data file formats (Avro, Parquet, ORC) as well as commodity file formats like CSV and JSON. And since SAS can access Google BigQuery, SAS can access those GCS resources under the covers. In the final article, Nicolas debunks the myth that using Google BigQuery as middleware between SAS and GCS is cumbersome, indirect, and dependent on data duplication.

Finally

Being able to access a wide variety of data on the major cloud providers’ object storage technologies has become essential, if not already mandatory. I encourage you to browse through the various articles, find your specific area of interest, and try out some of the detailed concepts.

* Big Data history compiled from A Short History Of Big Data, by Dr Mark van Rijmenam.

Accessing Google Cloud Storage (GCS) with SAS Viya was published on SAS Users.
