Tag: Big Data

NOTE: SAS "Inside" of Hadoop

We previously looked at SAS Grid Manager for Hadoop, which brings workload management, accelerated processing, and scheduling to a Hadoop environment. This was introduced with the M3 maintenance release of SAS v9.4. M3 also introduced support for using…

NOTE: SAS Grid Manager for Hadoop

I’ve recently written about how much new functionality is getting released by SAS on an almost monthly basis without much fanfare, and I’ve also written about how Hadoop is becoming a new “operating system” and we should expect to see Grid and LASR run…

Jedi SAS Tricks – Maximum Warp with Hadoop

I’m gearing up to teach the next “DS2 Programming Essentials with Hadoop” class, and thinking about Warp Speed DATA Steps with DS2 where I first demonstrated parallel processing using threads in base SAS. But how about DATA step processing at maximum warp? For that, we’ll need a massively parallel processing […]

The post Jedi SAS Tricks – Maximum Warp with Hadoop appeared first on The SAS Training Post.

Hadoop is the New Black

It feels like any SAS-related project in 2015 not using Hadoop is simply not ambitious enough. The key question seems to be “how big should our Hadoop cluster be” rather than “do we need a Hadoop cluster”.

Of course, I’m exaggerating: not every project needs to use Hadoop. But there is an element of new thinking required when you consider what data sources are available to your next project and what value they would add to your end goal. Internal and external data sources are easier to acquire, and volume is less and less of an issue (or, stated another way, you can realistically aim to acquire larger and larger data sources if they will add value to your enterprise).

Whilst SAS is busy moving clients from PC to web, there’s also a lot of work being done by SAS to move the capabilities of the SAS server inside of Hadoop. The aim is to minimise “data miles” by moving the code to the data rather than vice versa. It surely won’t be long before we see SAS Grid and LASR running inside of Hadoop. It’s almost like Hadoop has become a new operating system on which all of our server-side capabilities must be available.

We tend to think of Hadoop as being a central destination for data but it doesn’t always start its presence in an organisation in that way. Hadoop may enter an organisation for a specific use case, but data attracts data, and so once in the door Hadoop tends to become a centre of gravity. This effect is caused in no small part by the appeal of big data being not just about the data size, but the agility it brings to an organisation.

SAS’s Senior Director of the EMEA and AP Analytical Platform Centre of Excellence, Mark Torr (that’s one heck of a title, Mark!), recently wrote a well-founded article on the four levels of Hadoop adoption maturity, based upon his experiences with many SAS customers. His experiences chime with my far more limited observations. Mark lists the four levels as:

  1. Monitoring – enterprises that don’t yet see a use for Hadoop within their organisation, or are focused on other priorities
  2. Investigating – those at this level have no clear, focused use for Hadoop but they are open to the idea that it could bring value and hence they are experimenting to see where and how it can deliver benefit(s)
  3. Implementing – the first one or two Hadoop projects are the riskiest because there’s little or no in-house experience, and maybe even some negative political undercurrents too. As Mark notes, the exit from Investigating into Implementing often marks the point where enterprises choose to move from the Apache distribution to a commercial distribution that offers more industrial-strength capabilities such as Hortonworks, Cloudera or MapR
  4. Established – At this level, Hadoop has become a strategic architectural tool for organisations and, given the relative immaturity of Hadoop, the organisations are working with their vendors to influence development towards full production-strength capabilities

Hadoop is (or will be) a journey for all of us. Many organisations are just starting to kick the tyres. Of those who are using Hadoop, most are in the early stages of this process in level 2, with a few front-runners living at level 3. Those organisations at level 3 are typically big enough to face and invest in solutions to the challenges that the vendors haven’t yet stepped up to, such as managing provenance, data discovery and fine-grained security.

Does anybody live the dream fully yet? Arguably, yes: the internal infrastructures developed at Google and Facebook certainly provide their developers with the advantages and agility of the data lake dream. For most of us, we must be content to continue our journey…


Follow me on Twitter: @aratcliffeuk

Jedi SAS Tricks: Warp Speed DATA Steps with DS2

I remember the first time I was faced with the challenge of parallelizing a DATA step process. It was 2001 and SAS V8.1 was shiny and new. We were processing very large data sets, and the computations performed on each record were quite complex. The processing was crawling along on […]

The post Jedi SAS Tricks: Warp Speed DATA Steps with DS2 appeared first on The SAS Training Post.

UK General Election 2015: using PROC MAPIMPORT to visualise the election

Election fever has hit the United Kingdom as the days count down to 7th May 2015.  This is likely to be one of the most uncertain elections in recent memory, with nearly 10 parties struggling for votes across England, Scotland, Wales and Northern Ireland.  Results night will be tense, with the different […]

The post UK General Election 2015: using PROC MAPIMPORT to visualise the election appeared first on The SAS Training Post.

Making sense of text analytics

With all of us blogging, emailing and posting updates on social media, the amount of textual data is growing fast. So how do you make sense of all that big data? Dr. Goutam Chakraborty teaches a Business Knowledge Series course, Text Analytics and Sent…

What’s the big idea? Big graphs, for bigger data!

Not everyone agrees on a definition of “big data” — but you’ll probably agree that the amount of data available today is a lot bigger than in the past, eh?!? … so let’s just call it “Bigger Data”!  🙂 And you might have noticed that some of your ol…