Practical MD5 in SAS

This guide introduces MD5 and hash functions in general, lists common uses for hash functions, gives advice on how best to use MD5 in SAS, and covers common issues.
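
The teaser does not include code, so the sketch below illustrates one common use of a hash function, change detection, assuming a record is hashed by joining its fields into a single string. It is written in Python with the standard hashlib module rather than in SAS. The helper names row_digest and changed, and the "|" delimiter, are hypothetical choices for illustration.

```python
import hashlib


def row_digest(fields, sep="|"):
    """Return the 32-character hex MD5 digest of the fields joined by sep."""
    # Join with a delimiter so that ("ab", "c") and ("a", "bc") do not
    # become the same input string. If a field can itself contain the
    # delimiter, this kind of collision is still possible.
    text = sep.join(str(f) for f in fields)
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def changed(old_fields, new_fields):
    """True if two versions of a record differ according to their digests."""
    return row_digest(old_fields) != row_digest(new_fields)


print(changed(["A001", "Smith", 42], ["A001", "Smith", 43]))  # True
```

In SAS itself, the MD5 function returns a 16-byte binary value, which is usually displayed with the $HEX32. format. The same caution about delimiters applies when variables are concatenated before hashing.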

NOTE: Enterprise Guide vs DI Studio – What’s the difference?

A favourite interview question of mine is: Compare and contrast SAS 9’s stored process server and workspace server. This question is very good at revealing whether candidates actually understand some of what’s going on behind the scenes of SAS 9. I mentioned this back in 2010, together with some notes on my expectations for an answer.

I was amused to see Michelle Homes post another of my favourite interview questions on the BI Notes blog recently: What’s the difference between SAS Enterprise Guide and SAS DI Studio? This question, and the ensuing conversation, establishes whether the candidate has used either or both of the tools, and it reveals how much the candidate is thinking about their environment and the tools within.

For me, there are two key differences: metadata, and primary use.

Michelle focuses on the former and gives a very good run-down of the use of metadata in Data Integration Studio (and its limited use in Enterprise Guide).

With regards to primary use, take a look at the visual nodes available in the two tools. The nodes in DI Studio are focused upon data extraction, transformation and loading (as you would expect), whilst the nodes in Enterprise Guide (EG) are focused upon analysing data. Sure, EG has nodes for sorting, transposing and other data-related activities (including SQL queries), but its data manipulation nodes are not as extensive as those in DI Studio. In addition to sorting and transposing, DI Studio offers nodes that understand data models, e.g. an SCD loader and a surrogate key generator (I described slowly changing dimensions (SCDs) and other elements of star schema data models in a post in 2009). On the other hand, EG has lots of nodes for tabulating, graphing, charting, analysing, and modelling your data.

One final distinction I’d draw is that EG’s nodes are each based around one SAS procedure, whilst DI Studio’s nodes are based around an ETL technique or requirement. You can see that DI Studio was produced for a specific purpose, whilst EG was produced as a user-friendly layer on top of the SAS language and therefore offers a more general-purpose solution.

For the most part, I’m stating the obvious above, but the interview candidate’s answer to the question provides a great deal of insight into their approach to their work, their sense of curiosity and awareness, and their technical insight.


Follow me on Twitter: @aratcliffeuk

See an audiovisual recording on my SAS Global Forum 2013 paper Visual Techniques for Problem Solving and Debugging

Email address normalization in SAS

This SAS macro performs email address normalization by changing email addresses like First.Last+tag@googlemail.com to the canonical form firstlast@gmail.com. Also, it demonstrates basic unit testing in SAS, which ensures quality and eases code mainten…
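
The teaser's example can be made concrete with a small sketch. This is not the author's SAS macro. It is a Python illustration assuming these rules: lowercase the whole address; for gmail.com and googlemail.com only, drop any +tag and all dots from the local part and map googlemail.com to gmail.com; leave other domains otherwise unchanged. The function name normalize_email is hypothetical, and the assert loop at the end mirrors the idea of simple unit tests.

```python
def normalize_email(address):
    """Return a canonical form of an email address (hypothetical rules)."""
    # Trim surrounding whitespace and lowercase the whole address.
    address = address.strip().lower()
    # Split on the last "@" so the domain is everything after it.
    local, at, domain = address.rpartition("@")
    if not at:
        # No "@" present: return the trimmed, lowercased input as-is.
        return address
    if domain in ("gmail.com", "googlemail.com"):
        # Treat googlemail.com as an alias of gmail.com.
        domain = "gmail.com"
        # Drop any "+tag" suffix, then remove dots from the local part.
        local = local.split("+", 1)[0].replace(".", "")
    return local + "@" + domain


# Simple unit tests: each pair is (input, expected canonical form).
cases = [
    ("First.Last+tag@googlemail.com", "firstlast@gmail.com"),
    ("first.last@gmail.com", "firstlast@gmail.com"),
    ("  Someone.Else@Example.com ", "someone.else@example.com"),
]
for given, expected in cases:
    actual = normalize_email(given)
    assert actual == expected, (given, actual, expected)
print("all tests passed")
```

Keeping the expected results in a table of cases makes it cheap to add a new test whenever a new address pattern turns up.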

NOTE: Tips to Avoid the Bus

Back in 2011 I wrote about the Bus Factor, i.e. the minimum number of people on your project (or in your support team) whose loss would cause serious issues for your project/support team. The name of this factor derives from the possibility of one or more team members getting hit by a bus. An alternative (less tragic) name – highlighted by Angela Hall at the time – is “lottery factor”, i.e. we assume that one or more people got a big win on the lottery and immediately left work, never to return. Either way, it’s a serious factor and must be managed.
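
One way to make the definition concrete is a minimal sketch under a stated assumption: treat "serious issues" as any area of the system being left with nobody who knows it. Losing every person who knows a single area is then enough, so the bus factor is the size of the smallest group of knowers. The function name bus_factor and the team data are hypothetical.

```python
def bus_factor(knowledge):
    """Smallest number of people whose loss leaves some area with no expert.

    knowledge maps each area of the system to the set of people who know it;
    it must contain at least one area.
    """
    # Removing all knowers of one area is sufficient to leave it uncovered,
    # so the cheapest loss is the smallest set of knowers.
    return min(len(people) for people in knowledge.values())


team = {
    "ETL jobs": {"Ann", "Bob"},
    "Metadata server": {"Carl"},
    "Reports": {"Ann", "Bob", "Dee"},
}
print(bus_factor(team))  # 1: losing Carl leaves the metadata server uncovered
```

Under this model, raising the bus factor means growing the smallest knower sets, which is what deliberate knowledge sharing aims to do.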

At the time, I offered a number of techniques to help increase your team’s bus factor (a good thing). Here are a few more that I use, all focused on the greater sharing of knowledge. If you ingrain the techniques of active and deliberate knowledge sharing into your team members then you need worry less about your bus factor, but don’t completely take your eye off the ball – remember to manage it.

Push-Based Knowledge Sharing. The person who holds the knowledge about something invites a person who does not know about it to join them and learn about it. They thereby PUSH the information towards the other person.

Pull-Based Knowledge Sharing. The person who does not have knowledge about something asks another person who knows about it to teach them about it in some way. In this way, they establish a PULL of the information from the other person.

Knowledge-Share Handshaking. A knowledge-sharing culture that flows in only one direction, i.e. only pull or only push, is not the most effective culture. There has to be a knowledge handshake for knowledge to flow freely. Encompassed within handshaking is the idea of pairing. One of the best ways to remove bus-factor risks is pairing. Pairing is an act of implicit learning in which knowledge is constantly passed back and forth. By contrast, when a person asks a question such as “How did you do that?”, that is an act of explicit learning.

Pairing is hard to achieve in organisations where pairing has never been a “thing” people do. If you cannot get enough people to pair, or your bus-factor risk lies with a person from a different team who knows something that your team relies on, it’s time to start encouraging implicit knowledge gathering, or implicit learning.

NOTE: Advent Calendar 2013

I bring good news! The SAS Professionals advent calendar is now working nicely. Open a new window each day to stand a chance of winning great prizes.

NOTE: Whither the Advent Calendar?

It’s traditional for me to mention the SAS Professionals advent calendar at this time of year. However, this year it seems to have stalled. Clicking the #1 today tells me that I need to wait for the correct date. I’ll post an update as soon as I have mo…

The Hardware and Software 2013 I’m Most Thankful For

It’s the time of year to give thanks. As a programmer, e-book reader, blog writer and web surfer, I should express my sincere appreciation for the following hardware and software (I use the majority of them on a daily basis, and most of them are free): 0. Hardware Lenovo ThinkPad W520 (not free): my workhorse machine, now replaced […]

More on Technical Debt #2/2

Last week I offered some techniques for managing technical debt. In this post I offer some more. Technical debt is a debt that you incur every time you avoid doing the right thing (like refactoring, removing duplication/redundancy), thereby letting…