Little known secrets of DO-loops with index variables

This post was kindly contributed by SAS Users - go there to comment and to read the full post.

Iterative loops are among the most powerful and essential features of any programming language, allowing a block of code to be executed repeatedly with some variation on each pass. In SAS we call them DO-loops because they are defined by iterative DO statements. These statements come in three distinct forms:

DO with index variable
DO UNTIL
DO WHILE

In this blog post we will focus on the versatile iterative DO loops with index variable as used in SAS DATA steps, as opposed to the more modest subset of DO loops available in SAS/IML.

Iterative DO statement with index variable

The syntax of the DATA step’s iterative DO statement with index variable is remarkably simple yet powerful:

DO index-variable=specification-1 <, …specification-n>;

…more SAS statements…

END;

It executes a block of code between the DO and END statements repeatedly, controlled by the value of an index variable. Given that angle brackets (< and >) denote “optional”, notice how index-variable requires at least one specification (specification-1) yet allows for multiple additional optional specifications (<, …specification-n>) separated by commas.

Now, let’s look into the DO statement’s index-variable specifications.

Index-variable specification

Each specification denotes an expression, or a series of expressions as follows:

start-expression <TO stop-expression> <BY increment-expression> <WHILE (expression) | UNTIL (expression)>

Note that only start-expression is required here, whereas <TO stop-expression>, <BY increment-expression>, and <WHILE (expression) | UNTIL (expression)> are optional.

Start-expression may be of either numeric or character type. Stop-expression and increment-expression must be numeric, and they can only accompany a numeric start-expression.

Expressions in <WHILE (expression) | UNTIL (expression)> are Boolean numeric expressions: a numeric value other than 0 or missing is TRUE, and a value of 0 or missing is FALSE.
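
As a minimal sketch of how these optional parts combine (the bounds below are chosen purely for illustration), a single specification can carry TO, BY and WHILE at once:

data _null_;
   /* start at 1, step by 3, do not pass 10, and require i<8 before each pass */
   do i=1 to 10 by 3 while (i<8);
      put i=;
   end;
run;

The log should show i=1, i=4 and i=7. The next candidate value, 10, still satisfies TO 10 but fails the WHILE condition, so the loop ends there.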

Other iterative DO statements

For comparison, here is a brief description of the other two forms of iterative DO statement:

The DO UNTIL statement executes the statements in a DO loop repeatedly until a condition is true, checking the condition after each iteration of the DO loop. In other words, if the condition is true at the end of the current iteration, the loop will not iterate anymore, and processing continues with the next statement after END. Otherwise, it will iterate again.
The DO WHILE statement executes the statements in a DO loop repeatedly while a condition is true, checking the condition before each iteration of the DO loop. That is, if the condition is true at the beginning of an iteration, the loop will iterate; otherwise it will not, and processing continues with the next statement after END.
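
The practical difference between the two shows up when the condition is already decided before the loop starts. Here is a small sketch (the variable n and the messages are made up for this illustration):

data _null_;
   n = 5;
   /* Condition is already true, but UNTIL is checked only after the body runs */
   do until (n >= 5);
      put 'DO UNTIL body executed once: ' n=;
   end;
   /* Condition is already false, and WHILE is checked before the body runs */
   do while (n < 5);
      put 'DO WHILE body executed: ' n=;
   end;
run;

Only the DO UNTIL message should appear in the log: a DO UNTIL loop always runs its body at least once, whereas a DO WHILE loop may not run it at all.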

Looping over a list of index variable values/expressions

DO loops can iterate over a list of index variable values. For example, the following DO-loop will iterate its index variable values over a list of 7, 13, 5, 1 in the order they are specified:

data A; 
   do i=7, 13, 5, 1;
      put i=;
      output;
   end;
run;

This is not yet another form of iterative DO loop, as it is fully covered by the definition of the iterative DO statement with index variable. In this case, the first value (7) is the required start expression of the required first specification, and each subsequent value (13, 5 and 1) is the required start expression of an additional optional specification.

Similarly, the following example illustrates looping over a list of index variable character values:

data A1;
   length j $4;
   do j='a', 'bcd', 'efgh', 'xyz';
      put j=;
      output;
   end;
run;
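
The LENGTH statement in this example matters. As a sketch of what is likely to happen without it (assuming the usual DATA step rule that the length of a character variable is fixed by its first use), j would take its length from the first value, 'a':

data _null_;
   /* No LENGTH statement: j is expected to get a length of 1 from 'a' */
   do j='a', 'bcd', 'efgh', 'xyz';
      put j=;
   end;
run;

Under that assumption, the longer values are truncated to their first character, and the log would show j=a, j=b, j=e and j=x.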

Since DO loop specifications denote expressions (values are just instances or subsets of expressions), we can expand our example to a list of actual expressions:

data B;
   p = constant('pi');
   do i=round(sin(p)), sin(p/2), sin(p/3);
      put i=;
      output;
   end;
run;

In this code the DO-loop iterates its index variable over the list of values defined by the following expressions: round(sin(p)), sin(p/2), sin(p/3).
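
Because every comma-separated item is a full specification, single values and TO/BY ranges can be mixed freely in one DO statement. A small sketch with illustrative values:

data _null_;
   /* three specifications: a range, a single value, and a range with BY */
   do i=1 to 3, 10, 20 to 30 by 5;
      put i=;
   end;
run;

This loop should iterate over 1, 2, 3, 10, 20, 25 and 30, in that order.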

Infinite loops

Since <TO stop> is optional for the index-variable specification, the following code is perfectly syntactically correct:

data C;
   do j=1 by 1;
      output;
   end;
run;

It results in an infinite (endless) loop in which the resulting data set keeps growing indefinitely.

While unintentional infinite looping is considered a bug and is anathema to programmers, it can sometimes be used intentionally. For example, to find out what happens when the data set size reaches the disk space capacity… Or, instead of supplying a “big enough” hard-coded number for the loop’s TO expression (which is not good programming practice), we may want to define an infinite DO-loop and take care of its termination and exit inside the loop. For example, you can use the IF exit-condition THEN LEAVE; or IF exit-condition THEN STOP; constructs.

The LEAVE statement immediately stops processing the current DO-loop, and execution resumes with the next statement after its END.

The STOP statement immediately stops execution of the current DATA step, and SAS resumes processing statements after the end of the current DATA step.
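
To make the difference concrete, here is a minimal sketch (the limit of 3 and the messages are arbitrary choices for this illustration) that exits one open-ended loop with LEAVE and another with STOP:

data _null_;
   do i=1 by 1;
      if i > 3 then leave;   /* exits this loop only */
      put i=;
   end;
   put 'After LEAVE the DATA step continues: ' i=;

   do j=1 by 1;
      if j > 3 then stop;    /* ends the whole DATA step */
      put j=;
   end;
   put 'This statement is never reached.';
run;

The first loop prints i=1 through i=3, and the PUT statement after it reports i=4. The second loop prints j=1 through j=3, after which STOP ends the step, so the final PUT never executes.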

The exit-condition may be unrelated to the index variable and may instead be based on the occurrence of some event. For instance, the following code runs a syntactically “infinite” loop, but the IF-THEN-LEAVE statement limits it to 200 seconds:

data D;
   start = datetime();
   do k=1 by 1;
      if datetime()-start gt 200 then leave;
      /* ... some processing ...*/
      output; 
   end;
run;

You can also create an endless loop using the DO UNTIL(0); or DO WHILE(1); statement, but again you need to take care of its termination inside the loop.
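
As a sketch of that pattern (the counter name and the limit of 3 are hypothetical), a DO WHILE(1) loop has no index variable, so the code maintains its own counter and exits with LEAVE:

data _null_;
   count = 0;
   do while (1);                   /* the condition is always true */
      count + 1;                   /* sum statement: increments count */
      if count >= 3 then leave;    /* the only way out of this loop */
   end;
   put count=;
run;

The log should show count=3.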

Changing “TO stop” within DO-loop will not affect the number of iterations

If you think you can break out of your DO loop prematurely by adjusting the value of the TO stop expression from within the loop, you may want to run the following code snippet to prove to yourself that it is not going to happen:

data E;
   n = 4;
   do i=1 to n;
      put i=;
      output;
      if i eq 2 then n = 2;
   end;
run;

This code will execute the DO-loop 4 times even though you change the value of n from 4 to 2 within the loop.
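
One way to see this directly is to count the iterations and inspect the variables after the loop. The sketch below is a variation on the same example, with a hypothetical counter variable named iterations:

data _null_;
   n = 4;
   iterations = 0;
   do i=1 to n;
      iterations + 1;          /* count each pass through the loop */
      if i eq 2 then n = 2;    /* change the stop variable mid-loop */
   end;
   put n= iterations= i=;
run;

The log should show n=2 iterations=4 i=5: the variable n really did change, yet the loop still ran against the original stop value of 4.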

According to the iterative DO statement documentation, any changes to stop made within the DO group do not affect the number of iterations. Instead, to stop a DO-loop before the index variable surpasses stop, change the value of the index variable so that it becomes equal to the value of stop, or use the LEAVE statement to jump out of the loop. The following two examples do just that:

data F;
   do i=1 to 4;
      put i=;
      output;
      if i eq 2 then i = 4;
   end;
run;
 
data G;
   do i=1 to 4;
      put i=;
      output;
      if i eq 2 then leave;
   end;
run;
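
Both approaches produce two iterations, but they leave the index variable in different states. A minimal sketch that prints the value of i after each loop:

data _null_;
   do i=1 to 4;
      if i eq 2 then i = 4;    /* the loop then increments i past stop */
   end;
   put 'After setting i to stop: ' i=;

   do i=1 to 4;
      if i eq 2 then leave;    /* exits at once, i is left untouched */
   end;
   put 'After LEAVE: ' i=;
run;

The first PUT should report i=5, because the index variable is still incremented once more before the stop test fails, while the second should report i=2. This may matter if the code after the loop relies on the value of i.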

Know thy DO-loop specifications

Here is a little attention/comprehension test for you.

How many times will the following DO-loop iterate?

data H;
   do i=1, 7, 3, 6, 2 until (i>3);
      put i=;
      output;
   end;
run;

If your answer is 2, you need to re-read the whole post from the beginning (I am only partly joking here).

You can easily find the correct answer by running this code snippet in SAS. If you are surprised by the result, take a closer look at the DO statement: there are 5 comma-separated specifications for the index variable, and UNTIL (expression) belongs only to the last one, where i=2. Thus, UNTIL does not apply to any of the preceding specifications (i=1, 7, 3, 6). Even for i=2 it has no effect: UNTIL is evaluated at the end of an iteration, and a single-value specification runs only one iteration anyway.
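
If the intent was to stop as soon as a value greater than 3 has been processed, one possible sketch (certainly not the only way) is to move the test inside the loop, where it applies to every value in the list:

data _null_;
   do i=1, 7, 3, 6, 2;
      put i=;
      if i > 3 then leave;   /* checked after the body, as UNTIL would be */
   end;
run;

This version should print i=1 and i=7 and then exit the loop.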

Now consider the following DO-loop definition:

data Z;
   pi = constant('pi');
   do x=3 while(x>pi), 10 to 1 by -pi*3, 20, 30 to 35 until(pi);
      put x=;
      output;
   end;
run;

I hope after reading this blog post you can easily identify the index variable list of values the DO-loop will iterate over. Feel free to share your solution and explanation in the comments section below.

Additional resources

The Magnificent DO (SGF paper, by Paul M. Dorfman)
Loops in SAS (blog post, by Rick Wicklin)
Data-driven SAS macro loops (blog post, by Leonid Batkhan)

Questions? Thoughts? Comments?

Do you find this post useful? Do you have questions, other secrets, tips or tricks about the DO loop? Please share with us below.
