Learn to Love SAS LIKE

This post was kindly contributed by SAS Users - go there to comment and to read the full post.

In honor of Valentine’s day, we thought it would be fitting to present an excerpt from a paper about the LIKE operator because when you like something a lot, it may lead to love! If you want more, you can read the full paper “Like, Learn to Love SAS® Like” by Louise Hadden, which won best paper at WUSS 2019.

Introduction

SAS provides numerous time- and angst-saving techniques to make the SAS programmer’s life easier. Among those techniques are the ability to search and select data using SAS functions and operators in the data step and PROC SQL, as well as the ability to join data sets based on matches at various levels. This paper explores how LIKE is featured in each one of these techniques and is suitable for all SAS practitioners. I hope that LIKE will become part of your SAS toolbox, too.

Smooth Operators

SAS operators are used to perform a number of functions: arithmetic calculations, comparing or selecting variable values, or logical operations. Operators are loosely grouped as “prefix” (for example a sign before a variable) or “infix” which generally perform an operation BETWEEN two variables. Arithmetic operations using SAS operators may include exponentiation (**), multiplication (*), and addition (+), among others. Comparison operators may include greater than (>, GT) and equals (=, EQ), among others. Logical, or Boolean, operators include such operands as || or !!, AND, and OR, and serve the purpose of grouping SAS operations. Some operations that are performed by SAS operators have been formalized in functions. A good example of this is the concatenation operators (|| and !!) and the more powerful CAT functions which perform similar, but not identical, operations. LIKE operators are most frequently utilized in the DATA step and PROC SQL via a DATA step.

There is a category of SAS operators that act as comparison operators under special circumstances, generally in where statements in PROC SQL and the data step (and DS2) and subsetting if statements in the data step. These operators include the LIKE operator and the SOUNDS LIKE operator, as well as the CONTAINS and the SAME-AND operators. It is beyond the scope of this short paper to discuss all the smooth operators, but they are definitely worth a look.

LIKE Operator

Character operators are frequently used for “pattern matching,” that is, evaluating whether a variable value equals, does not equal, or sounds like a specified value or pattern. The LIKE operator is a case-sensitive character operator that employs two special “wildcard” characters to specify a pattern: the percent sign (%) indicates any number of characters in a pattern, while the underscore (_) indicates the presence of a single character per underscore in a pattern. The LIKE operator is akin to the GREP utility available on Unix/Linux systems in terms of its ability to search strings.

The LIKE operator also includes an escape routine in case you need to use a string that includes a comparison operator such as the carat, the underscore, or the percent sign, etc. An example of the escape routine syntax, when looking for a string containing a percent sign, is:

where yourvar like ‘100%’ escape ‘%’;

Additionally, SAS practitioners can use the NOT LIKE operator to select variables WITHOUT a given pattern. Please note that the LIKE statement is case-sensitive. You can use the UPCASE, LOWCASE, or PROPCASE functions to adjust input strings prior to using the LIKE statement. You may string multiple LIKE statements together with the AND or OR operators.

SOUNDS LIKE Operator

The LIKE operator, described above, searches the actual spelling of operands to make a comparison. The SOUNDS LIKE operator uses phonetic values to determine whether character strings match a given pattern. As with the LIKE operator, the SOUNDS LIKE operator is useful for when there are misspellings and similar sounding names in strings to be compared. The SOUNDS LIKE operator is denoted with a short cut ‘-*’. SOUNDS LIKE is based on SAS’s SOUNDEX algorithm. Strings are encoded by retaining the original first column, stripping all letters that are or act as vowels (A, E, H, I, O, U, W, Y), and then assigning numbers to groups: 1 includes B, F, P, and V; 2 includes C, G, J, K, Q, S, X, Z; 3 includes D and T; 4 includes L; 5 includes M and N; and 6 includes R. “Tristn” therefore becomes T6235, as does Tristan, Tristen, Tristian, and Tristin.

For more on the SOUNDS LIKE operator, please read the documentation.

Joins with the LIKE Operator

It is possible to select records with the LIKE operator in PROC SQL with a WHERE statement, including with joins. For example, the code below selects records from the SASHELP.ZIPCODE file that are in the state of Massachusetts and are for a city that begins with “SPR”.

proc sql;
    CREATE TABLE TEMP1 AS
    select
        a.City ,
        a.countynm  , a.city2 ,
         a.statename , a.statename2
    from sashelp.zipcode as a
    where upcase(a.city) like 'SPR%' and 
upcase(a.statename)='MASSACHUSETTS' ; 
quit;

The test print of table TEMP1 shows only cases for Springfield, Massachusetts.

The code below joins SASHELP.ZIPCODE and a copy of the same file with a renamed key column (city –> geocity), again selecting records for the join that are in the state of Massachusetts and are for a city that begins with “SPR”.

proc sql;
    CREATE TABLE TEMP2 AS
    select
        a.City , b.geocity, 
        a.countynm  ,
        a.statename , b.statecode, 
        a.x, a.y
    from sashelp.zipcode as a, zipcode2 as b
    where a.city = b.geocity and upcase(a.city) like 'SPR%' and b.statecode
= 'MA' ;
quit;

The test print of table TEMP2 shows only cases for Springfield, Massachusetts with additional variables from the joined file.

The LIKE “Condition”

The LIKE operator is sometimes referred to as a “condition,” generally in reference to character comparisons where the prefix of a string is specified in a search. LIKE “conditions” are restricted to the DATA step because the colon modifier is not supported in PROC SQL. The syntax for the LIKE “condition” is:

where firstname=: ‘Tr’;

This statement would select all first names in Table 2 above. To accomplish the same goal in PROC SQL, the LIKE operator can be used with a trailing % in a where statement.

Conclusion

SAS provides practitioners with several useful techniques using LIKE statements including the smooth LIKE operator/condition in both the DATA step and PROC SQL. There’s definitely reason to like LIKE in SAS programming.

To learn more about SAS Press, check out our up-and-coming titles, and to receive exclusive discounts make sure to subscribe to our newsletter.

References

    Gilsen, Bruce. September 2001. “SAS® Program Efficiency for Beginners.” Proceedings of the Northeast SAS Users Group Conference, Baltimore, MD.

    Roesch, Amanda. September 2011. “Matching Data Using Sounds-Like Operators and SAS® Compare Functions.” Proceedings of the Northeast SAS Users Group Conference, Portland, ME.

    Shankar, Charu. June 2019. “The Shape of SAS® Code.” Proceedings of PharmaSUG 2019 Conference, Philadelphia, PA.

Learn to Love SAS LIKE was published on SAS Users.

This post was kindly contributed by SAS Users - go there to comment and to read the full post.