Introducing TRIMS function to remove any leading and/or trailing characters from SAS strings

This post was kindly contributed by SAS Users - go there to comment and to read the full post.

Trimming strings left and right

I am pretty sure you have never heard of the TRIMS function, and I would be genuinely surprised if you told me otherwise. This is because this function does not exist (at least at the time of this writing).

But don’t worry, the difference between “nonexistence” and “existence” is only a matter of time, and from now it is less than a blog away. Let me explain. Recently, I published two complementary blog posts:

[1] Removing leading characters from SAS strings

[2] Removing trailing characters from SAS strings

While working on these pieces and researching “prior art” I stumbled upon a multipurpose function in the SAS FedSQL Language that alone does either one or both of these things – remove leading or/and trailing characters from SAS strings.

FedSQL Language and Proc FedSQL

The FedSQL language is the SAS proprietary implementation of the ANSI SQL:1999 core standard. Expectedly, the FedSQL language is implemented in SAS by means of the FedSQL procedure (PROC FEDSQL). This procedure enables you to submit FedSQL language statements from a Base SAS session, and it is supported in both SAS 9.4 and SAS Viya.

Using the FEDSQL procedure, you can submit FedSQL language statements to SAS and third-party data sources that are accessed with SAS and SAS/ACCESS library engines. Or, if you have SAS Cloud Analytic Services (CAS) configured, you can submit FedSQL language statements to the CAS server.

FedSQL TRIM function

FedSQL language has its own vast FedSQL Functions library with hundreds of functions many of which replicate SAS 9.4 Functions. Many, but not all. Deep inside this FedSQL functions library, there is a unique treasure modestly called TRIM Function which is quite different from the BASE SAS Language TRIM() function.

While SAS 9.4 BASE TRIM() function capabilities are quite limited – it removes just trailing blanks from a character string, the FedSQL TRIM() function is way much more powerful. This triple-action function can remove not just trailing blanks, but also leading blanks, as well as both, leading and trailing blanks. On top of it, it can remove not just blanks, but any characters (although one character at a time). See for yourself, this function has the following pretty self-explanatory syntax:

TRIM( [BOTH | LEADING | TRAILING] [trim-character] FROM column)

Here trim-character specifies one character (in single quotations marks) to remove from column. If trim-character is not specified, the function removes blanks.

While being called a function, it does not look like a regular SAS function where arguments are separated by commas.  It looks more like an SQL statement (which it understandably is – it is part of the FedSQL language). However, this function is available only in PROC FEDSQL; it’s not available in SAS DATA steps or other PROC steps. Still, it gives us pretty good idea of what such a universal function may look like.

User-defined function TRIMS to remove leading or/and trailing characters in SAS strings

Let’s build such a function by means of the PROC FCMP for the outside the FedSQL usage (it is worth noticing that the FCMP procedure is not supported for FedSQL). To avoid confusion with the existing TRIM function we will call our new function TRIMS (with an ‘S’ at the end) which suits our purpose quite well denoting its plural purpose. First, we define what we are going to create.

User-defined TRIMS function

TRIMS Function

Removes leading characters, trailing characters, or both from a character string.

Syntax

TRIMS(function-modifier, string, trim-list, trim-list-modifier)

Required Arguments

  • function-modifier is a case-insensitive character constant, variable, or expression that specifies one of three possible operations:
    ‘L’ or ‘l’ – removes leading characters.
    ‘T’ or ‘t’ – removes trailing characters.
    ‘B’ or ‘b’ – removes both, leading and trailing characters.
  • string is a case-sensitive character constant, variable, or expression that specifies the character string to be trimmed.
  • trim-list is a case-sensitive character constant, variable, or expression that specifies character(s) to remove from the string.
  • trim-list-modifier is a case-insensitive character constant variable, or expression that supplements the trim-list.
    The valid values are those modifiers of the FINDC function that “add” groups of characters (e.g. ‘a’ or ‘A’, ‘c’ or ‘C’, ‘d’ or ‘D’, etc.) to the trim-list.

The following user-defined function implementation is based on the coding techniques described in the two previous posts, [1] and [2] that I mentioned above. Here goes.

 
libname funclib 'c:\projects\functions';
 
/* delete previous function definition during debugging */
options cmplib=funclib.userfuncs;
proc fcmp outlib=funclib.userfuncs.package1;
   deletefunc trims;
run;
 
/* new function defintion */
proc fcmp outlib=funclib.userfuncs.package1;
   function trims(f $, str $, clist $, mod $) $32767;
      from = 1;
      last = length(str);
      if upcase(f) in ('L', 'B') then from = findc(str, clist, 'K'||mod);
      if from=0 then return('');
      if upcase(f) in ('T', 'B') then last = findc(str, clist, 'K'||mod, -last); 
      if last=0 then return('');
      return(substr(str, from, last-from+1));      
   endfunc; 
run;

Code highlights

  • In the function definition, we first assign initial values of the target substring positions as from=1 and last=length(str).
  • Then for Leading or Both character removal, we calculate an adjusted value of from as a position of the first character in str that is not listed in clist and not defined by the mod
  • If from=0 then we return blank and stop further calculations as this means that ALL characters are to be removed.
  • Then for Trailing or Both character removal, we calculate an adjusted value of last as a position of the last character in str that is not listed in clist and not defined by the mod
  • If last=0 then we return blank and stop further calculations as this means that ALL characters are to be removed.
  • And finally, we return a substring of str starting at the from position and ending at the last position, that is with the length of last-from+1.

TRIMS function usage

Let’s define SAS data set SOURCE as follows:

data SOURCE;
   input X $ 1-30;
   datalines;
*00It's done*2*1**-
*--*1****9*55
94*Clean record-*00
;

In the following DATA step, we will create three new variables with removed leading (variable XL), trailing (variable XT) and both – leading and trailing (variable XB) characters '*' and '-' as well as any digits:

options cmplib=funclib.userfuncs;
data TARGET;
   set SOURCE;
   length XB XL XT $30;
   XB = trims('b', X, '*-', 'd');
   XL = trims('L', X, '*-', 'd');
   XT = trims('t', X, '*-', 'd');
run;

In this code we use the TRIM function three times, each time with a different first argument to illustrate how this affects the outcome.

Arguments usage highlights

  • The first argument of the TRIMS function specifies whether we remove characters from both leading and trailing positions ('b'), from leading positions only ('L'), or from trailing positions only ('t'). This argument is case-insensitive. (I prefer using capital 'L' for clarity since lowercase 'l' looks like digit '1').
  • The second argument specifies the name of the variable (X) that we are going to remove characters from (variable X is coming from the dataset SOURCE).
  • The third argument '*-' specifies which character (or characters) to remove. In our example we are removing '*' and '-'. If you do not need to explicitly specify any character here, you still must supply a null value ('') since it is a required argument. In this case, the fourth argument (trim-list-modifier) will determine the set of characters to be removed.
  • And finally, the fourth argument (case-insensitive) of the TRIMS function specifies the FINDC function modifier(s) to remove certain characters in bulk (in our example 'd' will remove all digits). If such modifier is not needed, you still must supply a null value ('') since all four arguments of the TRIMS function are positional and required.

Here is the output data table TARGRET showing the original string X and the resulting strings XB (Both leading and trailing characters removed), XL (Leading characters removed) and XT (Trailing characters removed) side by side:

Result of leading and trailing characters trimming

Conclusion

The new TRIMS function presented in this blog post goes ways further the ubiquitous LEFT and TRIM functions that remove the leading (LEFT) or trailing (TRIM) blanks. The TRIMS function handles ANY characters, not just blanks. It also expands the character deletion functionality of the powerful  FedSQL TRIM function beyond just removing any single leading and/or trailing character. The TRIMS function single-handedly removes any number of explicitly specified characters from either leading, trailing or both (leading and trailing) positions. Plus, it removes in bulk many implicitly specified characters. For example 'd' modifier removes all digits, 'du' modifier removes all digits ('d') and all uppercase letters ('u'), 'dup' modifier removes all digits ('d'), all uppercase letters ('u') and all punctuation marks ('p'); and so on as described by the FINDC function modifiers. The order in which modifier characters are listed does not matter.

Additional resources

Questions? Thoughts? Comments?

Do you find this post useful? Please share your thoughts with us below.

Introducing TRIMS function to remove any leading and/or trailing characters from SAS strings was published on SAS Users.

This post was kindly contributed by SAS Users - go there to comment and to read the full post.