How to detect SAS data sets that contain (or do not contain) character variables
This post was kindly contributed by The DO Loop - go there to comment and to read the full post. |
A SAS programmer posted an interesting question on a SAS discussion forum. The programmer wanted to iterate over hundreds of SAS data sets, read in all the character variables, and then do some analysis. However, not every data set contains character variables, and SAS complains when you ask it to read the character variables in a data set that contains only numeric variables.
The programmer wanted to use PROC IML to solve the problem, but the issue also occurs in the SAS DATA step. The following program creates three data sets. Two of them (AllChar and Mixed) contain at least one character variable. The third data set (AllNum) does not contain any character variables. For the third data set, an error occurs if you try to use the KEEP=_CHARACTER_ data set option, as shown in the following example:
data AllNum; x=1; y=2; z=3; run; data AllChar; A='ABC'; B='XYZW'; run; data Mixed; name='Joe'; sex='M'; Height=1.8; Weight=81; treatment='Placebo'; run; /* try to use DROP=_CHARACTER_ to exclude numeric variables */ data KeepTheChar; set AllNum(keep=_CHARACTER_); /* ERROR when no character variables in the data set */ run; |
ERROR: The variable _CHARACTER_ in the DROP, KEEP, or RENAME list has never been referenced. |
The same problem occurs in PROC IML if you try to read character variables when none exist:
proc iml; use AllNum; read all var _CHAR_ into X; /* ERROR when no character variables in the data set */ close; |
ERROR: No character variables in the data set. |
There are at least two ways to handle this situation:
- In both Base SAS and SAS/IML, you can use dictionary tables to determine in advance which data sets contain at least one character variable. You can then read only those data set.
- In SAS/IML, you can read all variables into a table, then extract the character variables into a matrix for further processing.
Of course, the same ideas apply if you want to read only numeric variables and you encounter a data set that does not contain any numeric variables.
Use DICTIONARY tables to find information about your SAS session
If you have ever been to a SAS conference, you know that DICTIONARY tables are a favorite topic for SAS programmers. DICTIONARY tables are read-only tables that provide information about the state of the SAS session, including libraries, data sets, variables, and system options. You can access them directly by using PROC SQL.
If you want to access the information in the DATA steps or other procedures (like PROC IML), you can use special data views in SASHELP. In particular, the Sashelp.VColumn view provides information about variables in SAS data set and is often used to find data sets that contain certain variable names.
(See the references at the end of this article for more information about DICTIONARY tables.)
The following SAS/IML program uses the Sashelp.VColumn to find out which data sets contain at least one character variable:
proc iml; /* Solution 1: Use dictionary table sashelp.vcolumn */ /* Find data sets in WORK that have AT LEAST ONE character variable */ use sashelp.vcolumn(where=(libname="WORK" & memtype='DATA' & type='char'); /* read only CHAR variables */ read all var {memname name}; /* memname=data set name; name=name of character variable */ close; /* loop over data sets. If a set contains at least one character variable, process it */ dsName = {'AllNum' 'AllChar' 'Mixed'}; /* names of potential data sets */ do i = 1 to ncol(dsName); idx = loc(memname = upcase(dsName[i])); /* is data set on the has-character-variable list? */ /* for demo, print whether data set has character variables */ msg = "The data set " + (dsName[i]) + " contains " + char(ncol(idx)) + " character variables."; print msg; if ncol(idx)>0 then do; /* the data set exists and has character vars */ charVars = name[idx]; /* get the names of the character vars */ use (dsName[i]); /* open the data set for reading */ read all var charVars into X; /* read character variables (always succeeds) */ close; /* process the data */ end; end; |
The output shows that you can use the DICTIONARY tables to determine which data sets have at least one character variable. You can then use the USE/READ statements in PROC IML to read the character variables and process the data however you wish.
As mentioned previously, this technique can also be used in PROC SQL and the DATA step.
Use SAS/IML tables to find character variables
The previous section is very efficient because only character variables are ever read into SAS/IML matrices. However, there might be situations when you want to process character variables (if they exist) and then later process numerical variables (if they exist). Although a SAS/IML matrix contains only one data type (either all numeric or all character), you can read mixed-type data into a SAS/IML table, which supports both numeric and character variables. You can then use the TableIsVarNumeric function to generate a binary indicator variable that tells you which variables in the data are numeric and which are character, as follows:
/* Solution 2: Read all data into a table. Use the TableIsVarNumeric function to determine which variables are numeric and which are character. */ dsName = {'AllNum' 'AllChar' 'Mixed'}; /* names of potential data sets */ do i = 1 to ncol(dsName); /* for each data set... */ T = TableCreateFromDataset("WORK", dsName[i]); /* read all variables into a table */ numerInd = TableIsVarNumeric(T); /* binary indicator vector for numeric vars */ charInd = ^numerInd; /* binary indicator vector for character vars */ numCharVars = sum(charInd); /* count of character variables in this data set */ msg = "The data set " + (dsName[i]) + " contains " + char(numCharVars) + " character variables."; print msg; if numCharVars > 0 then do; X = TableGetVarData(T, loc(charInd)); /* extract the character variables into X */ /* process the data */ end; /* optionally process the numeric data */ numNumerVars = sum(numerInd); /* count of numeric variables in this data set */ /* etc */ end; |
The output is identical to the output in the previous section.
Summary
In summary, this article discusses a programmer who wants to iterate over many SAS data sets and process only character variables. However, some of the data sets do not have any character variables! This article shows two methods for dealing with this situation: DICTIONARY tables (available through Sashelp views) or SAS/IML tables. The first method is also available in Base SAS.
Of course, you can also use this trick to read all numeric variables when some of the data sets might not have any numeric variable. I’ve previously written about how to read all numeric variables into a SAS/IML matrix by using the _ALL_ keyword. If the data set contains both numeric and character variables, then only the numeric variables are read.
References
The following resources provide more information about DICTIONARY tables in SAS:
- The SAS documentation contains a section about
the section about dictionary tables and views. - Eberhardt, P. and Brill, I. (2006), “An Introduction to SAS Dictionary Tables.”
- Laffler, K. (2005), “Exploring DICTIONARY Tables and Views.”
The post How to detect SAS data sets that contain (or do not contain) character variables appeared first on The DO Loop.
This post was kindly contributed by The DO Loop - go there to comment and to read the full post. |