Monday, 7 January 2013

NOTE: The OPEN Function (getting data set information in macros)

There are a number of ways of accessing the content of SAS data sets. DATA steps and PROCs are two obvious means. Another method is to use the DATA step's OPEN function.

Let's start with some basics. Here's a DATA step that is able to establish whether a specific data set has an index...

data _null_;
  dsid = open('SASHELP.CLASS');     /* Open the data set */
  HasIndex = attrn(dsid,'ISINDEX'); /* Get num attr */
  rc = close(dsid);                 /* Close the data set */
  put _all_;
run;


The following is returned in the log:

dsid=1 HasIndex=0 rc=0 _ERROR_=0 _N_=1

In essence, the OPEN function provides a numeric identifier for the data set in question; the identifier can then be passed to a myriad of functions (such at ATTRN) to "do stuff" with the data set; the identifier should then be returned back to SAS with the CLOSE function when you've finished with the data set (in order to close it and free-up memory). The OPEN function (and its associated functions) can be used in any DATA step, regardless of whether one or more other data sets are already open with a SET statement, for example.

The functionality provided by ATTRN is simple, but with its cousin ATTRC it provides a large number of different attributes. There are many more associated functions that use the data set identifier returned by the OPEN function. Some examples are: VARNUM, GETVARC/N, FETCH, FETCHOBS, and CALL SET. I'll talk about these in subsequent posts.

We're not restricted to using these DATA step functions within a DATA step. We can use them within macro code, or a PROC FCMP function. Their use within macro code can be particularly valuable. For instance, to get the value of a column from a specific row in a table and place it into a macro variable, we could use a DATA step thus:

%let dsid = %sysfunc(open(SASHELP.CLASS)); /* Open the ds */
%let HasIndex = %sysfunc(attrn(&dsid,ISINDEX)); /* Get attr */
%let rc = %sysfunc(close(&dsid)); /* Close the data set */
%put HasIndex is set to: &HasIndex;

And the resulting log shows:

HasIndex is set to: 0

It possibly looks more complicated than the DATA step alternative, but it has advantages too. We didn't need to run PROC CONTENTS, get the results into a data set, and then read the data set with a DATA step. And we didn't need to read SASHELP.VMEMBER, which can be resource-intensive.

Note that we've used macro statements, but they didn't need to be within a macro definition.

I'll show some more advanced uses in subsequent posts.

DATA SET FUNCTIONS:

1. NOTE: The OPEN Function (getting data set information in macros)
2. NOTE: The OPEN Function (reading data sets in macros)
3. NOTE: The OPEN Function (error trapping)