Monday, 17 September 2012

NOTE: What Am I Reading?

We all try to write data-driven code, or code that provides some degree of flexibility regarding its input data. We see these approaches as reducing maintenance and making the code more robust. There are a number of features in the SAS language that allow us to know more about the data that we're using. Have you come across the DATA step's CALL VNEXT routine? No? Then read on...

The CALL VNEXT routine returns the name, type and length of a variable in use in a DATA step. In this case, "in use" means "in the Program Data Vector (PDV)". The PDV is the area of memory that the DATA step sets aside to hold all variables that are read in or created during execution of the DATA step. Understanding the PDV, how it is constructed, and how it is populated is very important if you are to truly master DATA step programming. CALL VNEXT provides a good insight into what the PDV contains.

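The following log output shows how successive calls to VNEXT reveal details about each variable in the PDV in turn. Note that VNEXT reports more than the variables read from the input data: it also returns the FIRST. and LAST. variables created by the BY statement, the three variables used to hold VNEXT's own results, and the automatic variables _ERROR_ and _N_. A blank name signals that there are no more variables to report.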

18         proc sql;
19           create view classv as
20             select *
21             from sashelp.class
22             order by sex;
NOTE: SQL view WORK.CLASSV has been defined.
23         quit;
NOTE: PROCEDURE SQL used (Total process time):
      real time           0.01 seconds
      cpu time            0.00 seconds
     

24         data _null_;
25           set classv;
26           by sex;
27           length vname $32 vtype $1 vlength 8;
28           vname='xx';
29           do while(lengthn(vname) ne 0);
30             call vnext(vname,vtype,vlength);
31             put vname= vtype= vlength=;
32           end;
33           STOP;
34         run;

vname=Name vtype=C vlength=8
vname=Sex vtype=C vlength=1
vname=Age vtype=N vlength=8
vname=Height vtype=N vlength=8
vname=Weight vtype=N vlength=8
vname=FIRST.Sex vtype=N vlength=8
vname=LAST.Sex vtype=N vlength=8
vname=vname vtype=C vlength=32
vname=vtype vtype=C vlength=1
vname=vlength vtype=N vlength=8
vname=_ERROR_ vtype=N vlength=8
vname=_N_ vtype=N vlength=8
vname=  vtype=  vlength=0
NOTE: There were 19 observations read from the data set SASHELP.CLASS.
NOTE: There were 2 observations read from the data set WORK.CLASSV.
NOTE: DATA statement used (Total process time):
      real time           0.02 seconds
      cpu time            0.01 seconds

Ian Whitlock's paper "How to Think Through the SAS DATA Step", presented at SAS Users Group International (SUGI) 31, provides a lengthy and detailed exposé of the PDV and other key elements of DATA step programming. Jim Johnson's "The Use and Abuse of the Program Data Vector", from this year's SAS Global Forum (SGF), focuses neatly on the PDV. If you want to know more about the PDV, these papers are excellent sources of information.

Robust programs don't fail every time there's the slightest unexpected variation in the input data. Future-proof programs don't need to be amended and re-tested when, for example, field lengths get longer following a corporate merger and the introduction of unforeseen data sources. Achieve both and you'll spend less time on maintenance work and more on productive, innovative activities. CALL VNEXT might just help you in achieving this goal.
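
As one possible starting point, the sketch below captures the PDV's contents in a data set so that later steps can check types and lengths instead of assuming them. The output data set name WORK.PDV_CONTENTS is purely illustrative, and the "if 0 then set" idiom is used so that the input variables are added to the PDV at compile time without any observations being read.

```sas
/* Sketch: save one row per PDV variable for use by later, data-driven steps. */
/* WORK.PDV_CONTENTS is a hypothetical name chosen for this example.          */
data work.pdv_contents (keep=vname vtype vlength);
  if 0 then set sashelp.class;       /* never executes; defines the variables only */
  length vname $32 vtype $1 vlength 8;
  do until (vname = ' ');            /* a blank name means no more variables */
    call vnext(vname, vtype, vlength);
    if vname ne ' ' then output;     /* write one row per variable found */
  end;
  stop;
run;
```

The resulting data set also contains rows for vname, vtype, vlength, _ERROR_ and _N_, so filter those out if you want only the input variables.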