Monday, 6 September 2010

NOTE: Marking Time with Enterprise Guide

In my recent review of a client's SAS/Base jobs' suitability for EG visual coding, I wasn't surprised to come across some jobs that created new tables each month where the table names included the date. The table names were of the form MASTER.SALES_2010_08, indicating sales for August 2010. However, the fact that the table was referenced with the use of a macro variable, i.e. MASTER.SALES_&month., meant that EG was less than happy.

This style of table naming doesn't sit well with EG's visual coding (nor DI Studio) because the process flow needs a static table name for inputs and outputs. You can use the Query Builder to create an output table whose names includes a macro variable. The code will execute and the table will be created, but it won't appear in the process flow. And you can't use a macro variable in the name of an input table for visual coding.

There are at least two approaches to take to this problem...
The first approach would be to change your data model and avoid using dynamic table names. In some cases this can be done by using table names such as "this_month" and "last_month", but clearly this isn't practical in all cases.

The second approach is to use a view onto the dynamically-named table. In this case, the flow consists of i) create the dynamically-named table, ii) in a program node, create a view in the work library that maps to the table, iii) reference the view as the input table in subsequent tasks. Below is some example code for creating the view:

proc sql;
  create view work.master_dot_sales_month
    as select * from master.sales_&month.;
quit;


Since the name of the view is static, EG is happy to allow you to use it as input to any downstream tasks.

The above technique is useful and simple. In my next post I'll offer some further information on the technique to help you be assured of good performance.