Wednesday, 30 September 2009

NOTE: Data Integration in V9.2

Many people are not convinced by SAS Data Integration Studio (DI Studio). To the seasoned SAS programmer, DI Studio is a glossy interface that cannot produce the range of functionality that their hand-written code can achieve, and the code that it does produce is less efficient in many cases. But look at the other side of the coin: creating a job with DI Studio requires knowledge of the data but little knowledge of SAS syntax, so the skills required are more readily available. And with each new version of SAS come more transforms and other job nodes, i.e. steps in the job.

So, DI Studio may not yet be a panacea that allows data modellers to build extract, transform and load (ETL) code without SAS programming skills, but it's moving closer. SAS's new approach of releasing products independently of each other perhaps means that DI Studio can be evolved and released to customers more quickly.

Think of this: if you go far enough back in history you will come across a time when machine code programmers scoffed at the possibility of producing compilers for third-generation languages like Fortran and Cobol. They argued that a compiler could never produce code that was optimised as well as their hand-crafted machine code. But better compilers reduced the inefficiencies in the generated code, and faster machines lessened the impact of the inefficiencies that remained. Sound familiar? How long will it be before the majority of SAS jobs are produced with DI Studio, as SAS produce new and better transforms, and machines get faster?

I was impressed by the V9.2 release of DI Studio that I saw at SAS Global Forum (SGF) earlier this year. Apart from basic interface enhancements and a number of new transforms, I noted:

  • Ability to prevent DI Studio from "tidying" your layout. Thus you can position transforms and tables in places that make best sense to you
  • You can put the same table in more than one place on your layout, e.g. where it's used as input and output at different points
  • The addition of textual notes that can be placed on the layout in order to provide a form of documentation (or temporary development notes)
  • A usable undo capability!
  • Performance monitoring that shows real-time statistics while your job is running

Visual coding, such as is offered by SAS DI Studio, is the future. When will you get on board?

What are your thoughts? Post a comment!