Tuesday, 29 November 2011

NOTE: Testing (Collating the Results)

We began this series of posts on testing with an introduction to testing disciplines, and I followed that with a description of how we could quickly create some small macros that would allow us to automate some of our testing and present the results in a consistent fashion in the SAS log. The next step to improve our efforts is to put the results into a SAS data set rather than the log.

We can use the tag as a unique key for the test results data set so that the data set has the most recent test result for each and every test. We can re-run individual tests and have an individual row in the results data set updated.

To give us the greatest flexibility to add more macros to our test suite, we don't want the process of writing results to a data set to interfere with activities that are external to the macro. Using a SET statement to read and rewrite the results, for example, would require the results data set to be named in the DATA statement, which sits outside the macro. This seems a good opportunity to use the OUTPUT method for a hash table. We can load the results data set into the hash table, use the TAG as the key for the table, and add/update a row with the result before outputting the hash table as the updated results data set. Here's the code:

%macro assert_condition(left,operator,right,tag=
                       ,resultdata=work.results);
  /* Load results into hash table */
  length Tag $32 Result $4;
  declare hash hrslt(dataset:"&resultdata");
  rc = hrslt.defineKey('TAG');
  rc = hrslt.defineData('TAG','RESULT');
  rc = hrslt.defineDone();
  /* Update the hash table */
  tag = "&tag";
  if &left &operator &right then
    result="PASS";
  else
    result="FAIL";
  rc=hrslt.replace(); /* Add/update */
  /* Write back the results data set */
  rc = hrslt.output(dataset:"&resultdata");
  rc = hrslt.delete();
%mend assert_condition;


By adding the maintenance of the results data set to our basic assert macro, the functionality gets inherited by any higher-level macro (such as yesterday's %assert_EqualRowCount).

Clearly, the new macro won't work if the results data set doesn't already exist, and we'd like to present the results in a format better than a plain data set. We'll cover that in the next post.
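In the meantime, here is a minimal sketch of how the macro might be exercised. It assumes that %assert_condition (above) has already been compiled, that the results data set needs only the two columns named in the macro's LENGTH statement, and that the macro is called from inside a DATA step. The tag value and the condition being tested are purely illustrative.

```sas
/* Create an empty results data set with the columns the macro expects */
data work.results;
  length Tag $32 Result $4;
  call missing(Tag, Result); /* avoids "uninitialized" notes */
  stop;                      /* write no rows */
run;

/* The macro generates DATA step statements, so call it within a step */
data _null_;
  %assert_condition(1,eq,1,tag=sanity_check);
run;

/* Inspect the collated results */
proc print data=work.results;
run;
```

Re-running the DATA _NULL_ step with the same tag should leave one row for that tag, because the hash table's REPLACE method updates an existing key rather than adding a duplicate.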

NOTE: Testing is Like Visiting the Dentist?

In a comment in response to my recent Testing - Discipline article, Rick@SAS provided a link to an article he'd posted on the American Statistical Association community forum wherein he drew an analogy between testing and going to the dentist. I think it's a very well drawn analogy. Thanks Rick.

Thursday, 24 November 2011

NOTE: Testing (With Basic Macros)

So, having stressed the importance of testing in my previous post on the subject, let me give you some hints on how I keep the test phase efficient and effective on my projects. This will be a series of hints across the next few days. Today I'll offer some tips for automating your tests, and I'll describe a simple macro that you can use to highlight test results in your log. As the series continues I'll describe how to enhance the macro and easily add a reporting system to summarise your test results in one place.

There is a degree of overlap between FUTS (mentioned yesterday) and the macros that I shall describe. I recommend FUTS, but some of my clients find its size intimidating, and others don't have authority to download anything (especially "code") from the internet.

Let's assume you have a set of test scripts, i.e. steps for the tester to take, accompanied by expected results for each step. Let's take a simplified example:

1) Run the customer load. Inspect the SAS log. Expect no error or warning messages.

2) Count the rows in the input customer file (CSV) and the warehouse customer table (SAS data set). Expect the number of rows in each to match. [I said it's a simplified example!]

We can automate the second test (the first too, but that's for another time). The benefit of automating is that we can re-run it quickly and effectively (after test failure, and for regression testing).
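As a rough illustration of what automating the second test could look like, here is a minimal sketch. It is not the macro developed in this series: the file path, the libref and table name, the assumption of a single header row in the CSV, and the wording of the log messages are all hypothetical.

```sas
/* Hypothetical locations: substitute your own CSV file and warehouse table */
%let csvfile = /data/in/customers.csv;
%let whtable = warehouse.customer;

data _null_;
  /* Count the data rows in the CSV; firstobs=2 assumes one header row */
  infile "&csvfile" firstobs=2 end=eof;
  input;          /* read a record without parsing any fields */
  csvrows + 1;    /* sum statement: retained running count */

  if eof then do;
    /* Get the row count of the SAS data set from its header */
    dsid = open("&whtable");
    if dsid > 0 then do;
      dsrows = attrn(dsid, 'NLOBS');
      rc = close(dsid);
    end;
    /* If the table could not be opened, dsrows is missing and the test fails */
    if csvrows eq dsrows then
      put "TEST PASSED: " csvrows= dsrows=;
    else
      put "TEST FAILED: " csvrows= dsrows=;
  end;
run;
```

The sketch reports nothing if the CSV contains no data rows at all, because the step stops at the first INPUT statement; a fuller version would need to handle that case.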

Tuesday, 22 November 2011

Testing - Discipline

My dad says "if a job's worth doing, it's worth doing well". I say "if a bit of code is worth writing, it's worth testing it properly". Maybe I'm stretching the old saying a little, but the principle remains true.

Software testing is a very large subject area; I'm not going to try to reproduce a text book here. I'm simply going to list some of the principles I apply to the testing phases of my projects and then show some useful macros that I have developed to aid the re-use of tests. There are many different types of test phase, each with different objectives. Some of these were briefly covered in my "SAS Software Development With The V-Model" paper at this year's SAS Global Forum (SGF).
  • To test something, you need to know what it should do, in all circumstances. This means you need to have established an agreed set of requirements and/or specifications.
  • There are a number of reasons why you might need to re-run a test - because the test failed, or for regression testing. For this reason, and for others, automated tests are preferable to manual tests.
  • Look upon your tests as an investment. Firstly, finding bugs before go-live is always "a good thing" for a number of reasons. But secondly, tests invariably need to be re-run, so the more effort you put into them the more they'll repay you when you have to re-run them. A library of re-usable tests is an asset.
  • Don't just test the "happy path" for your system. Test that the system rejects bad input and handles unexpected situations elegantly. This is called "Negative Testing". In simple terms this might mean testing with values of zero, one, two, negative, non-integer, and very large numbers.
  • Document your test strategy. This includes stating which testing method & tools will be used for each different type of system element, e.g. data entry screens, report-generation wizards, small files, big files, important reports (to be sent to regulatory authorities, for example), less important reports (for internal information only, for example).
  • Document your test plan and test cases, i.e. the individual steps (and expected results) that the tester should follow.
  • Documenting your test steps means that they can reliably be re-run if the tests have to be done again.
  • With regard to documentation, I always preach the "barely adequate" approach, i.e. do what needs to be done ("adequate") but don't go beyond ("barely"). In order to do this, you need to clearly understand the objectives of each document and the intended audience(s). Sometimes you need separate documents; sometimes you can put all of the content into one document.
So, having stressed the importance of testing, let me give you some hints on how I keep the test phase efficient and effective on my projects. Actually, I'm going to offer a series of hints over the next few days. In the first I'll offer some tips for automating your tests, and I'll describe a simple macro that you can use to highlight test results in your log.

I'll finish today by recommending a suite of SAS macros named FUTS (Framework for Unit Testing SAS programs) from Thotwave. These are available for free download after registering with the site (the download includes documentation and some examples of usage too). Developed by Greg Barnes-Nelson and colleagues, the macros are pure gold.

You can read the background to the macros in the following SAS conference papers, which chart the development and use of the macros (from their original incarnation as SASUnit through to FUTS):

Automated Testing and Real-time Event Management: An Enterprise Notification System, SUGI 29, 2004

SASUnit: Automated Testing for SAS, Phuse, 2004

Drawkcab Gnimmargorp: Test-Driven Development with FUTS, SUGI 31, 2006

NOTE: SAS Global Forum 2012 - One door closes, another opens

The closing date for submission of papers for SAS Global Forum (SGF) 2012 has passed, but registration is now open. If you can get yourself to Florida, USA 22-25 April 2012, you will benefit from a vast array of papers (from SAS staff and from SAS customers), plus myriad opportunities to talk to and learn from fellow SAS practitioners.

Register before March 19th 2012 to get the best deals. Go with 3 or more colleagues and get a greater discount.

SGF is an annual event organised by the SAS Global Users Group, a non-profit organisation that is open to all SAS software users throughout the world. SAS Global Users Group is governed by an Executive Board whose membership is composed of individuals who have been selected to chair the annual conference plus three SAS Institute representatives. The organisation was formed to provide a means for SAS software users to exchange ideas, explore ways of using SAS software, and participate in activities of mutual interest. Since its first event in 1975 (known at that time as SAS Users Group International - SUGI), SGF has been the premier event of the year for SAS practitioners worldwide, offering educational and networking opportunities. This year's conference is in the safe hands of its chair, Andy Kuligowski, and will doubtless be the best ever.

Wednesday, 2 November 2011

NOTE: Every Day's a Learning Day!

[Image: Pope Gregory XIII]
The purpose of me attending SAS courses is to teach, and I'd like to think I'm successful at doing that, but not a course goes by without me learning a little tip or trick from one of the learners. Today's was most interesting...

We all know that SAS dates are stored as numbers, and the number represents the number of days since 1st January 1960 (or, number of days since my birthday plus 890 as I prefer to think of it!). Hence, 1st January 1960 is zero, and 2nd January 1960 is one.

A learner asked "what about dates before 1960, does SAS use negative numbers?", to which I replied in the affirmative, but I then tried to enrich the answer by saying that you can only go back to 1548 because King Henry VIII changed the calendar at that time. Another learner politely pointed out that King Henry died in 1547, so I couldn't possibly be correct (my learners are well educated!).
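The numbering is easy to confirm for yourself. The following minimal sketch (the variable names are arbitrary) writes the numeric values of three date literals either side of the origin to the log:

```sas
data _null_;
  day0  = '01JAN1960'd;  /* the SAS date origin */
  day1  = '02JAN1960'd;  /* one day after the origin */
  prior = '31DEC1959'd;  /* one day before the origin */
  put day0= day1= prior=;
  /* writes day0=0 day1=1 prior=-1 to the log */
run;
```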

Some quick research in the next break revealed some interesting information in the SAS 9.3 Language Reference: Concepts manual in the About SAS Date, Time, and Datetime Values section. Two things caught my eye:

1) SAS can perform calculations on dates ranging from A.D. 1582 to A.D. 19,900

2) SAS date values can reliably tell you what day of the week a particular day fell on as far back as September 1752, when the calendar was adjusted by dropping several days

So, (1) confirms that I got the year wrong, and it's long after King Henry's death, so I was proved utterly wrong! Further research turned up an explanation of events in 1582. SAS Knowledge Base article 24808 by William Kreuter describes Calculating Age with Only One Line of Code but also mentions "Pope Gregory XIII proclaimed the Gregorian calendar in 1582" to properly deal with leap years. So there's the full explanation. Where I got my belief that it was 1548 and King Henry VIII I do not recall.

Item (2) was interesting for me too because I had not realised that SAS's day of the week capabilities only worked for a subset of the SAS date range.

So, every day's a learning day - teachers included!

Tuesday, 1 November 2011

Avoid Magic Numbers

Mention of my university days last week reminded me of an old maxim that my professor drummed into me. Avoid magic numbers.

Have you ever had to amend a colleague's code and found a number in the middle of a calculation and wondered what the significance of the number was? Did it make you nervous about changing the code because you didn't fully understand it? Your colleague coded a "magic number".

The term "magic numbers" refers to the practice of including unexplained literal numbers directly in source code. Doing so is likely to leave very little explanation as to why the developer chose that number, and thus the program becomes difficult to maintain with confidence. It is far preferable to declare numeric constants at the top of your program as macro variables. This has a number of advantages:

1) Any mis-typed reference to the numeric constant will be highlighted by the SAS macro processor as an unresolved reference (or by the compiler as an uninitialised variable), whereas a mis-typed number can be very difficult to spot.

2) The numeric constant can be defined once and then used many times throughout the program. Thus, if it needs to change, the change needs to be made just once, and there's no danger of missing one or more occurrences that also need to be changed.

3) Placing the definition at the top of the code makes it very easy to add some comments around it that describe the rationale for the value, plus advice for making changes.

As you can see, there are distinct and compelling reasons to avoid magic numbers in your code. To illustrate the point, consider the following snippet of code:

data ada;
  set sashelp.buy;
  GrossPrice = amount * 1.2;
run;

The code offers no indication/explanation that it is adding Value Added Tax (VAT) to the price of each item. We could re-code things this way:

%let VATrate = 0.20; /* VAT is currently 20% - 20OCT2011 */

data ada;
  set sashelp.buy;
  GrossPrice = amount * (1+&VATrate);
run;

We have documented the meaning of the number, and used a descriptive name throughout our program. And, if we mis-type the name of the macro variable in our DATA step we'll be rewarded with an error message, whereas mis-typing 1.2 in our earlier code may have gone unnoticed.
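To see that safety net in action, here is a minimal sketch with a deliberate typo in the macro variable name. The mis-spelt name VATrat is purely illustrative, and the sketch assumes no macro variable of that name exists in your session.

```sas
%let VATrate = 0.20; /* VAT is currently 20% */

data ada;
  set sashelp.buy;
  /* Deliberate typo: VATrat instead of VATrate */
  GrossPrice = amount * (1+&VATrat);
run;
```

The macro processor should write a warning to the log that the apparent symbolic reference VATRAT was not resolved, and the DATA step should then fail to compile because the unresolved text is not valid in the expression.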

Inevitably there are drawbacks to this approach (such as making the code more complex in some people's eyes, and slowing the compilation process by a fraction) but the benefits, in my opinion, far outweigh the drawbacks.

There are cases where it is acceptable to use numeric constants. The true and false values (1 and 0) are two such examples.

Making your code data-driven and thereby eliminating the use of numeric constants is better still. But that's a topic for another day!

NOTE: Macro Options

Last week I wrote about the availability of the IN operator in macro through the use of the MINOPERATOR system option. I mentioned that SAS v9.0/9.1 had introduced a number of useful features to the SAS macro capability. Here are some options that you might find useful:

MPRINTNEST - Augments the information written to the log by MPRINT. If the MPRINT option has also been set, MPRINTNEST will show the level of nesting of macro calls, and the names of the nested macros. Very useful. V9.3 introduced a couple of new macro functions in this area: %SYSMEXECDEPTH returns the depth of nesting from the point of call, and %SYSMEXECNAME returns the name of the macro executing at a nesting level.

MCOMPILENOTE - Issues a message to the log after a macro has been compiled. Successful completion of macro compilation is normally silent, so this forms a handy, positive confirmation.

MAUTOLOCDISPLAY - Echoes the source's location to the log each time an autocall macro is used (rather than once at compile-time). This provides a neat confirmation that SAS has picked up the macro source from the directory you intended. In v9.3, MAUTOCOMPLOC writes the source location of an autocall macro to the log when the macro is compiled.
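Here is a minimal sketch of how these options might be switched on and exercised. The macro names inner and outer are hypothetical, and the %SYSMEXECDEPTH and %SYSMEXECNAME calls assume SAS 9.3 or later.

```sas
/* Switch on the options discussed above */
options mprint mprintnest mcompilenote=all mautolocdisplay;

%macro inner;
  /* Report the name and nesting depth of the currently executing macro */
  %put Executing %sysmexecname(%sysmexecdepth) at depth %sysmexecdepth;
  data _null_;
    x = 1;
  run;
%mend inner;

%macro outer;
  %inner
%mend outer;

%outer
```

Because both macros are defined in-line rather than picked up from an autocall library, MAUTOLOCDISPLAY has nothing to report in this sketch; it only comes into play for autocall macros.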

NOTE: Business Intelligence Notes

Tricia Aanderud (@TAanderud) commented on my post about the late Dennis Ritchie. Her observation that she frequently finds herself creating elaborate SAS programs to avoid using the dreaded command line rang true with me. Big time!

Tricia and Angela Hall (@Angela, Real BI for Real Users blog) are working on a new SAS publication entitled Building Business Intelligence Using SAS. Judging by their recent tweets it'll be published soon. The advance information I've seen makes the book look a real winner.

And another winner is their joint web site. It'd somehow passed me by until Tricia subtly introduced it in her comment. In addition to being beautifully presented, http://www.bi-notes.com/ is full of useful BI tips and tricks - tasters of what is to come in their book. I've already added it to my Google Reader subscriptions, and I've added it to NOTE:'s blogroll too. Recommended.

From a quick overview of SAS EBI to squeezing more visible columns into your Web Report Studio output, there's plenty of interest.