Thursday, 26 December 2013

Your "2013" Top Ten

It's nearing the end of the year so I thought I'd publish a list of the ten most viewed articles published this year. Like last year, I realised that this would be skewed in favour of older articles. So, in an effort to bring balance, I decided to look at the posts I'd published in the last 18 months and highlight the ten most viewed. And when I come to do that in a year's time, the last half of 2013 will get its fair crack of the whip. Does that make sense?

Anyway, here they are (with the most popular at the top of the list)...

1. NOTE: Booked and Ready for SAS Global Forum 2013 (06-Feb-2013)
2. NOTE: DS2, Learn Something New! (11-Feb-2013)
3. NOTE: DS2, Threaded Processing (18-Feb-2013)
4. NOTE: DS2, SQL Within a SET Statement (13-Feb-2013)
5. NOTE: Best Contributed Paper Awards from SAS Global Forum 2013 (07-Jun-2013)
6. NOTE: DS2, Final Comments (20-Feb-2013)
7. NOTE: DS2 Final, Final Comments (27-Feb-2013)
8. NOTE: Executing a PROC from a DATA Step, Revisited (14-Jan-2013)
9. NOTE: SAS 9.4 is Functioning Well (15-Jul-2013)
10. Affinity Diagrams for Problem Solving (15-May-2013)

It's interesting to see so many of my DS2 articles in the top ten. It shows a keen interest in the topic. Aside from that, I was glad to see one of my articles taken from my SAS Global Forum paper make the top ten (Affinity Diagrams).

Are any of your favourites in the list?

I hope you had a good Christmas. Here's to a healthy and happy New Year.

Thursday, 19 December 2013

NOTE: Have You Won Yet?

Are you checking the SAS advent calendar daily, and entering the prize draw? Have you won anything yet? I have not, but I did win a very informative book a couple of years ago.

Opening the windows in the advent calendar is an annual treat. Last year SAS gave us an additional treat - a game for Apple iOS and Google Android devices. I think it's time I dusted it off and got back into practice before challenging the kids over Christmas!

Tuesday, 17 December 2013

Regression Tests, Holding Their Value

Last week I wrote about how our test cases should be considered an asset and added to an ever-growing library of regression tests. I had a few correspondents ask how this could be the case when their test cases would only work with specific data; the specific data, they said, might not be available in the future because their system only held (say) six months of data.

It's a fair challenge. My answer is: design your test cases to be more robust. So, for instance, instead of choosing comparison data from a specific date (which might eventually get archived out of your data store), specify a relative date, e.g. instruct your tester to refer to data from the date preceding the execution of the test. Test cases have steps, with expected results for each step. Begin your test case by writing steps that instruct the tester to refer to the source/comparison data and to write down the values observed. In your subsequent steps, you can instruct your tester to refer to these values as expected results.
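
As a concrete sketch of that relative-date idea in SAS (the library, table, and column names below are hypothetical), the comparison data can be derived from the day the test is run rather than hard-coded:

  /* Derive the comparison date relative to the test run, not a fixed date */
  %let prev_day = %sysfunc(intnx(day, %sysfunc(today()), -1), date9.);

  proc sql;
    /* Capture the baseline values the tester would otherwise note down by hand */
    create table work.baseline as
    select *
    from warehouse.transactions      /* hypothetical source table */
    where txn_date = "&prev_day"d;   /* data from the day before the test run */
  quit;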

Other correspondents said that their tests were manual, i.e. driven through a user interface by clicking buttons, and hence couldn't be re-run because they were too time-consuming. In this case, I draw attention to my observations about a) deciding what degree of confidence the test exercise should engender in the modified system, and b) deciding what tests need to be (re)run in order to provide that confidence. It's fine to choose not to re-run some of your regression tests, but be aware that you're making a decision that impacts the degree of confidence delivered by your tests. If sufficient confidence is delivered without re-running the manual steps then all is good; if not, you need to revisit your decisions and get them back into balance. There's often no easy answer to this balancing act, but being open and honest about time/effort/cost versus confidence is important.

The longer term answer is to try to increase the number of automated tests and reduce those needing manual activity. But that's a topic for another day!

Wednesday, 11 December 2013

Test Cases, an Investment

It never ceases to frustrate and disappoint me when I hear people talking of test cases as use-once, throwaway artefacts. Any team worth its salt will be building a library of tests and will see that library as an asset and something worth investing in.

Any system change needs to be tested from two perspectives:
  1. Has our changed functionality taken effect? (incremental testing)
  2. Have we broken any existing functionality? (regression testing)
The former tends to be the main focus; the latter is often overlooked (it is assumed that nothing got broken). Worse still, since today's change will be different to tomorrow's (or next week's), there's a tendency to throw away today's incremental test cases. Yet today's incremental test cases are tomorrow's regression test cases.

At one extreme, such as when building software for passenger jet aircraft, we might adopt the following strategy:
  • When introducing a system, write and execute test cases for all testable elements
  • When we introduce a new function, we should write test cases for the new function, we should run those new test cases to make sure the new function works, and we should re-run all the previous test cases to make sure we didn't break anything (they should all work perfectly because nothing else changed, right?)
  • When we update existing functionality, we should update the existing test cases for the updated function, we should run those updated test cases to make sure the updated function works, and we should re-run all the previous test cases to make sure we didn't break anything (again, they should all work perfectly because nothing else changed)
Now, if we're not building software for passenger jets, we need to take a more pragmatic, risk-based approach. Testing is not about creating guarantees; it's about establishing sufficient confidence in our software product. We only need to do enough testing to establish the desired degree of confidence. So there are two relatively subjective decisions to be made:
  1. How much confidence do we need?
  2. How many tests (and what type) do we need to establish the desired degree of confidence?
Wherever we draw the line of "sufficient confidence", our second decision ought to conclude that we need to run a mixture of incremental tests and regression tests. And, rather than writing fresh regression tests every time, we should be calling upon our library of past incremental tests and re-running them. The bottom line here is that today's incremental tests are tomorrow's regression tests - they should work unedited because no other part of the system has changed.
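
To make re-running past tests cheap, a regression check in SAS can often boil down to comparing the current run's output against a saved baseline. This is just a minimal sketch with hypothetical dataset names, not a prescription:

  /* Compare the output of the current run against the stored, expected baseline */
  proc compare base=baseline.monthly_summary    /* expected results saved earlier  */
               compare=work.monthly_summary     /* output produced by this run     */
               criterion=1e-8;                  /* tolerance for numeric rounding  */
  run;

  /* &sysinfo is 0 when the datasets match; any other value flags a difference */
  %put NOTE: PROC COMPARE return code is &sysinfo;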

Every one of our test cases is an investment, not an ephemeral object. If we're investing in test cases and managing our technical debt, then we are on the way to having a responsibly managed development team!

'Tis the Season of Lists

December is the time when the media industry tends to offer more lists than in the other 11 months of the year. One that caught my eye was TIME magazine's Top 10 gadgets of 2013. These kinds of lists are always good for spurring conversation down the pub. For instance, I was surprised that Google Glass didn't get a mention, but I guess it's not yet a consumer product. On the other hand, I wasn't surprised at all to see a smart watch in the list. Smart watches didn't quite hit the mainstream in 2013, but I'm sure we'll see a lot more (cheaper and more functional) in 2014.

It was also interesting to look back at TIME's all-time 100 gadgets from 2010. Whilst all the items in the list were leading-edge at the time they were introduced, it's quite startling to see the pace of technological change, i.e. how dated the items in the list are (even the most recent ones from 2009/10) by 2013 standards.

The pace of change in our own industry is barely any slower. Traditional skills such as SAS/BASE and SQL programming are still in demand, but broader and deeper knowledge is increasingly sought too: other languages such as SAS DS2, Pig or Hive; visual coding in Enterprise Guide or Data Integration Studio; SAS architecture (significantly changed again in V9.4); and data analytics. Whatever your role, be sure to think about how to keep up with developments and keep your skills up-to-date.

Tuesday, 10 December 2013

NOTE: Reverse-Engineering Technical Debt

I wrote a couple of items about technical debt back in November (here and here). Sometimes you don't choose to create debt for yourself; sometimes it's inherited. In technical guises, debt can be inherited when teams merge, for instance.

In such circumstances, it can be difficult to know how much debt has been inherited. In these cases, reverse-engineering tools can be of use. I'm thinking in particular of ComplementSoft's ASAP.

ASAP takes SAS/BASE source code and produces process flow and data flow diagrams. In other words, it works in the reverse direction to tools such as Enterprise Guide and DI Studio: those tools let you draw flows and then generate SAS/BASE code from your diagrams, whereas ASAP takes your code and produces diagrams.

In fact, ASAP can read your program source code or your SAS log. Reading the log is especially useful when you're using macros with conditional logic that will generate different SAS/BASE code dependent upon input data.
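
A trivial, hypothetical illustration of why the log matters: the statements a macro generates (and hence the real process flow) can change with its parameters or input data, and with MPRINT set the log records what was actually executed.

  options mprint;   /* write the macro-generated statements to the log */

  %macro load_sales(mode=);
    %if &mode = FULL %then %do;
      proc datasets lib=mart nolist;
        delete sales;               /* full refresh: rebuild from scratch */
      quit;
    %end;
    proc append base=mart.sales data=staging.sales_new force;   /* hypothetical tables */
    run;
  %mend load_sales;

  %load_sales(mode=FULL)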

In addition to creating diagrams from your code and logs, ASAP has an in-built editor and remote code-submission capabilities, so it can form a complete code development and execution environment. And, it allows you to quickly skip between nodes in diagrams and the associated DATA step or procedure in your source code or log.

There aren't many SAS-related third-party products available to SAS customers. ASAP is one of the few and I'm pleased to be able to give it a mention in the NOTE: blog. If you'd like to see more, take a look at the "demo" on the ComplementSoft web site, and take advantage of their free trial.

Wednesday, 4 December 2013

NOTE: Enterprise Guide vs DI Studio - What's the difference?

A favourite interview question of mine is: Compare and contrast SAS 9's stored process server and workspace server. This question is very good at revealing whether candidates actually understand some of what's going on behind the scenes of SAS 9. I mentioned this back in 2010, together with some notes on my expectations for an answer.

I was amused to see Michelle Homes post another of my favourite interview questions on the BI Notes blog recently: What’s the difference between SAS Enterprise Guide and SAS DI Studio? This question, and the ensuing conversation, establishes whether the candidate has used either or both of the tools, and it reveals how much the candidate is thinking about their environment and the tools within.

For me, there are two key differences: metadata, and primary use.

Michelle focuses on the former and gives a very good run-down of the use of metadata in Data Integration Studio (and the little use made of it in Enterprise Guide).

With regards to primary use, take a look at the visual nodes available in the two tools. The nodes in DI Studio are focused upon data extraction, transformation and loading (as you would expect), whilst the nodes in Enterprise Guide (EG) are focused upon analysing data. Sure, EG has nodes for sorting, transposing and other data-related activities (including SQL queries), but its data manipulation nodes are not as extensive as DI Studio's. In addition to sorting and transposing, DI Studio offers nodes that understand data models, e.g. an SCD loader and a surrogate key generator (I described slowly changing dimensions (SCDs) and other elements of star schema data models in a post in 2009). On the other hand, EG has lots of nodes for tabulating, graphing, charting, analysing, and modelling your data.
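
To give a flavour of the work such a DI Studio node encapsulates, here is a minimal hand-coded sketch of surrogate key generation (the table and column names are hypothetical):

  /* Find the current maximum surrogate key in the dimension (0 if it is empty) */
  proc sql noprint;
    select coalesce(max(customer_sk), 0) into :max_sk
    from dw.dim_customer;               /* hypothetical dimension table */
  quit;

  /* Assign the next available surrogate key to each incoming row */
  data work.dim_customer_new;
    set staging.customers;              /* hypothetical staging table */
    customer_sk = &max_sk + _n_;
  run;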

One final distinction I'd draw is that EG's nodes are each based around one SAS procedure, whilst DI Studio's nodes are based around an ETL technique or requirement. You can see that DI Studio was produced for a specific purpose, whilst EG was produced as a user-friendly layer on top of the SAS language and thereby offers a more general solution.

For the most part, I'm stating the obvious above, but the interview candidate's answer to the question provides a great deal of insight into their approach to their work, their sense of curiosity and awareness, and their technical insight.

Tuesday, 3 December 2013

NOTE: Tips to Avoid the Bus

Back in 2011 I wrote about the Bus Factor, i.e. the minimum number of people on your project (or in your support team) whose loss would cause serious issues for your project/support team. The name of this factor derives from the possibility of one or more team members getting hit by a bus. An alternative (less tragic) name - highlighted by Angela Hall at the time - is "lottery factor", i.e. we assume that one or more people got a big win on the lottery and immediately left work, never to return. Either way, it's a serious factor and must be managed.

At the time, I offered a number of techniques to help increase your team's bus factor (a good thing). Here are a few more that I use, all focused on the greater sharing of knowledge. If you ingrain the techniques of active and deliberate knowledge sharing into your team members then you need worry less about your bus factor, but don't completely take your eye off the ball - remember to manage it.

Push-Based Knowledge Sharing. The person who holds the knowledge about something asks a person who does not know about it to join them to learn about it. They thereby PUSH the information towards the other person.

Pull-Based Knowledge Sharing. The person who does not have knowledge about something asks another person who knows about it to teach them about it in some way. In this way, they establish a PULL of the information from the other person.

Knowledge-Share Handshaking. Having only a single-direction knowledge sharing culture, i.e. only pull or only push, is not the most effective approach. There has to be a knowledge handshake for knowledge to flow freely. Encompassed within handshaking is the idea of pairing. One of the best ways to address the bus factor is by pairing. Pairing is an act of implicit learning where knowledge constantly flows back and forth. On the other hand, if a person asks a question such as "How did you do that?" then that is an act of explicit learning.

Pairing is hard to achieve in organisations where pairing was never a "thing" people do. If you cannot get enough people to pair, or your bus factor problem arises because a person from a different team knows something that your team relies on, it's time to start encouraging implicit knowledge gathering, or implicit learning.

NOTE: Advent Calendar 2013

I bring good news! The SAS Professionals advent calendar is now working nicely. Open a new window each day to stand a chance of winning great prizes.

Sunday, 1 December 2013

NOTE: Whither the Advent Calendar?

It's traditional for me to mention the SAS Professionals advent calendar at this time of year. However, this year it seems to have stalled. Clicking on window #1 today tells me that I need to wait for the correct date.

I'll post an update as soon as I have more information.

On the plus side, I'm pleased to see that Smooth Radio Christmas has recommenced broadcasting (on the internet, but not DAB this year). With the demise of Smooth Radio 70s in October I was afraid that the Christmas station might meet the same fate. Fortunately not, but I'm sad I can't listen to it in the car on DAB - I've found no replacement either.

If the music of Smooth is not to your taste, consult a list of Christmas radio stations. There's a good range. My kids' favourite is North Pole Radio.