Thursday, 31 January 2013

NOTE: Ampersands Again

My recent post on double and triple ampersands in macros ended with some doubt. Laurie and Dave immediately took up the challenge to resolve the doubt. You can see the whole painful saga in the comments of the original post, but I've repeated Dave's summary of the conclusions here and I offer my thanks to Laurie and Dave for their tenacious work!
The macro parser performs two basic processes on each pass. If it finds a double ampersand , it converts it into a single one and then proceeds with the rest of the statement; and if it finds a string &xyz that it recognises as a macro variable, it resolves it and then proceeds with the rest of the statement. At the end of the statement, it checks to see if it has been left with any ampersands in the parsed output. If there were, it runs through the process again.

That's exactly how we understood it to work.

The detail we had missed is in what happens if it finds a string &abc that it DOESN'T recognise as a macro variable name. If on the same pass, in the head of the same text string, it also performs a successful double ampersand resolution, it does not regard the unrecognised variable as an error, but allows the string &abc to go through unchanged to the next iteration. But if there were no double ampersand resolutions, it knows that it has had its last chance at a successful resolution and so it will throw a warning - I think it's not an error but I can't check as I don't have SAS installed here.

NOTE: SEUGI Proceedings

Did I miss this? The proceedings of the SAS European User Group International (SEUGI, the European equivalent of SUGI and SGF) are now all online on the SAS Community web site. Papers from 1983 to the last event in 2007 are there.

Most of the pages were created in January 2013, so I think this is relatively fresh news.

A fabulous opportunity for European SAS practitioners of a certain age (like me!) to reminisce.

Monday, 28 January 2013

NOTE: Getting More for Less With SPDE

Would you like to read or write your data more quickly? SAS V9.1 introduced Scalable Performance Data Engine (SPDE, "Speedy Engine"), for free. Are you using it? Speedy Engine is the little cousin of Speedy Server (Scalable Performance Data Server, a chargeable product), but its potential benefits are not so little. With a simple set of changes to one LIBNAME statement, your code could run a lot faster.

Put simply, when you read data from a traditional SAS data set, the bits and bytes are read one-at-a-time from the disk. The same is true when you write data. However, if you use the Speedy Engine on your LIBNAME statement then your data will be written across (or read from) multiple disk drives, allowing SAS to send more data at the same time (because it can send the first 10 observations of your table to disk #1 and then send the second ten observations to disk #2 whilst waiting for disk #1 to finish writing the data). The parallel writing and reading of data means that the processor needn't spend most of its time waiting for disk operations to be completed.

Using Speedy Engine is simple, so long as you have multiple disks drives *and* your processor has multiple channels with which it communicate with the disk drives. Without multiple channels, the channel will become a bottleneck and you won't see the improvements you'd hoped for.

Here's a simple example LIBNAME statement:

libname sales spde "f:\alphasales\metadata" 
                   datapath=("g:\alphasales\datapart1" 
                             "h:\alphasales\datapart1" )
                   indexpath=("j:\alphasales\index" )
                   ;

Note how the LIBNAME statement largely follows the standard syntax, but we specify the SPDE engine after the "sales" libref, and we specify datapaths and an indexpath. What you need to know is that the Speedy Engine writes three different kinds of data: metadata, the index, and data. It simple needs to know where to put each kind of data. The first location specified in the LIBNAME statement (the F: drive in our example) is where the metadata will be stored. You don't need to worry about what the metadata contains, just be aware that it needs to go somewhere! For quick searching and retrieval, Speedy Engine will want to store index files somewhere, hence the indexpath. Finally, Speedy Engine will store your data in the multiple locations specified in the datapath. I've used two datapath directories in the example, but you can specify more.

If you look in these locations, after writing some data to the sales libref, you'll see files with unfamiliar suffixes to their names, i,e. not sas7bdat. This need not worry us, but it does emphasise a key point: you have to write data in Speedy Engine format before you can read it with Speedy Engine.

To get the most from Speedy Engine, make sure you have multiple processors, multiple channels, and multiple disk drives. If you don't know the detailed architecture of your SAS environment, and are not able to discover more information from your local support team, you can still use an empirical approach by trying different combinations of disk drives and see which combinations give best results. However, Speedy Engine offers features over-and-above performance improvements. For example, it offers implicit sorting (whereby you needn't pre-sort your input data when you use a BY statement), WHERE clause optimisation with OR operators, and more than 32k columns in a table(!).

Speedy Server (the big cousin) offers further advantages such as an Access Control List–based security model, support for parallel BY-group processing, and processing of implicit pass-through.

For more information on the Speedy Engine, take a look at the SPDE section of the SAS Support Focus Areas, including the Quick Set-Up Guide and Tips and Tricks. The NESG 2006 paper An Annotated Guide: The New 9.1, Free & Fast SPDE Data Engine by Russ Lavery and Ian Whitlock is a little dated, but it is useful and reliable nonetheless.

NOTE: Quick SAS Keyword Search in Chrome

Did you know that the Google Chrome browser allows you to specify multiple search engines, accessible by a short keyword? Well, I didn't until last week when my mention of Visuwords (graphical dictionary & thesaurus) made me wonder if there was some means of clicking a word on a page and taking it straight to Visuwords to look it up. Well, it seems there is, and I found a good use of the facility for SAS lookups too.

In the SAS context, I often find that searching for SAS words or phrases in a search engine just gives me too much material, so I will more often do my search at SAS's Global Forum Proceedings web page. It's not too tiresome to go to SAS's site (if I remember the URL), wait for it load, and then search, but with a search engine keyword I can now do it quicker by just searching for "sgf spde tips" if I'm looking for SPDE tips (SGF is my search engine keyword).

It's a pretty simple thing to do, but remember that this tip is for Chrome. As far as I can see, Internet Explorer is less flexible and accommodating.

Firstly, you need to know the right search URL to get a "sgf" result. This is most easily explained by demonstrating it for Visuwords first (all will become apparent in due course). A Visuword search URL is simply: http://www.visuwords.com/<word>.

So, we launch Chrome, go to the Customise and Control button (three horizontal lines, top right of the Chrome window), and select Settings. On the settings page you'll see an area headlined Search, and within that area you'll see a button labelled Manage Search Engines. Click it and you'll see a dialog window like that shown alongside this paragraph. Scroll to the bottom of the Other Search Engines and you'll see a blank set of options. Enter the following values.

  • Add a new search engine. Enter a meaningful descriptive name, e.g. Visuwords
  • Keyword. Enter a neat, short word to trigger your search, e.g. vw
  • URL. Enter the search URL, with %s in place of a search word, i.e. http://www.visuwords.com/%s
Once you've clicked OK, you're all set.

So, let's look-up "note". We begin by typing "vw" in the Chrome address bar (known as the omnibox); all looks normal until we press space; now we see a special prompt that says "Search Visuwords:"; finally we continue typing "note", press Enter, and we get taken to the Visuwords web site and we get shown the definition of "note". Hey presto! 

Okay, we can do the same for the SGF proceedings, but what is the search string? Well, go to the SAS Global Forum Proceedings web page and enter a search. See the URL that shows the results? It's long, isn't it?! But somewhere in there you'll find your search string. Copy the whole URL. Paste that into the Google settings dialog window, just replace your search string with %s, and hey presto again!

I've done the same for Lex Jansen's site too because his site searches across a number of different conferences, not just SAS Global Forum (and SUGI). So, I can search for "lex spde tips" and find SPDE-related conference papers from SAS Global Forum, SUGI, PharmaSUG, NESUG, SESUG, PhUSE, WUSS, MWSUG, PNWSUG and SCSUG.

You can do this with Internet Explorer too, but I've not found a simple route to add something that's not in IE's gallery of search engines. The easiest seems to be Enhance IE.

Can you think of any other SAS-related sites for which you'd appreciate a quick search mechanism? Let me know in a comment.

Sunday, 27 January 2013

NOTE: The Dreaded Double and Triple Ampersands

Aside from Chris Hemedinger's comment on SAS V9.3's PROC SQL INTO enhancements, my Macro Arrays, Straight From a Data Set article last week also caught the attention of another friend and regular correspondent - LeRoy Bessler. LeRoy emailed me to suggest that an explanation of why I used a triple ampersand in my macro code (see below) might be of value to many (especially since he is used to successfully using double ampersands in this situation).

%do i=1 %to &name0;
  %put I=&i, NAME&i=&&&name&i, AGE&i=&&&age&i;
%end;


LeRoy asked for an explanation. So here it is...

We're familiar with the single ampersand indicating that the name of a macro variable follows the ampersand, e.g. &guide instructs the SAS macro processor to substitute &guide with the value of the macro variable named guide. Thus, the value of guide gets passed to the SAS compiler.

The process of tokenisation and the processing that is done by the SAS macro processor includes a rule whereby two consecutive ampersands (a double ampersand) get resolved to a single ampersand. Not only that but the resulting single ampersand will be put to one side whilst the remainder of the token is parsed in the same manner. Once the first pass of the token is complete, it is all parsed again with the same rules. Here are some examples that illustrate what happens (I've used colour to highlight the separate clusters of characters that get processed):

%let guide = enterprise;
%let enterprise = starship;


Submitted by User After Pass #1 After Pass #2 Sent to SAS Compiler
guide n/a n/a guide
&guide enterprise n/a enterprise
&&guide &guide enterprise enterprise
&&&guide &enterprise starship starship

Let's use this information to decypher my original piece of SAS macro code:

%do i=1 %to &name0;
  %put I=&i, NAME&i=&&&name&i, AGE&i=&&&age&i;
%end;


In this case, we're looping around all values from 1 to whatever number &name0 resolves to. I want the code to produce the following output in the log:

I=1, NAME1=Alfred, AGE1=14
I=2, NAME2=Alice, AGE2=13
I=3, NAME3=Barbara, AGE3=13
[snip]


In the put statement, we print "I=" following by the value of &i, i.e. the index variable of the loop; then we print ", NAME", followed by the value of &I, followed by an equals sign. So far we've got "I=1, NAME1=". This is followed by &&&name&i. The first pass of &&&name&i resolves to a single ampersand followed by.... oh dear, there's a bug in my program!

a) There's no value for &name; my code had not previously created a macro variable called name

b) Moreover, I had not intended to use a macro variable named &name, I had wanted my code to merely resolve to name at this point

I should have used a double ampersand like LeRoy is used to. If I had specified &&name&i then the first pass would have produced &name2; and the next pass would have resolved to a student name (the array of macro variables had been loaded from sashelp.class).

So, I offer my thanks to LeRoy for politely highlighting my error, and inspiring this blog post.

But, one question remains: we understand how triple ampersands are the correct technique to be used in the first examples, but why did my code from the previous blog post work when the rules suggest it shouldn't? And my answer is: I don't know! Try it for yourself; SAS silently does what I wish, despite my code having too many ampersands.

Answers on a postcard please...

Saturday, 26 January 2013

NOTE: SQL INTO, Revisited

In Macro Arrays Straight From a Data Set last week I wrote about using SQL's INTO in order to create macro variables (and arrays of macro variables) from PROC SQL. I said that we had to use an arbitrarily large number to define the maximum size of the macro array. Well, Chris Hemedinger was quick to post a comment and point out that with SAS 9.3 there are a few improvements in the SELECT INTO range options.

Not only that but Chris highlighted his Improving on a SAS Programming Pattern post from March last year in which Chris showed the new SAS 9.3 syntax where we can now specify an open-ended range for macro variable names. Apparently, SAS will allocate as many macro variables as are needed to store each value, and the values will be TRIMMED of whitespace by default (but you can control this by specifying NOTRIM before the FROM keyword).

I've reproduced Chris's example below.

/* SAS 9.3 approach */ 
/* Create macro vars with values & count of distinct values */ 
proc sql noprint;
  select distinct TYPE into :varVal1- 
    from SASHELP.CARS; 
quit;
%let varCount = &SQLOBS.; quit; 

Thanks for the tip Chris.

Wednesday, 23 January 2013

NOTE: An Update on Graphics Procedures

When Phil Holland asked me (and others) to review his upcoming new book Power User's Guide to SAS Graph Templates, I gladly accepted: Phil is a long-standing friend of mine, I've always enjoyed programming with SAS/GRAPH procedures, and I haven't used the "new" ODS graphics procedures (known prior to V9.3 as the statistical graphics procedures) but I'm keen to learn. Little did I know what an eye-opener the exercise would turn out to be.

That's no criticism of Phil's book. It's most informative, and written in Phil's usual, facts-first style. The eye-opening element for me was how full-featured the ODS graphics procedures are. To a large degree they provide all of the facilities of the "old" graphics procedures (such as GPLOT and GCHART); and that left me concluding that I must up-skill myself to the new procedures. And it also left me wondering, with so much duplication between the old and the new procedures, what are SAS's plans for the old procedures. Can we expect any future enhancements to them, or are they functionally-stabilised.

I quickly tried-out PROC SGPLOT as a comparison with PROC GPLOT, with immediate success. I'll leave you to run the code below and make your own comparisons. However, I think you'll agree that the default SGPLOT output is more immediately presentable than the output from GPLOT.
proc sgplot data=sashelp.snacks;
  where product eq: 'B' and
        date between '1feb2004'd and '29feb2004'd;
  series x=date y=qtysold /group=product;
run;

proc gplot data=sashelp.snacks;
  where product eq: 'B' and
        date between '1feb2004'd and '29feb2004'd;
  plot qtysold*date=product;
run;quit;

I've not managed to find a SAS statement on the future of the old procedures, but the ODS Graphics: Procedures Guide does provide a comparison of features between old and new, which is most informative. In essence it explains that the old procedures are "device-based" and influenced by GOPTIONS, whilst the new procedures are ODS-aware and are influenced by the ODS GRAPHICS statement.

Rob Allison produced a neat web page paper that demonstrates old and new procedures alongside each other, and Phil presented a paper at SAS Global Forum 2011 on Why Should You Be Using the New SG (Statistical Graphics) Procedures in SAS 9.2? which answers its own question very well. I'll not go into any further detail in this article (I'm still learning myself), so I'll leave Phil to have the last word (with an extract from his SGF paper):

There is always a hope that a new software feature will improve on what already exists, but fear of the unknown stops you from trying it out, just in case it is a waste of time. My paper tries out the new SG (Statistical Graphics) procedures in SAS 9.2 for you, and demonstrates some of the new and useful features of PROC SGPLOT, PROC SGPANEL, PROC SGSCATTER and PROC SGRENDER.

While PROC GPLOT, and also PROC GCHART, can create acceptable graphs, their default settings are generally not what is required. The SG procedures employ default settings that are much closer to what is likely to be required, and adjustments are therefore easier to code and document.

PROC SGPLOT and PROC SGPANEL allow the production of combined graphs which are, at best, difficult to create using PROC GPLOT or PROC GCHART and Annotate. PROC SGPANEL and PROC SGSCATTER have the ability to create multiple graphs with common axes, which previously required many SAS/GRAPH procedure calls, careful template design, followed by the use of PROC GREPLAY to render the completed graph. For the ultimate control over any graph, ODS Graphics templates, which can include user-selected features, can be rendered using PROC SGRENDER to allow the simple reuse of tested graphics code.
Thanks Phil.

Tuesday, 22 January 2013

NOTE: Macro Arrays, Straight From a Data Set

I love the SAS macro language, and I do a lot of work in it. It's not uncommon to want to have an array of macro variables with values taken from a data set. Here's how I most-often do it...

Of course, macro language doesn't explicitly support arrays of variables, but we can achieve something usable by placing values into macro variables whose names follow a numeric sequence, e.g. NAME1, NAME2, NAME3, etc.

This can be achieved from PROC SQL, as this log shows:

17 proc sql noprint;
18   select name,age into :name1-:name999, :age1-:age999
19     from sashelp.class;
20 quit;
21 %put SQLOBS=&sqlobs, NAME1=&name1, NAME2=&name2, AGE1=&age1, AGE2=&age2;


SQLOBS=19, NAME1=Alfred, NAME2=Alice, AGE1=14, AGE2=13


See how we need to code an arbitrarily large number as the top index (999 in this case), unless we figure it out beforehand; and see how SQLOBS tells us how many members of the array have been populated. This is an effective and simple means of loading a macro array from a data set.

To use a macro array you'll most likely want to loop across it (or down it, depending on your predilection!); and you'll need a %do loop to do that; and you'll need to be within a macro definition to do that. The following code shows how to list-off the array's values. Note how we've saved the size of the array into NAME0 to avoid the danger of SQLOBS being subsequently over-written.

%macro NoteColon;
  proc sql noprint;
    select name,age into :name1-:name999 ,:age1-:age999
      from sashelp.class;
  quit;
  %put SQLOBS=&sqlobs, NAME1=&name1, NAME2=&name2, AGE1=&age1, AGE2=&age2;  

  %let NAME0=&sqlobs;

  %do i=1 %to &name0;
    %put I=&i, NAME&i=&&&name&i, AGE&i=&&&age&i;
  %end;
%mend NoteColon;

%NoteColon;


This produces the following log:

SQLOBS=19, NAME1=Alfred, NAME2=Alice, AGE1=14, AGE2=13
I=1, NAME1=Alfred, AGE1=14
I=2, NAME2=Alice, AGE2=13
I=3, NAME3=Barbara, AGE3=13
[snip]
I=18, NAME18=Thomas, AGE18=11
I=19, NAME19=William, AGE19=15


Macro arrays are a valuable tool, and it's always useful to find new ways to use them and/or to load them! If you haven't come across this particular use of PROC SQL's INTO expression then I hope you've found this post useful.

Monday, 21 January 2013

Wordnik, and Words Revisited

I came across a new web site today that piqued my interest in words and language. It's called Wordnik. It's like a dictionary (seemingly American definitions only), but it offers a lot more because it shows various uses of the word found around the internet - in lists, quotations and Twitter. My preferred dictionary web site remains Cambridge Dictionaries Online, but Wordnik certainly gets added to my list of useful sites.

I've written about such things before. I remain a big, big fan of Cricklers, and I still love the beauty of Visuwords. Attempting the daily Crickler is a great way to exercise your language and news skills (a kind of kata for words and news!). Paul Dunn's Visuwords is not only useful and functional, I just love the organic animation as new words appear.

We all have to communicate, it's part of our job. Like all other aspects of our professional life, we should strive to improve our skills and knowledge. This includes knowing how to use our writing and communication tools (such as email, word processor and slide presenter), plus increasing our knowledge of our own language (words, their meaning, and syntax).

As an exercise in the above, why not write a paper for your local SAS users group, or for SAS Global Forum?!

NOTE: I'll be Busy at SAS Global Forum! #sasgf13

I was very pleased to be invited to present a paper at this year's SAS Global Forum in San Francisco in April/May. To then have my contributed paper accepted too was icing on the cake. I don't yet know the dates and times where my two papers will be on the agenda, but it looks like I'll be busy this year.

Firstly, I was honoured to be invited to present "Visual Techniques for Problem Solving and Debugging" in the Reporting and Information Visualisation stream.
Abstract: No matter how well we plan, issues and bugs inevitably occur. Some are easily solved, but others are far more difficult and complex. This paper presents a range of largely visual techniques for understanding, investigating, solving, and monitoring your most difficult problems. Whether you have an intractable SAS coding bug or a repeatedly failing SAS server, this paper offers practical advice and concrete steps to get you to the bottom of the problem. Tools and techniques discussed in this paper include Ishikawa (fishbone) diagrams, sequence diagrams, tabular matrices, and mind maps.
And I had already submitted "Automated Testing of Your SAS Code and Collation of Results (Using Hash Tables)" into the Applications Development stream. It was subsequently accepted.
Abstract: Testing is an undeniably important part of the development process, but its multiple phases and approaches can be under-valued. I describe some of the principles I apply to the testing phases of my projects and then show some useful macros that I have developed to aid the re-use of tests and to collate their results automatically. Tests should be used time and again for regression testing. The collation of the results hinges on the use of hash tables, and the paper gives detail on the coding techniques employed. The small macro suite can be used for testing of SAS code written in a variety of tools including SAS Enterprise Guide, SAS Data Integration Studio, and the traditional SAS Display Manager Environment.
So, if you're attending SAS Global Forum this year, please stop by one or both of my papers, and be sure to say "hi"!

Sunday, 20 January 2013

More Routes to Get NOTE:

We're in the Kindle Store! You can now subscribe to the NOTE: blog via the Kindle Store. If you have a Kindle device (it's not available to Kindle apps) and you're in the UK or USA, you can subscribe from the addresses shown at the bottom of this post.

There's a small monthly price (£0.99 in the UK, $0.99 in USA) shared between Amazon and me. Of course, you can get every blog post for free from the web site, or from our RSS feed, or from our email subscriptions, so why would you pay? In a word: convenience.

Sure, you can subscribe to our RSS feed at no charge, but then you have to remember to visit your feed reader or RSS folders on Outlook to read the incoming posts. If you receive our feeds via email, it's just one more email in a crowded in-box. Many people subscribe to multiple blogs, which multiples the problem. But, if you get your blog posts delivered to your Kindle device then you can read them in their leisure time, away from your computer and email. This is especially attractive to people who travel or commute on public transportation. If you're out and about and carry your Kindle, no matter what time of day your favourite blog is updated, you can open your Kindle and read the latest entry.

Kindle version blogs include the full text of each blog post plus its images.

Remember, all Kindle blogs come with a 14-day free trial. Try it to see how you like it!

UK - http://www.amazon.co.uk/Note-Colon/dp/B00B35NR7O/ 

USA - http://www.amazon.com/gp/product/B00B35NR7O

And, write a comment and let us know what you think.

Monday, 14 January 2013

NOTE: The OPEN Function (error trapping)

In my previous two posts on this topic (listed at the bottom of this post), I've described how to use the OPEN function (and its counterparts) to read data sets from macro code, and to read data sets within a DATA step without using the SET/UPDATE/MODIFY statement. In this final post, I'll tidy-up a few loose-ends, including error-trapping.

Firstly, I haven't mentioned the FETCHOBS function. Whilst FETCH simply fetches the next row, FETCHOBS fetches the specified row number, e.g. RC=FETCHOBS(DSID,7) to fetch the seventh row.

Next, I'll mention the SYSMSG function which returns the warning/error message from the most recent data set function. There are a lot of reasons why your data set functions may not give you the results that you expect, so it's worth being aware of SYSMSG. While I'm developing my code, I use a defensive technique and place a PUT statement alongside my data set functions to show me all of the return codes as I progress with my code. Here's a log with a simple example:

17 data _null_;
18   dsid = open('SASHELP.CARS(where=(left(model) eq "XJR 4dr")))');
19     sm=sysmsg(); put DSID= SM=;

20   call set(dsid);
21   rc=fetch(dsid); sm=sysmsg(); put RC= SM=;

22   rc = close(dsid); sm=sysmsg(); put RC= SM=;
23 run;

dsid=1 sm=
rc=10005 sm=ERROR: ERROR: DATA step variable Make not defined..
rc=0 sm=


See how SYSMSG tells us that we haven't pre-defined the MAKE variable (because we've used CALL SET but haven't used a LENGTH statement nor a SET statement with a never-true IF statement).

I hope you've found interest in this mini-series on the OPEN function and its counterparts. Those who have knowledge of SCL programming with SAS/AF will already be familiar with these functions (and many more that haven't been supporetd in teh DATA step). They're not of use every day, but they're damn valuable when the right situation arises!

DATA SET FUNCTIONS:

1. NOTE: The OPEN Function (getting data set information in macros)
2. NOTE: The OPEN Function (reading data sets in macros)
3. NOTE: The OPEN Function (error trapping)

NOTE: Executing a PROC from a DATA Step, Revisited

Further to my post on Executing a PROC from a DATA Step, Quentin commented and drew my attention to a highly informative SAS Global Forum paper on the same subject by Mike Rhoads. I'm grateful to Quentin; Mike's paper offers a lot of insight and experience in this subject area.

Mike's paper is entitled Use the Full Power of SAS in Your Function-Style Macros and it describes how you can run a DATA step or a PROC within a user-written function which, in turn, can be called from within a macro statement. Mike describes the catchily-named Macro Function Sandwich technique, and provides a number of illustrations of how the Base and Macro languages can be extended with this capability.

I'm hoping to see even more papers on this subject area at this year's SAS Global Forum. I think it offers a lot on untapped potential.

Friday, 11 January 2013

NOTE: Are You Game for Big Data Overload?

Earlier this week, SAS launched a new software product. It's free! AND it runs on Android and Apple iOS devices!! What am I talking about?... the latest craze to hit analytics offices worldwide... Big Data Overload, the game.

This is a fun game in which you play the role of a business decision maker working in a high-paced environment where exploiting data is critical to the survival of your business. Unfortunately things are starting to get difficult. Up until now you have coped… but for how much longer? How long can you survive with manual processes? Can you exploit all of the data before it hits the Decision Horizon and its value is lost forever?

At some point you will reach Big Data Overload… That is where the SAS powered V-Cans can help. They are souped up and rearing to go with high performance analytics… they are ready to help you at any point. They do stuff humans can't do… they have the Power to Know!

Have fun… and when you've mastered the game download the SAS Mobile BI app from within the game and see how SAS can help you with real-world big data challenges.

Download on iPad and iPhone

Download on Google Play for Android

It's already gone down a storm with my colleagues in the office. Something for the weekend?...

Wednesday, 9 January 2013

NOTE: The OPEN Function (reading data sets in macros)

In my previous post on this subject, I showed how to use the DATA step's OPEN function to get information about data sets and make that information available within a DATA step or as a macro variable. In today's post I will take it further and show how to retrieve rows and columns of information in a) a DATA step, independant of SET/MODIFY/UPDATE statements, and b) macro code.

Here's a simple sample:

data _null_;
                    /* Open the data set */
  dsid = open('SASHELP.CARS(where=(left(model) eq "XJR 4dr")))');
  rc=fetch(dsid); /* Fetch the first row */
  msrp_vn = varnum(dsid,'MSRP'); /* Get identifier for MRSP (price) column */
  msrp = getvarn(dsid,msrp_vn); /* Get the price value */
  rc = close(dsid); /* Close the data set */
  put _all_;
run;


In the log, we see:

dsid=1 rc=0 msrp_vn=6 msrp=74995 _ERROR_=0 _N_=1

We've used the FETCH, VARNUM and GETVARN functions in addition to OPEN and CLOSE that I showed in my previous post.

By default, the values from the opened data set do not become available for use in the DATA step. We need to use FETCH to make the first row accessible (qualified by the WHERE clause), and we need to use GETVARN to get the value of the specified column from the current row. However, GETVARN doesn't allow us to pass the name of the column, we have to get an ID for it from the VARNUM function and then pass that ID to GETVARN. It's a bit convoluted, but it works! Use GETVARN to get values from numeric columns; use GETVARC for character columns.

Here's how it looks in macro code:

27 %let dsid = %sysfunc(open(SASHELP.CARS(where=(left(model) eq "XJR 4dr"))));
28 %let rc=%sysfunc(fetch(&dsid));
29 %let msrp_vn = %sysfunc(varnum(&dsid,MSRP));
30 %let msrp = %sysfunc(getvarn(&dsid,&msrp_vn));
31 %let rc = %sysfunc(close(&dsid));
32 %put DSID=&dsid MSRP_VN=&msrp_vn MSRP=&msrp;
DSID=4 MSRP_VN=6 MSRP=74995


This works nicely, but if we have a large number of columns, it would involve an equally large number of calls to VARNUM and GETVARN/C. An alternative is to use CALL SET to get all columns immediately available. Here's the macro code:

17 %let dsid = %sysfunc(open(SASHELP.CARS(where=(left(model) eq "XJR 4dr"))));
18 %syscall set(dsid);

19 %let rc=%sysfunc(fetch(&dsid));
20 %let rc = %sysfunc(close(&dsid));
21 %put DSID=&dsid MSRP=&msrp ORIGIN=&origin ENGINESIZE=&enginesize;
DSID=4 MSRP=74995 ORIGIN=Europe ENGINESIZE=4.2


Now, that's a lot simpler!

However, when we use the same technique in DATA step, we must also declare *all* of the data set's variables (with a LENGTH statement, for example), else the FETCH function will fail. Even for a relatively simple data set like sashelp.cars that's a lot of "unnecessary" length infromation that we must add to our DATA step (and maintain it if the data set has its attributes chnaged subsequently). Here's trick to avoid this hassle. Add this statement in lieu of a long LENGTH statement: if 0 then set sashelp.cars;. You'll also need to add a STOP at the end of the DATA step.

The never-true IF statement won't execute the SET statement at run-time, and hence no observations will be read from sashelp.cars, but prior to the execution of teh DATA step, the compilation phase will automaticlly all of the data set's volumns to the PDV. Hey presto! Here's the DATA step code:

17 data _null_;
18   if 0 then set sashelp.cars;

19   dsid = open('SASHELP.CARS(where=(left(model) eq "XJR 4dr")))');
20   call set(dsid);

21   rc=fetch(dsid);
22   rc = close(dsid);
23   put _all_;
24   STOP;

25 run;

Make=Jaguar Model=XJR 4dr Type=Sedan Origin=Europe DriveTrain=Rear MSRP=$74,995 Invoice=$68,306 EngineSize=4.2 Cylinders=8 Horsepower=390 MPG_City=17 MPG_Highway=24 Weight=3948 Wheelbase=119 Length=200 dsid=1 rc=0 _ERROR_=0 _N_=1


So, there we have it, a means of reading a data set in macro code, and a means of referencing data set values in a DATA step without using a SET/UPDATE/MODIFY statement.

Further reading: William C Murphy wrote an excellent SAS Global Forum paper on using %SYSCALLL SET back in 2007. William showed how to load

In my next (final) post on this topic, I'll highlight a few more functions related to OPEN, and I'll discuss error-trapping.

DATA SET FUNCTIONS:

1. NOTE: The OPEN Function (getting data set information in macros)
2. NOTE: The OPEN Function (reading data sets in macros)
3. NOTE: The OPEN Function (error trapping)

NOTE: Executing a PROC from a DATA Step

I stumbled across a most informative SAS Global Forum paper the other day, and I wanted to share it with you! In his Executing a PROC from a DATA Step paper, SAS's Jason Secosky describes the RUN_MACRO and DOSUBL functions, both of which were new to me. RUN_MACRO and DOSUBL enable DATA step code to immediately execute a macro and act on the resulting output data set.

RUN_MACRO was introduced in SAS V9.2, and DOSUBL experimentally in V9.3.

The RUN_MACRO function executes a macro and waits for it to complete. RUN_MACRO can be called only from a user-written function created with the FCMP procedure. As a result, you have to write three blocks of code to execute a PROC from a DATA step: Macro to execute a PROC; User-written function to execute the macro; DATA step to call the user-written function. This can be a chore, but can be worth it in the right circumstances.

DOSUBL executes SAS code directly from a DATA step. Unlike RUN_MACRO, DOSUBL can be called directly from a DATA step without the need for a user-written function.

If you're into advanced SAS coding, you need to know about RUN_MACRO and DOSUBL. They probably don't allow you to do anything you couldn't do without them, but they sure make your code a lot simpler and easier to maintain.

Monday, 7 January 2013

NOTE: The OPEN Function (getting data set information in macros)

There are a number of ways of accessing the content of SAS data sets. DATA steps and PROCs are two obvious means. Another method is to use the DATA step's OPEN function.

Let's start with some basics. Here's a DATA step that is able to establish whether a specific data set has an index...

data _null_;
  dsid = open('SASHELP.CLASS');     /* Open the data set */
  HasIndex = attrn(dsid,'ISINDEX'); /* Get num attr */
  rc = close(dsid);                 /* Close the data set */
  put _all_;
run;


The following is returned in the log:

dsid=1 HasIndex=0 rc=0 _ERROR_=0 _N_=1

In essence, the OPEN function provides a numeric identifier for the data set in question; the identifier can then be passed to a myriad of functions (such at ATTRN) to "do stuff" with the data set; the identifier should then be returned back to SAS with the CLOSE function when you've finished with the data set (in order to close it and free-up memory). The OPEN function (and its associated functions) can be used in any DATA step, regardless of whether one or more other data sets are already open with a SET statement, for example.

The functionality provided by ATTRN is simple, but with its cousin ATTRC it provides a large number of different attributes. There are many more associated functions that use the data set identifier returned by the OPEN function. Some examples are: VARNUM, GETVARC/N, FETCH, FETCHOBS, and CALL SET. I'll talk about these in subsequent posts.

We're not restricted to using these DATA step functions within a DATA step. We can use them within macro code, or a PROC FCMP function. Their use within macro code can be particularly valuable. For instance, to get the value of a column from a specific row in a table and place it into a macro variable, we could use a DATA step thus:

%let dsid = %sysfunc(open(SASHELP.CLASS)); /* Open the ds */
%let HasIndex = %sysfunc(attrn(&dsid,ISINDEX)); /* Get attr */
%let rc = %sysfunc(close(&dsid)); /* Close the data set */
%put HasIndex is set to: &HasIndex;

And the resulting log shows:

HasIndex is set to: 0

It possibly looks more complicated than the DATA step alternative, but it has advantages too. We didn't need to run PROC CONTENTS, get the results into a data set, and then read the data set with a DATA step. And we didn't need to read SASHELP.VMEMBER, which can be resource-intensive.

Note that we've used macro statements, but they didn't need to be within a macro definition.

I'll show some more advanced uses in subsequent posts.

DATA SET FUNCTIONS:

1. NOTE: The OPEN Function (getting data set information in macros)
2. NOTE: The OPEN Function (reading data sets in macros)
3. NOTE: The OPEN Function (error trapping)

Wednesday, 2 January 2013

My 2012 Top Ten

Last week I published the ten blog articles that scored most hits over the last 18 months. That was "your" top ten. I thought I'd offer an alternative top ten by listing the articles that I most enjoyed writing or which I personally thought were of the greatest significance. Pure indulgence; please forgive me!

In no particular order...
  • NOTE: Explaining Precision, 2-Apr-2012. The subject of numeric precision crops up from time-to-time, so I was pleased to discover an additional source on the topic
  • NOTE: Programming for Job Security (Golden Oldies), 10-Jan-2012. This was part of a Christmas series of "golden oldies" where I highlighted conferences papers from years past. This particular article highlighted a tongue-in-check, apply-the-reverse set of guidelines for making your SAS code unreadable for other coders. There's not much humour in SAS documentation and papers; the highlighted paper is a memorable exception
  • NOTE: When WORK is not WORK, 11-Aug-2012. Explaining the USER option and its effect on the WORK library
  • NOTE: More on Ishikawa, 12-Nov-2012. Ishikawa diagrams are an excellent means of investigating complex problems. I wrote an article on their use, and Chris Brooks subsequently pointed-out to me that SAS can draw such diagrams with PROC ISHIKAWA. Read the original article to get the skinny on Ishikawa diagrams
  • Mutation Testing, 17-Oct-2012. I'm always keen to find new ideas. Mutation testing was a new concept for me, but I think it has some merits. Read the article and judge for yourself
  • Code Kata #1, 18-Jan-2012. Having introduced the idea of coding challenges ("katas") as a coders' equivalent of a work-out, I offered a specific example/challenge. I got quite a lot of feedback on this, so I'll aim to produce some more in 2013
  • Whatever You Call It, It's About People First, 18-Sep-2012. People, Processes and Technology. In that order. That's the order of importance: no project or strategy can focus on technology alone
  • Technical Debt, 23-Oct-2012. A very useful concept for capturing your tactical short-cuts
  • NOTE: Prompts, Beyond the Basics With Enterprise Guide, 25-Sep-2012. There's so much you can do in Enterprise Guide beyond the creation of process flows and editing of programs. Prompts are just one of the valuable features of Enterprise Guide that I featured in this series
  • Requirements. Whose Responsibility? 30-Apr-2012. I'm a keen believer in cooperation and team work. In this article I spent some time discussing requirements capture; whether it should be done by users or by IT
2012 was a great year, with The Queen's Diamond Jubilee swiftly followed by the London 2012 Olympics and Paralympics. Can 2013 match it? Let's hope so...

One of the highlights of 2013 will be Bloodhound SSC's attempt to become the first vehicle to exceed 1,000 mph on land. The car weighs over 7 tonnes, but the engines (one rocket and one jet) produce more than 135,000 horsepower. With key team members from the Thrust SSC project that broke the speed of sound and holds the current record of 763 mph, this project stands every chance of success. Good luck!