Wednesday, 30 September 2009

NOTE: Sparklines Improve Your Communication Abilities

In issue 18 of NOTE:, published in 2006, we talked about Edward Tufte's concept of sparklines. Since then there's been more published material on SAS and sparklines, most notably Paul OldenKamp's SGF 2008 paper entitled "An Interpretation of Sparkline Graphs and Their Creation with SAS® Software". Paul very kindly refers to issue 18 of NOTE:.

Sparklines are a tremendously rich means of communicating information in a small space. BI tools such as QlikView now include sparklines as a standard graphical object.

Here's our original article:

Edward Tufte is a world-renowned expert on information graphics, i.e. the science of presenting information in a graphical format. In his recent book, Beautiful Evidence, Mr Tufte formally introduced the concept of sparklines: small, high-resolution graphics embedded in a context of words, numbers, or images. You can read large parts of a draft of the sparklines section of Mr Tufte's book in the discussion thread he started on his site in 2004.

As illustrated in Mr Tufte's book, sparklines are an extremely powerful means of communicating information. I think they're at their most powerful when used within a paragraph of text, almost as if they were a word. For example, we had some very hot weather earlier this month, but it has now dropped to a more comfortable level, as you can see: [example sparkline]. The sparkline neatly conveys all of the information without interrupting the flow or layout of the text. There are many variations on the sparklines theme, all of which are discussed in Beautiful Evidence.

If you want to experiment with using sparklines, you might like to try BitWorking's sparkline generator. It's a neat and simple web-based means of getting a sparkline for your data. Alternatively, if you visit Bissantz's page on sparklines, you'll see that they produce SparkMaker (an add-in for Microsoft Office that lets you create your own sparklines in Word, Excel, PowerPoint, or HTML documents) and SparkFonts (TrueType fonts for the character-oriented generation of sparklines). And finally, there are plenty of macros and add-ins for producing sparklines within Excel; just use your favourite search engine.

However, as a SAS practitioner, I'm sure you're thinking to yourself "I'll bet SAS/GRAPH can do sparklines neatly", and you'd be right of course! The following basic macro, and example invocation, produces a very effective sparkline:

%macro spark(data= /* Name of input data set */
            ,hpixels=50 /* Length of spark line (pixels) */
            ,vpixels=11 /* Height of spark line (pixels) */
            ,xvar= /* Variable to plot on X-axis */
            ,yvar= /* Variable to plot on Y-axis */
            ,gdevice=gif /* Output graphics device */
            ,cmin= /* Colour of symbol displayed at minimum data point */
            ,cmax= /* Colour of symbol displayed at maximum data point */
            ,clast= /* Colour of symbol displayed at last data point */
            ,hmin=1 /* Height of symbol displayed at minimum data point */
            ,hmax=1 /* Height of symbol displayed at maximum data point */
            ,hlast=1 /* Height of symbol displayed at last data point */
            ,vmin=dot /* Symbol displayed at minimum data point */
            ,vmax=dot /* Symbol displayed at maximum data point */
            ,vlast=dot /* Symbol displayed at last data point */
                      /* Name and location of output graphics file */
            ,outfile=%sysfunc(pathname(WORK))/spark.&gdevice
            ,tidy=y /* Delete temporary data sets prior to termination? */
            );

%put **********************************************************************;
%put Parameter values to be used by &sysmacroname are:;
%put _local_;
%put **********************************************************************;

/***********
/ Notes.
/ 1. The names of all temporary data sets are prefixed with the name of this
/ macro (&sysmacroname).
/ 2. All temporary data sets are deleted prior to termination of this macro
/ (conditional upon the value of the TIDY parameter).
/ 3. Expect the following messages:
/ NOTE: [nnn] observation(s) contained a MISSING value for the minyvar * [xvar] request.
/ NOTE: [nnn] observation(s) contained a MISSING value for the maxyvar * [xvar] request.
/ NOTE: [nnn] observation(s) contained a MISSING value for the last[yvar] * [xvar] request.
/ 4. The input data set must be sorted by the X variable, because STEP 2
/ merges BY that variable.
/
************/

/************
/ STEP 1. Put minimum/maximum points into separate data
/ sets (one row in each). This is done regardless
/ of whether the information is actually required.
************/
data _&sysmacroname._min (keep=xvar4minyvar minyvar)
     _&sysmacroname._max (keep=xvar4maxyvar maxyvar)
     ;
  set &data end=finish;

  retain minxvar maxxvar;
  if _n_ eq 1 or &xvar lt minxvar then
  do;
    minxvar = &xvar;
  end;
  if _n_ eq 1 or &xvar gt maxxvar then
  do;
    maxxvar = &xvar;
  end;

  retain xvar4minyvar minyvar;
  if _n_ eq 1 or &yvar lt minyvar then
  do;
    xvar4minyvar = &xvar;
    minyvar = &yvar;
  end;

  retain xvar4maxyvar maxyvar;
  if _n_ eq 1 or &yvar gt maxyvar then
  do;
    xvar4maxyvar = &xvar;
    maxyvar = &yvar;
  end;

  if finish then
  do; /* Save and then write-out the values for informational purposes */
    OUTPUT;
    /* SYMPUTX converts each number to text and strips blanks, keeping */
    /* any decimal point or exponent intact */
    call symputx('MINXVAR',minxvar);
    call symputx('MAXXVAR',maxxvar);
    call symputx('MINYVAR',minyvar);
    call symputx('MAXYVAR',maxyvar);
  end;
run;
%put &sysmacroname: MINXVAR=&minxvar, MAXXVAR=&maxxvar;
%put &sysmacroname: MINYVAR=&minyvar, MAXYVAR=&maxyvar;

/************
/ STEP 2. Merge the min and max information with the actual
/ plot data.
/ Output data set will contain five columns: the
/ X and Y variables, plus variables for min, max,
/ and last.
/ This is all done regardless of whether the
/ information is actually required.
************/
data _&sysmacroname._data_minmaxlast;
  merge &data (keep=&xvar &yvar)
        _&sysmacroname._min(rename=(xvar4minyvar=&xvar))
        _&sysmacroname._max(rename=(xvar4maxyvar=&xvar))
        end=finish
        ;
  by &xvar;
  if finish then
    last&yvar=&yvar;
run;

/************
/ STEP 3. Are any specific points actually required? This
/ is interpreted from the fact that no colour
/ was specified.
/ For those that are not required, the plot
/ symbol is set to NONE.
************/
%if %length(&cmin) eq 0 %then %let vmin = NONE;
%if %length(&cmax) eq 0 %then %let vmax = NONE;
%if %length(&clast) eq 0 %then %let vlast = NONE;

/************
/ STEP 4. Produce the plot.
/ Need to set goptions, then, specify name and
/ location of output graphics file, then specify
/ invisible axes, then specify symbols for the
/ line (always visible) and the min/max/last points
/ (conditionally visible).
************/
goptions reset=all
         device=&gdevice
         hsize=&hpixels.pt vsize=&vpixels.pt
         gaccess=gsasfile
         ;
filename gsasfile "&outfile";
axis1 order=(&minxvar &maxxvar) label=none value=none major=none minor=none color=white;
axis2 order=(&minyvar &maxyvar) label=none value=none major=none minor=none color=white;
symbol1 c=black i=join v=none ;
symbol2 c=&cmin h=&hmin v=&vmin;
symbol3 c=&cmax h=&hmax v=&vmax;
symbol4 c=&clast h=&hlast v=&vlast;
proc gplot data=_&sysmacroname._data_minmaxlast;
  plot (&yvar minyvar maxyvar last&yvar) * &xvar / overlay
       noframe
       haxis=axis1
       vaxis=axis2
       ;
run; quit;
filename gsasfile clear;

/************
/ STEP 5. Conditionally delete temporary data sets.
************/
%if %upcase(%substr(&tidy,1,1)) eq Y %then
%do;
  proc datasets lib=work nolist;
  delete _&sysmacroname._:;
  quit;
%end;

%mend spark;

options mprint;
%spark(data=sashelp.air
      ,xvar=date
      ,yvar=air
      ,cmin=red
      ,cmax=lime
      ,clast=orange
      ,outfile=c:\temp\spark.gif
      ,tidy=n
      );
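
As a further illustration, the sketch below returns to the weather example and places a sparkline inside a sentence of an HTML page. It assumes the %spark macro above has already been compiled. The data set WORK.TEMPS, its temperature values, and the c:\temp file names are all hypothetical, made up purely for illustration. Because STEP 2 of the macro merges BY the X variable, the data are sorted by that variable before the macro is called.

```sas
/* Hypothetical data: daily maximum temperatures, made up for illustration */
data work.temps;
  input day temp;
  datalines;
1 24
2 27
3 31
4 33
5 32
6 29
7 25
8 22
9 21
10 20
;
run;

/* STEP 2 of %spark merges BY the X variable, so sort by it first */
proc sort data=work.temps;
  by day;
run;

/* Draw the sparkline, marking the minimum, maximum and last points */
%spark(data=work.temps
      ,xvar=day
      ,yvar=temp
      ,cmin=blue
      ,cmax=red
      ,clast=orange
      ,outfile=c:\temp\temps.gif
      );

/* Write a one-paragraph HTML page that uses the image as if it were a word */
data _null_;
  file "c:\temp\temps.html";
  put '<html><body><p>';
  put 'We had some very hot weather earlier this month,';
  put 'but it has now dropped to a more comfortable level';
  put '<img src="temps.gif" alt="temperature sparkline">.';
  put '</p></body></html>';
run;
```

Opening temps.html in a browser should show the sparkline sitting in the line of text. The image is referenced by a relative name, so the GIF and the HTML file need to be in the same directory.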


We'd love to see your own code and sparklines. Please send them to NoteEditor@RTSL.eu.