Wednesday, 8 January 2014

NOTE: Making Sense of MCOVERAGE for Coverage Testing of Your Macros

Over the last couple of days I've been uncovering the MCOVERAGE system option for coverage of testing of macro code. Coverage testing shows which lines were executed by your tests (and which were not). Clearly, knowing the percentage of code lines that were executed by your test suite is an important measure of your coding efforts.

Yesterday we saw what the mcoverage contained for a typical execution of a macro. What we would like to do is make the information more presentable. That's what we'll do today. We'll produce some code that will output the following summary (from which, we can determine that 33% of our code lines weren't executed by our test).

recordnum record rectype analysis
1 %macro fred(param=2); 2 Used
2 * comment ; 2 Used
3 %put hello world: &param; 2 Used
4 %if 1 eq 1 %then %put TRUE; 2 Used
5 %if 1 eq 1 %then 2 Used
6 %do; 2 Used
7 %put SO TRUE; 2 Used
8 %end; 2 Used
9 %if 1 eq 0 %then 2 Used
10 %do; . NOT used!
11 %put FALSE; . NOT used!
12 %end; . NOT used!
13 %if &param eq 1 %then 2 Used
14 %do; . NOT used!
15 %put FOUND ME; . NOT used!
16 %end; . NOT used!
17 3 Not compiled
18 %mend fred; 2 Used

To create this table, we need to read the mcoverage log and the macro source for %fred as follows:

  • We need to process the mcoverage log by reading it into a data set and i) removing record type 1 (because it has no part to play in the above table, and ii) removing duplicated log rows for the same code line (which happens when a line of code is executed more than once).
  • We need to process the macro source by reading it into a data set and adding a column to record the line number (matching the numbers used in the coverage log).
  • Having read both files into separate data sets (and processed them as outlined above), we can join them and produce our report. The code to achieve this is shown at the end of this post.

The code that I've created expects a coverage log file from one execution of one macro. It cannot cope reliably with log files containing either multiple executions of the same macro or executions of more than one different macro. Is this a problem? Well, multiple executions of the same macro might be required if testing various permutations of inputs (parameters and data); and multiple different macros might be required if testing a suite of macros.

Tomorrow I'll augment the code so that it can deal with multiple executions of the same macro, e.g. testing %fred with param=2 and param=1.

Meanwhile, here's today's code...

/* This code will not cope reliably if the macro    */
/* source does not have a line beginning with the   */
/* %macro statement for the macro under inspection. */

/* This code expects a coverage log file from ONE */
/* execution of ONE macro. It cannot cope         */
/* reliably with log files containing either      */
/* multiple executions of the same macro or       */
/* executions of more than one different macro.   */

filename MClog "~/mcoverage1.log"; /* The coverage log file (MCOVERAGELOC=) */
filename MacSrc "~/fred.sas";     /* The macro source  */

/* Go get the coverage file. Create macro */
/* var NUMLINES with number of lines      */
/* specified in type 1 record.            */
data LogFile;
  length macname $32;
  keep macname start end rectype;
  infile MClog;
  input rectype start end macname $;
  prevmac = lag(macname);

  if _n_ ne 1 and prevmac ne macname then
    put "ERR" "OR: Can only process one macro";

  if rectype eq 1 then
    call symputx('NUMLINES',end);

  if rectype ne 1 and start ne end then
    put "ERR" "OR: Not capable of handling START <> END";
run;

%put NUMLINES=&numlines;

/* Remove duplicates by sorting START with NODUPKEY. */
/* Hence we have no more than one data set row per   */
/* line of code.                                     */

  /* This assumes the log file did not contain different 
     RECTYPEs for the same start number */
  /* This assumes log file does not contain differing
     permutations of START and END */
proc sort data=LogFile out=LogFileProcessed NODUPKEY;
  where rectype ne 1;
  by start;
run;

/* Go get macro source and add a line number value that */
/* starts at the %macro statement (because this is how  */
/* MCOVERAGE refers to lines.                           */
/* Restrict number of lines stored to the number we got */
/* from the coverage log file.                          */
data MacroSource;
  length record $132;
  retain FoundStart 0 
    LastLine 
    recordnum 0;
  keep record recordnum;
  infile MacSrc pad;
  input record $132.;

  if not FoundStart and upcase(left(record)) eq: '%MACRO' then
    do;
      FoundStart = 1;
      LastLine = _n_ + &NumLines - 1;
    end;

  if FoundStart then
    recordnum + 1;

  if FoundStart and _n_ le LastLine then
    OUTPUT;
run;

/* Bring it all together by marking each line of code */
/* with the record type from the coverage log.        */
proc sql;
  select  code.recordnum
    ,code.record
    ,log.rectype
    ,case log.rectype
       when 2 then "Used"
       when 3 then "Not compiled"
       when . then "NOT used!"
       else "UNEXPECTED record type!!"
     end as analysis
  from MacroSource code left join LogFileProcessed log
    on code.recordnum eq log.start;
quit;

filename MacSrc clear;
filename MClog clear;

As an endnote, I should explain my personal/idiosyncratic coding style:
  • I want to be able to search the log and find "ERROR" only if errors have occurred. But if I code put "ERROR: message"; then I will always find "ERROR" when I search the log (because my source code will be echoed to the log). By coding put "ERR" "OR: message"; my code looks a little odd but I can be sure that "ERROR" gets written to the log only if an error has occured