Wednesday, 22 June 2011

NOTE: SYSTASK With An Unknown Number of Calls

In an earlier article (and the associated article on security) I extolled the virtues of SYSTASK for doing operating system activities in parallel. I gave an example that executed two gzip commands in parallel. But what would you do if you didn't know how many files you needed to zip?

Well, let's assume you have a table containing a list of files (WORK.FILES in the example below). We need to issue a SYSTASK statement for each row in the table, giving each task its own name. Then we need to issue a single WAITFOR statement that lists all of those task names, so that we don't proceed any further until every zip is complete. The _ALL_ keyword matters here: by default WAITFOR resumes as soon as any one of the named tasks finishes.

data files;
  file='Alpha.csv'; output;
  file='Beta.csv'; output;
  file='Gamma.csv'; output;
run;

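%macro zippem(data=,var=);
  /* Generate one SYSTASK per input row, then one WAITFOR naming every task */
  data _null_;
    set &data end=finish nobs=numobs;
    /* Long enough to hold a WAITFOR that names many tasks without truncation */
    length stmt $32767;
    /* The file name is padded to its declared length, so shorter values
       carry trailing blanks into the command (harmless to gzip) */
    stmt = cat('systask command "gzip '
              ,&var
              ,'" nowait taskname=TSK'
              ,putn(_n_,'Z5.')
              ,';'
              );
    /* Queued statements run once this DATA step has finished */
    call execute(stmt);
    if finish then
    do;
      /* _ALL_ waits for every named task; the default is _ANY_ */
      stmt = 'waitfor _all_';
      do i = 1 to numobs;
        stmt = cat(trim(stmt),' TSK',putn(i,'Z5.'));
      end;
      stmt = cat(trim(stmt),';');
      call execute(stmt);
    end;
  run;
%mend zippem;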

%zippem(data=files,var=file);


The macro produces the following log output:

NOTE: CALL EXECUTE generated line.
1 + systask command "gzip Alpha.csv" nowait taskname=TSK00001;
2 + systask command "gzip Beta.csv " nowait taskname=TSK00002;
NOTE: LOG/Output from task "TSK00001"
> gzip: Alpha.csv: No such file or directory
NOTE: End of LOG/Output from task "TSK00001"
3 + systask command "gzip Gamma.csv" nowait taskname=TSK00003;
4 + waitfor _all_ TSK00001 TSK00002 TSK00003;
NOTE: LOG/Output from task "TSK00003"
> gzip: Gamma.csv: No such file or directory
NOTE: End of LOG/Output from task "TSK00003"
NOTE: LOG/Output from task "TSK00002"
> gzip: Beta.csv: No such file or directory
NOTE: End of LOG/Output from task "TSK00002"


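Ignoring the fact that my files don't exist(!), you can see that the output from each command is echoed to the log, which is useful. Notice that the task output appears in completion order rather than submission order (TSK00003 reports before TSK00002 here), because the tasks run in parallel. It's a simple macro, but it can speed up your jobs considerably when the individual commands are slow and independent of one another. You can use the template code shown above for many purposes.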
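
As one example of adapting the template, the sketch below also records each task's return code, which would have flagged the missing files above. It relies on the STATUS= option of SYSTASK, which names a macro variable that receives the task's status. The macro name ZIPSTAT and the RCnnnnn variable names are my own inventions for illustration, and I have not run this variant against real files, so treat it as a starting point rather than finished code.

```sas
%macro zipstat(data=,var=);  /* hypothetical name, not part of the original macro */
  data _null_;
    set &data end=finish nobs=numobs;
    length stmt $32767 id $5;
    id = putn(_n_,'Z5.');
    /* STATUS= names a macro variable (RCnnnnn, an assumed naming scheme)
       that receives the return code of the task */
    stmt = cat('systask command "gzip '
              ,strip(&var)
              ,'" nowait taskname=TSK',id
              ,' status=RC',id
              ,';'
              );
    call execute(stmt);
    if finish then
    do;
      stmt = 'waitfor _all_';
      do i = 1 to numobs;
        stmt = cat(trim(stmt),' TSK',putn(i,'Z5.'));
      end;
      stmt = cat(trim(stmt),';');
      call execute(stmt);
      /* %nrstr defers each %put so that it runs after the WAITFOR,
         by which time every status variable has been set */
      do i = 1 to numobs;
        id = putn(i,'Z5.');
        call execute(cat('%nrstr(%put NOTE: TSK',id
                        ,' ended with status &RC',id,';)'));
      end;
    end;
  run;
%mend zipstat;

%zipstat(data=files,var=file);
```

A non-zero status would normally indicate that the command failed, so the %put lines could be replaced with whatever error handling your job needs.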