Monday, 6 June 2011

NOTE: Reading Multiple Files (with irregular names)

I was introduced to the INFILE statement's FILEVAR parameter recently. It seems it's a great way to read multiple files into a DATA step. Hitherto I had tended to use a widlcard in the FILEREF.

To read multiple files with similar names, you can simply put a wildcard in the FILENAME statement thus:

filename demo '~ratcliab/root*.txt';

If I have files with the following names in my home directory:

root1.txt
root2.sas
root3.txt


...then the first and third will be read by the following DATA step:

17 filename demo '~ratcliab/root*.txt';
18
19 data;
20   length string $256;
21   infile demo;
22   input string $256.;
23 run;

NOTE: The infile DEMO is:
File Name=/home/ratcliab/root1.txt,
File List=/home/ratcliab/root*.txt,
Access Permission=rw-r--r--,
File Size (bytes)=10

NOTE: The infile DEMO is:
File Name=/home/ratcliab/root3.txt,
File List=/home/ratcliab/root*.txt,
Access Permission=rw-r--r--,
File Size (bytes)=10

NOTE: 1 record was read from the infile DEMO.
The minimum record length was 9.
The maximum record length was 9.
NOTE: 1 record was read from the infile DEMO.
The minimum record length was 9.
The maximum record length was 9.
NOTE: SAS went to a new line when INPUT statement reached past the end of a line.
NOTE: The data set WORK.DATA1 has 1 observations and 1 variables.


That's all great if your files have similar names. If not, ask the FILEVAR parameter to step forward...

The whole subject of using FILEVAR (and other techniques for reading multiple files) is covered very well in SAS's TS-581 so I won't duplicate the content here. However, briefly, the FILEVAR parameter specifies the name of a variable that contains the name of the file to be read. The content of the variable can be derived from code, or can be read from a data set or from cards (as shown below).

data one;

  infile datalines;
  length fil2read $40;
  input fil2read $;  /* Read the name of a file */

  /* Prepare to read from the specified file */
  infile dummy filevar=fil2read end=done;

  /* Read (and write) all of the rows from the file */
  do while(not done);
    input var1 var2 var3;
    output;
  end;

  datalines;
/user/data/input_file_01
/user/data/input_file_02
/user/data/input_file_03
/user/data/input_file_04
;
run;

Worth knowing!