The following expresses my understanding of your situation, based upon your original posting:

  • You have multiple, tab-delimited files
  • The first line of each file contains column headers
  • Each file may have a different number of columns
  • The first column of each file is the ID, so it can be discarded
  • You want to generate a file for each column (beyond the ID column), for each of the tab-delimited files

If my understanding is correct, the following--which uses a hash of arrays (HoA)--provides one solution:

    use strict;
    use warnings;

    my ( @header, %hash );
    my @files = qw/File1.txt File2.txt/;

    local $, = "\n";

    for my $file (@files) {
        open my $fhIN, '<', $file or die $!;

        while ( my $line = <$fhIN> ) {
            my @columns = split ' ', $line;

            if ( $. == 1 ) {
                @header = @columns;
            }
            else {
                push @{ $hash{ $header[$_] } }, $columns[$_] for 1 .. $#columns;
            }
        }
        close $fhIN;

        for my $i ( 1 .. $#header ) {
            open my $fhOUT, '>', "$file\_$header[$i].txt" or die $!;
            print $fhOUT @{ $hash{ $header[$i] } };
            close $fhOUT;
        }

        undef %hash;
    }

Each file in @files is processed. If it's the first line of the file ($. contains the current line number), then it's the header line, and is saved.
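One caveat about the way each line is broken into columns: split ' ' splits on any run of whitespace, which works for tabs only as long as no field contains a space. If your fields might contain spaces, a stricter variant could look like the following sketch (the sample row is made up for illustration):

```perl
use strict;
use warnings;

# Hypothetical row in which one field contains a space.
my $line = "a1\tNew York\t42\n";

# Splitting on any whitespace breaks "New York" into two fields.
my @by_whitespace = split ' ', $line;

# Splitting strictly on tabs keeps it intact. Unlike split ' ',
# this does not discard the trailing newline, so chomp first.
# The -1 limit keeps empty trailing fields instead of dropping them.
chomp $line;
my @by_tab = split /\t/, $line, -1;

print scalar(@by_whitespace), " fields vs. ", scalar(@by_tab), " fields\n";
```

If you go this route, the same change would apply to the header line, since it is split by the same statement.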

For lines 2 .. n, you'll note the following:

    push @{ $hash{ $header[$_] } }, $columns[$_] for 1 .. $#columns;
         ^         ^                ^                ^    ^
         |         |                |                |    |
         |         |                |                |    + - To the last index
         |         |                |                + - Starting at the next index after the index for ID
         |         |                + - Column value
         |         + - Column heading
         + - Generate a HoA, where: keys are column names; values are references to lists of column entries
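To see what that statement builds, here is a minimal sketch that runs it over a few in-memory rows instead of a file. The sample data and the use of an array index in place of $. are my own assumptions for the demonstration:

```perl
use strict;
use warnings;

# Hypothetical sample rows standing in for one file's contents.
my @lines = (
    "ID\tHeight\tWeight\n",
    "a1\t170\t65\n",
    "a2\t182\t80\n",
);

my ( @header, %hash );

for my $n ( 0 .. $#lines ) {
    my @columns = split ' ', $lines[$n];

    if ( $n == 0 ) {
        @header = @columns;    # first row: column names
    }
    else {
        # Skip index 0 (the ID) and file each value under its heading.
        push @{ $hash{ $header[$_] } }, $columns[$_] for 1 .. $#columns;
    }
}

# %hash now holds:
#   Height => [ 170, 182 ]
#   Weight => [ 65, 80 ]
print "$_: @{ $hash{$_} }\n" for sort keys %hash;
```

Note that there is no key for ID, because the loop starts at index 1.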

The for my $i ( 1 .. $#header ) { ... } loop iterates over all of the column headers except the ID, using each one as a key to access its array of column values.

A file is created for each column. The naming scheme is the file's name plus the column's name. (If you didn't want files created, it's within this for loop that you can operate on the columns' values.)
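As a rough illustration of working on the values in place of writing files, the sketch below collects a few per-column figures. It assumes the columns hold numbers, and the hash contents and the %summary name are invented for the example:

```perl
use strict;
use warnings;
use List::Util qw/sum max/;

# Hypothetical HoA, as it might look after one file has been read.
my @header = qw/ID Height Weight/;
my %hash   = ( Height => [ 170, 182 ], Weight => [ 65, 80 ] );

my %summary;
for my $i ( 1 .. $#header ) {
    my @values = @{ $hash{ $header[$i] } };

    # Work on @values here instead of opening an output file.
    $summary{ $header[$i] } = {
        count => scalar @values,
        sum   => sum(@values),
        max   => max(@values),
    };
}

# Report the columns in their original order.
printf "%s: n=%d sum=%d max=%d\n", $_, @{ $summary{$_} }{qw/count sum max/}
    for @header[ 1 .. $#header ];
```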

You may have noticed the earlier local $, = "\n"; notation. This will cause a newline to be placed between the array's elements when printed, so each element will be on its own line. To use a different format, just set the value of $, to something else, e.g., "," or "\t".
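Here is a small sketch of that behavior, printing to an in-memory filehandle so the result can be inspected. The values are made up, and join is shown only as an alternative you might prefer, since it avoids changing a global variable:

```perl
use strict;
use warnings;

my @values = qw/170 182 175/;    # hypothetical column values

# Print into a string rather than a file.
my $out = '';
open my $fh, '>', \$out or die $!;
{
    local $, = "\t";    # the separator applies only inside this block
    print $fh @values;
}
close $fh;

# The separator goes between elements only, so nothing follows the
# last value (no trailing tab or newline).

# join builds the same string without touching $, at all.
my $joined = join "\t", @values;

print "same result\n" if $out eq $joined;
```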

Potential issues:

  • Columns may have the same heading (although this is highly unlikely). If they do, their values end up in the same array, and therefore in the same output file
  • The heading may contain characters that are not legal in file names. If this is possible, a substitution can be performed on the heading to replace the offending characters
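Both issues could be handled before the headings are used as hash keys and file names. The following is only a sketch: safe_name and unique_headings are hypothetical helpers, and the set of characters treated as safe is a conservative guess that you may need to adjust for your file system:

```perl
use strict;
use warnings;

# Hypothetical helper: replace anything other than letters, digits,
# underscore, dot, and hyphen with an underscore.
sub safe_name {
    my ($heading) = @_;
    ( my $safe = $heading ) =~ s/[^\w.-]/_/g;
    return $safe;
}

# Hypothetical helper: append a numeric suffix to repeated headings
# so that each column keeps its own hash key and output file.
sub unique_headings {
    my @headings = @_;
    my %seen;
    return map { $seen{$_}++ ? "$_.$seen{$_}" : $_ } @headings;
}

my @header = unique_headings( map { safe_name($_) } 'ID', 'Rate (m/s)', 'Height', 'Height' );
print join( ',', @header ), "\n";
```

The suffix scheme can still collide with a heading that already ends in a dot and a number, so treat it as a starting point.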

Hope this helps!

