The following expresses my understanding of your situation, based upon your original posting:
- You have multiple, tab-delimited files
- The first line of each file contains column headers
- Each file may have a different number of columns
- The first column of each file is the ID, so can be discarded
- You want to generate a file for each column (beyond the ID column), for each of the tab-delimited files
If my understanding is correct, the following--which uses a hash of arrays (HoA)--provides one solution:
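use strict;
use warnings;

my ( @header, %hash );
my @files = qw/File1.txt File2.txt/;

# Print list elements separated by newlines
local $, = "\n";

for my $file (@files) {
    open my $fhIN, '<', $file or die "Can't open $file: $!";

    while ( my $line = <$fhIN> ) {
        chomp $line;

        # Split on tabs only; -1 keeps trailing empty fields
        my @columns = split /\t/, $line, -1;

        if ( $. == 1 ) {
            @header = @columns;    # first line holds the column headers
        }
        else {
            push @{ $hash{ $header[$_] } }, $columns[$_] for 1 .. $#columns;
        }
    }

    close $fhIN;    # an explicit close also resets $. for the next file

    # Write one output file per column, skipping the ID column (index 0)
    for my $i ( 1 .. $#header ) {
        my $outfile = "$file\_$header[$i].txt";
        open my $fhOUT, '>', $outfile or die "Can't open $outfile: $!";
        print $fhOUT @{ $hash{ $header[$i] } };
        close $fhOUT;
    }

    undef %hash;
}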
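Each file in @files is processed in turn. If the current line is the first line of the file ($. contains the current line number), then it's the header line, and is saved in @header. Because each file is explicitly closed, $. starts again at 1 for the next file.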
For lines 2 .. n, you'll note the following:
push @{ $hash{ $header[$_] } }, $columns[$_] for 1 .. $#columns;
     ^         ^                ^                ^    ^
     |         |                |                |    |
     |         |                |                |    + - To the last index
     |         |                |                + - Starting at the next index after the index for ID
     |         |                + - Column value
     |         + - Column heading
     + - Generate a HoA, where: keys are column names; values are references to lists of column entries
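As a minimal sketch of what that statement builds, the snippet below runs the same push over a few in-memory rows and then prints the resulting HoA. The sample rows and the column names Height and Weight are made up for illustration; they are not from your files.

```perl
use strict;
use warnings;

# Hypothetical sample data standing in for one tab-delimited file
my @lines = ( "ID\tHeight\tWeight", "a1\t170\t65", "a2\t182\t80" );

my ( @header, %hash );

for my $n ( 0 .. $#lines ) {
    my @columns = split /\t/, $lines[$n], -1;

    if ( $n == 0 ) {
        @header = @columns;    # first row holds the column headers
        next;
    }

    # Same statement as above: index 0 (the ID) is skipped
    push @{ $hash{ $header[$_] } }, $columns[$_] for 1 .. $#columns;
}

# Each key is a column name; each value is a reference to an array
for my $name ( sort keys %hash ) {
    print "$name: @{ $hash{$name} }\n";
}
```

This prints "Height: 170 182" and "Weight: 65 80". There is no ID key, since the loop starts at index 1.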
The for my $i ( 1 .. $#header ) { ... } loop iterates through all of the column headers, except the ID, using them as keys to access the array of column values.
A file is created for each column. The naming scheme is the file's name plus the column's name. (If you didn't want files created, it's within this for loop that you can operate on the columns' values.)
You may have noticed the earlier local $, = "\n"; notation. This will cause a newline to be placed between the array's elements when printed, so each element will be on its own line. To use a different format, just set the value of $, to something else, e.g., "," or "\t".
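To see the effect of $, in isolation, here is a small sketch that prints a list to an in-memory filehandle with $, set to a tab. The values are hypothetical, and the in-memory filehandle is only there so the result can be inspected.

```perl
use strict;
use warnings;

my @values = qw/170 182 165/;    # hypothetical column values

# Print to an in-memory filehandle so the result can be inspected
my $out = '';
open my $fh, '>', \$out or die "Can't open in-memory file: $!";

{
    local $, = "\t";      # inserted between the items of a print list
    print $fh @values;    # nothing is added after the last item
}

close $fh;

print "$out\n";
print join( "\t", @values ), "\n";    # same result without touching $,
```

Both print statements produce the same tab-separated line. Note that $, only goes between items, so the output files have no newline after the last value; if you need one, join gives you explicit control over that.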
Potential issues:
- Columns may have the same heading (although this is highly unlikely). If they do, their values are pushed onto the same array and end up in the same output file
- The heading may contain 'illegal' file characters. If this is possible, a substitution can be performed on them to eliminate the 'offending' characters
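One possible way to guard against both issues is sketched below. safe_names is a hypothetical helper, not part of the script above, and the choice of which characters count as safe is an assumption you may want to adjust for your file system. It could be applied to @header right after the header line is read.

```perl
use strict;
use warnings;

# Hypothetical helper: make each heading safe to use in a file name and
# make repeated headings unique by appending a counter.
sub safe_names {
    my @headings = @_;
    my ( %seen, @names );

    for my $heading (@headings) {
        # Replace each run of characters other than word characters,
        # dots and hyphens with a single underscore
        ( my $name = $heading ) =~ s/[^\w.-]+/_/g;
        $name = 'column' if $name eq '';

        # Second and later occurrences get a numeric suffix
        my $count = ++$seen{$name};
        push @names, $count > 1 ? "${name}_$count" : $name;
    }

    return @names;
}

my @names = safe_names( 'Height (cm)', 'Weight/kg', 'Height (cm)' );
print "$_\n" for @names;
```

For these made-up headings the result is Height_cm_, Weight_kg and Height_cm__2. A suffixed name could still collide with a heading that already looks like that, so treat this as a starting point rather than a complete fix.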
Hope this helps!