
The following expresses my understanding of your situation, based upon your original posting:

  • You have multiple, tab-delimited files
  • The first line of each file contains column headers
  • Each file may have a different number of columns
  • The first column of each file is the ID, so it can be discarded
  • You want to generate a file for each column (beyond the ID column), for each of the tab-delimited files

If my understanding is correct, the following--which uses a hash of arrays (HoA)--provides one solution:

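    use strict;
    use warnings;

    my ( @header, %hash );
    my @files = qw/File1.txt File2.txt/;
    local $, = "\n";    # print a newline between list elements

    for my $file (@files) {
        open my $fhIN, '<', $file or die $!;

        while ( my $line = <$fhIN> ) {
            chomp $line;

            # The files are tab-delimited; -1 keeps trailing empty fields
            my @columns = split /\t/, $line, -1;

            if ( $. == 1 ) {
                @header = @columns;    # first line: column headers
            }
            else {
                # Skip column 0 (the ID); file each value under its heading
                push @{ $hash{ $header[$_] } }, $columns[$_] for 1 .. $#columns;
            }
        }
        close $fhIN;

        # One output file per non-ID column
        for my $i ( 1 .. $#header ) {
            open my $fhOUT, '>', "${file}_$header[$i].txt" or die $!;
            print $fhOUT @{ $hash{ $header[$i] } };
            close $fhOUT;
        }

        undef %hash;    # start fresh for the next file
    }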

Each file in @files is processed. If it's the first line of the file ($. contains the current line number), then it's the header line, and is saved.
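To see the $. test in isolation, here is a minimal sketch that reads two in-memory "files" instead of real ones. The @sources data and the @header_lines array are made up for illustration; only the $. == 1 check comes from the code above.

```perl
use strict;
use warnings;

# Hypothetical stand-ins for two tab-delimited files.
my @sources = ( "ID\tA\n1\tx\n2\ty\n", "ID\tB\n3\tz\n" );

my @header_lines;
for my $text (@sources) {
    # Opening a reference to a scalar gives an in-memory filehandle.
    open my $fh, '<', \$text or die $!;
    while ( my $line = <$fh> ) {
        # $. is the line number of the handle most recently read from.
        push @header_lines, $line if $. == 1;
    }
    close $fh;    # an explicit close resets $. for the next file
}

print @header_lines;
```

Each source contributes exactly one header line, because the line counter starts over for every newly opened handle.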

For lines 2 .. n, you'll note the following:

    push @{ $hash{ $header[$_] } }, $columns[$_] for 1 .. $#columns;
         ^         ^                ^                ^    ^
         |         |                |                |    |
         |         |                |                |    + - To the last index
         |         |                |                + - Starting at the next index after the index for ID
         |         |                + - Column value
         |         + - Column heading
         + - Generate a HoA, where: keys are column names;
             values are references to lists of column entries
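If it helps to watch the HoA fill up, the following sketch runs the same push over a few lines held in an array. The sample data (@lines, with Height and Weight columns) is invented for the example and is not from your files.

```perl
use strict;
use warnings;

# Hypothetical sample data standing in for one tab-delimited file.
my @lines = (
    "ID\tHeight\tWeight",
    "a1\t170\t65",
    "a2\t182\t80",
);

my ( @header, %hash );
for my $n ( 0 .. $#lines ) {
    my @columns = split /\t/, $lines[$n], -1;
    if ( $n == 0 ) {
        @header = @columns;    # first line holds the column names
        next;
    }

    # Skip index 0 (the ID); append each value to its column's array.
    push @{ $hash{ $header[$_] } }, $columns[$_] for 1 .. $#columns;
}

# Show each non-ID column with its collected values.
print "$_: @{ $hash{$_} }\n" for @header[ 1 .. $#header ];
```

After the loop, $hash{Height} refers to the list (170, 182) and $hash{Weight} to (65, 80). The arrays come into existence on the first push (autovivification), so nothing needs to be initialized beforehand.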

The for my $i ( 1 .. $#header ) { ... loop iterates over every column header except the ID, using each one as a key to access its array of column values.

A file is created for each column. The naming scheme is the file's name plus the column's name. (If you didn't want files created, it's within this for loop that you can operate on the columns' values.)
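As one possible example of working on the values directly, this sketch computes a mean per column inside the same kind of loop. The data, the %mean hash and the choice of a mean are all assumptions for illustration; List::Util ships with Perl.

```perl
use strict;
use warnings;
use List::Util qw(sum);

# Hypothetical HoA, shaped like %hash after one file has been read.
my @header = qw(ID Height Weight);
my %hash   = (
    Height => [ 170, 182 ],
    Weight => [ 65,  80 ],
);

my %mean;
for my $i ( 1 .. $#header ) {
    my @values = @{ $hash{ $header[$i] } };

    # An array in numeric context gives its element count.
    $mean{ $header[$i] } = sum(@values) / @values;
    printf "%s: mean = %s\n", $header[$i], $mean{ $header[$i] };
}
```

Anything else that takes a list (sorting, filtering, passing the array reference to another sub) can go in the same spot.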

You may have noticed the earlier local $, = "\n"; statement. $, is the output field separator, so print places a newline between the array's elements and each element ends up on its own line. To use a different format, just set the value of $, to something else, e.g., "," or "\t".
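A small sketch of what $, does, printing to an in-memory filehandle so the result can be inspected. The @values list and the $out variable are made up for the example.

```perl
use strict;
use warnings;

my @values = qw(170 182 175);
my $out    = '';

{
    local $, = "\n";    # separator print inserts between list elements
    open my $fh, '>', \$out or die $!;
    print $fh @values;
    close $fh;
}

# $out now holds "170\n182\n175": the separator goes between elements
# only, so nothing follows the last one.
print $out, "\n";
```

If a trailing newline after the last element is wanted as well, $\ (the output record separator) can be localized to "\n" in the same block.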

Potential issues:

  • Columns may have the same heading (although this is highly unlikely). If they do, their values are merged under a single hash key and written to the same file
  • The heading may contain 'illegal' file characters. If this is possible, a substitution can be performed on them to eliminate the 'offending' characters
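Both issues could be handled by cleaning the headings right after the header line is read. Here is a rough sketch; safe_name and %seen are hypothetical names, and the set of characters treated as safe is only an assumption that you may want to widen or narrow for your filesystem.

```perl
use strict;
use warnings;

# Hypothetical helper: make a heading safe for use in a file name and
# unique among the headings seen so far.
my %seen;

sub safe_name {
    my ($heading) = @_;

    # Replace each run of characters outside a conservative set with "_".
    ( my $name = $heading ) =~ s/[^A-Za-z0-9._-]+/_/g;

    # Append a counter to the second and later uses of the same name.
    my $count = $seen{$name}++;
    return $count ? "${name}_$count" : $name;
}

my @names = map { safe_name($_) } 'Height (cm)', 'Weight/kg', 'Weight/kg';
print "$_\n" for @names;
```

In the script above, this would mean something like @header = map { safe_name($_) } @columns; for the header line, with %seen emptied before each new file, so that the hash keys and the output file names both stay distinct.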

Hope this helps!

In reply to Re: How to add column into array from delimited tab file by Kenosis
in thread How to add column into array from delimited tab file by hellohello1
