PerlMonks  

I need help joining tab-delimited files/tables!

by nabiana (Initiate)
on Oct 19, 2011 at 21:43 UTC (#932512=perlquestion)
nabiana has asked for the wisdom of the Perl Monks concerning the following question:

Hi, just joined minutes ago. I am trying to join tab-delimited files into a single file/table.

For example, say I have these 4 files/tables:

ID value (table1)
Aa 22
Bb 28
Cc 32
Dd 50

ID value (table2)
Aa 34
Cc 112
Dd 77
Ee 89
Kk 124

ID value (table3)
Bb 75
Cc 91
Dd 132

ID value (table4)
Aa 66
Cc 94
Ee 213
Gg 250

The output after joining should look like this:

ID value1 value2 value3 value4
Aa 22 34 0 66
Bb 28 0 75 0
Cc 32 112 91 94
Dd 50 77 132 0
Ee 0 89 0 213
Gg 0 0 0 250
Kk 0 124 0 0

My best effort:

#!/usr/bin/perl
use strict;

# I opened all files (containing the tables) one by one;
# is there a way I can open all files at once?
open(FILEH1, "<table1.txt");
while (my $file = <FILEH1>) {
    chomp $file;
    my @file1 = split('\t', $file);
    # to pick IDs and values => $file1[0] and $file1[1]
}
open(FILEH2, "<table2.txt");
# and continued to tables 3 and 4.
# And then I tried to collect the items, which is where I got stuck.

Re: I need help joining tab-delimited files/tables!
by GrandFather (Cardinal) on Oct 19, 2011 at 22:12 UTC

    The trick here is to use a hash of arrays to store the table entries. Consider:

    use warnings;
    use strict;

    my $table1 = <<TABLE;
    ID value
    Aa 22
    Bb 28
    Cc 32
    Dd 50
    TABLE

    my $table2 = <<TABLE;
    ID value
    Aa 34
    Cc 112
    Dd 77
    Ee 89
    Kk 124
    TABLE

    my $table3 = <<TABLE;
    ID value
    Bb 75
    Cc 91
    Dd 132
    TABLE

    my $table4 = <<TABLE;
    ID value
    Aa 66
    Cc 94
    Ee 213
    Gg 250
    TABLE

    my %values;
    my $tableIndex = 0;

    for my $inFileVar (\$table1, \$table2, \$table3, \$table4) {
        open my $inFile, '<', $inFileVar or die "Can't open $inFileVar: $!\n";

        while (<$inFile>) {
            chomp;
            my ($id, $value) = split;
            next if $id eq 'ID';
            $values{$id}[$tableIndex] = $value;
        }

        ++$tableIndex;
    }

    for my $id (sort keys %values) {
        print "$id";
        print ' ', $values{$id}[$_] || 0 for 0 .. $tableIndex - 1;
        print "\n";
    }

    Prints:

    Aa 22 34 0 66
    Bb 28 0 75 0
    Cc 32 112 91 94
    Dd 50 77 132 0
    Ee 0 89 0 213
    Gg 0 0 0 250
    Kk 0 124 0 0

    Note that the variables in the first for loop are used as in-memory files: open is given a reference to each variable, so the file handle it provides reads that variable's contents as if they were a file. This technique is handy for avoiding extra files in sample code. Your real code can simply substitute a list of file names for the list of variable references.
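As a rough sketch of that substitution (not GrandFather's code), the loop below reads real files instead of in-memory ones. The helper name join_tables, the explicit tab split, and the throwaway demo files are assumptions made for illustration; the demo only uses table1 and table2 from the question.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical helper (not from the thread): joins tab-delimited "ID value"
# files into one row per ID, with 0 where a file has no entry for that ID.
sub join_tables {
    my @file_names = @_;
    my %values;    # ID => [ value from file 0, value from file 1, ... ]

    for my $index (0 .. $#file_names) {
        open my $in, '<', $file_names[$index]
            or die "Can't open $file_names[$index]: $!\n";
        while (my $line = <$in>) {
            chomp $line;
            next if $line =~ /^\s*$/;    # skip blank lines
            my ($id, $value) = split /\t/, $line;
            next if $id eq 'ID';         # skip the header row
            $values{$id}[$index] = $value;
        }
        close $in;
    }

    my @rows;
    for my $id (sort keys %values) {
        # Fill in 0 for every file that had no value for this ID.
        push @rows, join "\t", $id, map { $values{$id}[$_] // 0 } 0 .. $#file_names;
    }
    return @rows;
}

# Demo: write table1 and table2 from the question to throwaway files.
my $dir  = tempdir(CLEANUP => 1);
my %demo = (
    'table1.txt' => "ID\tvalue\nAa\t22\nBb\t28\nCc\t32\nDd\t50\n",
    'table2.txt' => "ID\tvalue\nAa\t34\nCc\t112\nDd\t77\nEe\t89\nKk\t124\n",
);
for my $name (keys %demo) {
    open my $out, '>', "$dir/$name" or die "Can't write $dir/$name: $!\n";
    print {$out} $demo{$name};
    close $out;
}

my @joined = join_tables("$dir/table1.txt", "$dir/table2.txt");
print "$_\n" for @joined;
```

With real data, the call would presumably take the actual file names, for example join_tables(@ARGV) or the result of a glob. The defined-or operator (//) needs Perl 5.10 or later.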

    True laziness is hard work
      An alternative solution to the above would be something like this:
      my @files = ('file1','file2','file3','file4'); # use a glob if you prefer
      my %hash;
      foreach my $file_name (@files){
          open my $fh, '<',$file_name || die "$!";
          while (<$fh>){
              chomp;
              next if (/^\s*$/);
              my ($id,$val) = split /\t/;
              $hash{$id}{$file_name} = $val;
          }
          close $fh;
      }
      foreach my $id (sort keys %hash){
          print "$id\t";
          print $hash{$id}{$_} ? "$hash{$id}{$_}\t" : "0\t" foreach (@files);
          print "\n";
      }
        Your "open" statement has the "Operator precedence" problem described in die on file open.
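A small sketch of the precedence problem being pointed out: || binds more tightly than the comma in open's argument list, so it is applied to the file name rather than to the result of open. The file name below is hypothetical and is assumed not to exist; eval is used only so the script can report which form actually dies.

```perl
use strict;
use warnings;

# Hypothetical file name, assumed not to exist in the current directory.
my $missing = 'no_such_file_for_demo.txt';

# Buggy form: parsed as open($fh, '<', ($missing || die "$!")).
# $missing is a true string, so die is never reached and the failed
# open goes unnoticed.
my $buggy_died = eval { open my $fh, '<', $missing || die "$!"; 1 } ? 0 : 1;

# Fixed form: the low-precedence "or" applies to the return value of open.
my $fixed_died = eval {
    open my $fh, '<', $missing or die "Can't open $missing: $!\n";
    1;
} ? 0 : 1;

print "buggy form died: $buggy_died\n";
print "fixed form died: $fixed_died\n";
```

As long as the file really is missing, only the second form should die. Parenthesising the call, as in open(my $fh, '<', $file_name) || die "$!", is another common way to get the intended grouping.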

                    "XML is like violence: if it doesn't solve your problem, use more."
