PerlMonks  

Re: Merge Columns of Multiple files based on Multiple Common Column

by hdb (Monsignor)
on May 03, 2013 at 07:07 UTC


in reply to Merge Columns of Multiple files based on Multiple Common Column

This stores all data in a hash, so it might break down once your files become huge. There is absolutely no error handling in there, so if the file format varies or the entries contain spaces, it will no longer work. Please take it as a proof of concept and not as production code.

use strict;
use warnings;

# Read one whitespace-separated file into the shared nested hash.
# The first three columns (ID, NAME, date) form the key; the rest are data.
sub readfile {
    my ( $filename, $hashref, $headref ) = @_;
    open my $fh, "<", $filename or die "Cannot open $filename!\n";
    my $headers = <$fh>;
    my @h = split /\s/, $headers;
    $headref->{$_}++ for @h[3..$#h];    # collect the data column names
    while( <$fh> ) {
        my @line = split /\s/;
        $hashref->{$line[0]}{$line[1]}{$line[2]}{$h[$_]} = $line[$_] for 3..$#h;
    }
    close $fh;
}

my %joined;
my %headers;
for my $file ( qw/file1.txt file2.txt file3.txt/ ) {
    readfile( $file, \%joined, \%headers );
}

# Print the merged table, with "--" where a file had no value for a key.
print "ID NAME date ", join( " ", sort keys %headers ), "\n";
for my $id ( sort keys %joined ) {
    for my $name ( sort keys %{$joined{$id}} ) {
        for my $date ( sort keys %{$joined{$id}{$name}} ) {
            print "$id $name $date ";
            print join " ", map { $joined{$id}{$name}{$date}{$_} // "--" } sort keys %headers;
            print "\n";
        }
    }
}
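To make the data layout easier to see, here is a minimal self-contained sketch of the same idea that runs without any files on disk. Everything in it beyond the approach itself is an assumption for illustration: the sample rows, the column names colA, colB and colC, and the helper names read_table and merged_lines are made up and are not from the original question. It also swaps in split ' ' for split /\s/, so that runs of whitespace do not produce empty fields.

```perl
use strict;
use warnings;

# Hypothetical sample inputs; the script above reads file1.txt etc. instead.
my %sample = (
    file1 => "ID NAME date colA\n1 foo 2013-01-01 10\n2 bar 2013-01-02 20\n",
    file2 => "ID NAME date colB\n1 foo 2013-01-01 x\n",
    file3 => "ID NAME date colC\n2 bar 2013-01-02 y\n3 baz 2013-01-03 z\n",
);

# Same idea as readfile above, but reading from an in-memory string.
sub read_table {
    my ( $text, $hashref, $headref ) = @_;
    open my $fh, "<", \$text or die "Cannot open in-memory file: $!\n";
    my @h = split ' ', scalar <$fh>;          # header line
    $headref->{$_}++ for @h[ 3 .. $#h ];      # remember the data columns
    while ( my $row = <$fh> ) {
        my @line = split ' ', $row;
        # key = ID, NAME, date; value = one entry per data column
        $hashref->{ $line[0] }{ $line[1] }{ $line[2] }{ $h[$_] } = $line[$_]
            for 3 .. $#h;
    }
    close $fh;
}

# Build the merged table as a list of lines, "--" marking missing values.
sub merged_lines {
    my ( $joined, $headers ) = @_;
    my @cols = sort keys %$headers;
    my @out  = ( join " ", "ID NAME date", @cols );
    for my $id ( sort keys %$joined ) {
        for my $name ( sort keys %{ $joined->{$id} } ) {
            for my $date ( sort keys %{ $joined->{$id}{$name} } ) {
                my $rec = $joined->{$id}{$name}{$date};
                push @out, join " ", $id, $name, $date,
                    map { $rec->{$_} // "--" } @cols;
            }
        }
    }
    return @out;
}

my ( %joined, %headers );
read_table( $sample{$_}, \%joined, \%headers ) for sort keys %sample;
print "$_\n" for merged_lines( \%joined, \%headers );
```

With the made-up data above, this prints:

ID NAME date colA colB colC
1 foo 2013-01-01 10 x --
2 bar 2013-01-02 20 -- y
3 baz 2013-01-03 -- -- z

Note that the keys are sorted as strings, so numeric IDs such as 10 would sort before 2.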

Replies are listed 'Best First'.
Re^2: Merge Columns of Multiple files based on Multiple Common Column
by karlgoethebier (Abbot) on May 03, 2013 at 10:55 UTC
    «Storing all data in a hash, so this might break down once your files become huge.»

    Hmm, why not store all the data in a hash?

    I just took a look at my box at home. It has 8 GB of RAM. Some time ago something like this was a high-end server, a short time later a high-end workstation, and now it is a standard desktop:

    Karls-Mac-mini:monks karl$ top
    PhysMem: 833M wired, 1541M active, 260M inactive, 2634M used, 5558M free.

    So I have no trouble at all storing my data in a big hash on my box.

    And if I have files that huge, I have another problem.

    Best regards, Karl

    «The Crux of the Biscuit is the Apostrophe»
