PerlMonks  

Merge huge files (individually sorted) by order

by tanger007 (Initiate)
on Jul 18, 2013 at 23:55 UTC

tanger007 has asked for the wisdom of the Perl Monks concerning the following question:

I've been trying to merge a bunch of files that are each sorted by the same ordering. The number of files varies, so I chose to use an array of filehandles. The plan is to read one line from each file, print the line with the highest rank, and then read the next line from the file that the printed line came from. How does this work in Perl? I've been trying for 3 hours with no success...
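A minimal sketch of that merge, assuming every input file is sorted by the same ordering and every line ends with a newline. The sub `merge_sorted_handles` and its comparison-callback argument are hypothetical names, not from the thread. The question does not define "rank", so the caller passes a sub that says which of two lines must be written first.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): merge already-sorted inputs.
#   $inputs - array ref of open, readable filehandles, one per sorted file
#   $out    - filehandle to write the merged stream to
#   $before - code ref; $before->($x, $y) is true when line $x must be
#             written before line $y (must match how the files are sorted)
sub merge_sorted_handles {
    my ($inputs, $out, $before) = @_;

    # Prime the merge with the first line of every file; $heads[$i] is the
    # next unwritten line from $inputs->[$i], or undef once that file is done.
    my @heads = map { scalar readline($_) } @$inputs;

    while (1) {
        # Linear scan over the buffered lines to find the one to emit next.
        my $best;
        for my $i (0 .. $#heads) {
            next unless defined $heads[$i];    # that file is exhausted
            $best = $i
                if !defined $best || $before->($heads[$i], $heads[$best]);
        }
        last unless defined $best;    # every input is exhausted

        print {$out} $heads[$best];

        # Pull the replacement line only from the file we just consumed.
        $heads[$best] = readline($inputs->[$best]);
    }
    return;
}
```

For example, `merge_sorted_handles(\@fhs, \*STDOUT, sub { $_[0] lt $_[1] })` merges files sorted as plain ascending text; for highest-rank-first on a hypothetical numeric first column you might pass `sub { (split /\t/, $_[0])[0] > (split /\t/, $_[1])[0] }`. Only one line per file is held in memory, so file size does not matter. Picking the next line is a linear scan over those buffered lines, which is cheap for a few dozen files; with hundreds of files a heap would reduce that per-line cost.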

Re: Merge huge files (individually sorted) by order
by Loops (Curate) on Jul 18, 2013 at 23:57 UTC
      Works so well I felt more stupid :) A follow-up question: if you have a big file (>10GB) in which one column has, say, 100 unique values, how do you break it into 100 smaller files, each holding the lines for one value of that column? Thanks so much.

        tanger007:

        Try something like putting a file handle for each column value in a hash, and then looking up the file handle on demand:

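        my %OFH;
        my $OFH;
        while (<$IFH>) {
            chomp;    # so the key is clean even when it is the last column
            my @fields = split /\t/, $_, -1;    # -1 keeps trailing empty fields
            $OFH = $OFH{$fields[$key_column]};
            if (! defined $OFH) {
                # We don't have this value yet, so open another file
                open $OFH, '>', 'key_value.' . $fields[$key_column]
                    or die "Can't open key_value.$fields[$key_column]: $!";
                $OFH{$fields[$key_column]} = $OFH;
            }
            print $OFH join("\t", @fields), "\n";
        }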

        Note: It's rough, untested and needs some error handling and such. But the basic concept should work fine for you.

        ...roboticus

        When your only tool is a hammer, all problems look like your thumb.
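Building on the hash-of-filehandles idea in the reply above, here is a slightly more defensive sketch with some of the error handling its note mentions. The sub name `split_by_column`, the 0-based column index, the tab separator, and the output-name prefix are assumptions for illustration, and key values are assumed to be safe to use in file names. It reads the input one line at a time, so memory use does not grow with a >10GB file; with about 100 distinct values it keeps about 100 output files open at once, which is usually within the per-process open-file limit.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): split one tab-separated stream
# into one file per distinct value of column $key_col (0-based).
# Assumes the key values are safe to embed in a file name.
sub split_by_column {
    my ($in, $key_col, $prefix) = @_;
    my %out_fh;    # key value => open output filehandle

    while (my $line = <$in>) {
        # Strip the line ending from a copy only, so the key never contains
        # "\n" even when it is the last column; the line is written unchanged.
        (my $record = $line) =~ s/\r?\n\z//;
        my $key = (split /\t/, $record, -1)[$key_col];
        $key = '' unless defined $key;    # line too short: no such column

        # Open the output file the first time this key is seen, then reuse it.
        my $fh = $out_fh{$key} //= do {
            my $name = $prefix . $key;
            open my $new_fh, '>', $name or die "Cannot open '$name': $!";
            $new_fh;
        };
        print {$fh} $line;
    }

    # Closing explicitly surfaces write errors (e.g. a full disk).
    for my $key (keys %out_fh) {
        close $out_fh{$key} or die "Cannot close file for key '$key': $!";
    }
    return sort keys %out_fh;
}
```

For example, `split_by_column(\*STDIN, 2, 'key_value.')` would write one `key_value.<value>` file per distinct value of the third column and return the sorted list of values it saw.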

Node Type: perlquestion [id://1045229]
Approved by toolic
Front-paged by toolic