
Re^2: Merge huge files (individually sorted) by order

by tanger007 (Initiate)
on Jul 19, 2013 at 00:16 UTC (#1045231)

in reply to Re: Merge huge files (individually sorted) by order
in thread Merge huge files (individually sorted) by order

Works so well I felt more stupid :) A follow-up question: if you have a big file (>10GB) in which one column has, say, 100 unique values, how do you break it into 100 smaller files, each holding the rows for just one of those values? Thanks so much.

Re^3: Merge huge files (individually sorted) by order
by roboticus (Chancellor) on Jul 19, 2013 at 01:02 UTC


    Try something like putting a file handle for each column value in a hash, and then looking up the file handle on demand:

    my %OFH;
    while (<$IFH>) {
        my @fields = split /\t/, $_;
        my $key    = $fields[$key_column];
        my $OFH    = $OFH{$key};
        if (! defined $OFH) {
            # We don't have this value yet, so open another file
            open $OFH, '>', 'key_value.' . $key
                or die "Can't open key_value.$key: $!";
            $OFH{$key} = $OFH;
        }
        # The last field still carries the newline, so none is added here
        print $OFH join("\t", @fields);
    }

    Note: It's rough, untested and needs some error handling and such. But the basic concept should work fine for you.
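
For what it's worth, here is one way the error handling might be filled in. Treat it as a sketch rather than a tested solution for your data: the sub name split_by_column, its arguments and the returned line counts are made up for illustration, and it assumes the key values are safe to use in file names and that your open-file limit comfortably exceeds the number of distinct values.

```perl
use strict;
use warnings;

# Hypothetical helper: split a tab-separated file into one output file per
# distinct value found in column $key_column (0-based).
# Returns a hash ref mapping each key value to the number of lines written.
sub split_by_column {
    my ($in_path, $key_column, $prefix) = @_;
    my (%ofh, %count);

    open my $ifh, '<', $in_path or die "Cannot read $in_path: $!";
    while (my $line = <$ifh>) {
        chomp $line;    # so a key in the last column has no trailing newline
        my @fields = split /\t/, $line, -1;    # -1 keeps trailing empty fields
        my $key = $fields[$key_column];
        die "Line $. has no column $key_column\n" unless defined $key;

        # Open each output file once, the first time its key value is seen.
        unless ($ofh{$key}) {
            open $ofh{$key}, '>', "$prefix$key"
                or die "Cannot write $prefix$key: $!";
        }
        print { $ofh{$key} } "$line\n";
        $count{$key}++;
    }
    close $ifh;

    # Buffered write errors (a full disk, say) tend to surface at close time.
    for my $key (keys %ofh) {
        close $ofh{$key} or die "Error closing $prefix$key: $!";
    }
    return \%count;
}
```

You would call it as something like split_by_column('big.tsv', 2, 'key_value.'), where the file name and column number are placeholders. The file is still read only once and only one line is held in memory at a time, so the 10GB size should not matter much.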


    When your only tool is a hammer, all problems look like your thumb.
