PerlMonks  

Re^2: Merge huge files (individually sorted) by order

by tanger007 (Initiate)
on Jul 19, 2013 at 00:16 UTC (#1045231)


in reply to Re: Merge huge files (individually sorted) by order
in thread Merge huge files (individually sorted) by order

Works so well I felt more stupid :) A follow-up question: if you have a big file (>10GB) in which one column has, say, 100 unique values, how do you break it into 100 smaller files, each holding the rows for one unique value in that column? Thanks so much.

Re^3: Merge huge files (individually sorted) by order
by roboticus (Chancellor) on Jul 19, 2013 at 01:02 UTC

    tanger007:

    Try something like putting a file handle for each column value in a hash, and then looking up the file handle on demand:

    my %OFH;
    my $OFH;
    while (<$IFH>) {
        my @fields = split /\t/,$_;
        $OFH = $OFH{$fields[$key_column]};
        if (! defined $OFH) {
            # We don't have this value yet, so open another file
            open $OFH, '>', 'key_value.' . $fields[$key_column];
            $OFH{$fields[$key_column]} = $OFH;
        }
        print $OFH join("\t",@fields);
    }

    Note: It's rough, untested and needs some error handling and such. But the basic concept should work fine for you.
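As a rough illustration of the error handling mentioned above, the same hash-of-file-handles idea might be fleshed out along these lines. This is only a sketch: the sub name `split_by_column`, its arguments, and the `"$out_prefix.$key"` naming are made up for the example, and it assumes the key values are safe to use in file names and that the number of distinct values stays below the system's open-file limit (about 100 handles is normally fine).

```perl
use strict;
use warnings;

# Split a tab-separated file into one output file per distinct value
# found in column $key_column (0-based). Returns a hash ref mapping
# each key value to the number of lines written for it.
# The sub name, arguments and file naming are illustrative assumptions.
sub split_by_column {
    my ($in_path, $key_column, $out_prefix) = @_;

    open my $in, '<', $in_path or die "Cannot open $in_path: $!";

    my (%out_fh, %count);
    while (my $line = <$in>) {
        # Work on a copy without the line ending, so the newline cannot
        # leak into the key when the key is the last column.
        (my $record = $line) =~ s/\r?\n\z//;
        my @fields = split /\t/, $record, -1;
        my $key = $fields[$key_column];
        $key = '' unless defined $key;    # short line: treat the key as empty

        # Open the output file the first time this key is seen.
        my $fh = $out_fh{$key};
        if (!defined $fh) {
            my $out_path = "$out_prefix.$key";
            open $fh, '>', $out_path or die "Cannot open $out_path: $!";
            $out_fh{$key} = $fh;
        }

        # Write the original line unchanged.
        print {$fh} $line or die "Write failed for key '$key': $!";
        $count{$key}++;
    }

    close $in;
    for my $key (keys %out_fh) {
        close $out_fh{$key} or die "Cannot close output for key '$key': $!";
    }
    return \%count;
}

# Example call (hypothetical paths): split big.tsv on its third column.
# my $counts = split_by_column('big.tsv', 2, 'key_value');
```

The file is read once, line by line, so memory use stays small no matter how big the input is; only the file handles and the per-key counters are kept in memory.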

    ...roboticus

    When your only tool is a hammer, all problems look like your thumb.
