PerlMonks  

Re^2: Merge huge files (individually sorted) by order

by tanger007 (Initiate)
on Jul 19, 2013 at 00:16 UTC


in reply to Re: Merge huge files (individually sorted) by order
in thread Merge huge files (individually sorted) by order

Works so well that I feel even more stupid :) A follow-up question: if you have a big file (>10GB) in which one column has, say, 100 unique values, how do you break it into 100 smaller files, each holding only the rows for one unique value in that column? Thanks so much.


Re^3: Merge huge files (individually sorted) by order
by roboticus (Canon) on Jul 19, 2013 at 01:02 UTC

    tanger007:

    Try something like putting a file handle for each column value in a hash, and then looking up the file handle on demand:

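    # Assumes $IFH is an already-open input handle and $key_column
    # is the 0-based index of the column to split on.
    my %OFH;    # output file handles, keyed by column value
    while (my $line = <$IFH>) {
        chomp $line;    # keep the newline out of the key (and the file name)
        my @fields = split /\t/, $line, -1;
        my $key    = $fields[$key_column];
        my $OFH    = $OFH{$key};
        if (! defined $OFH) {
            # We don't have this value yet, so open another file
            open $OFH, '>', "key_value.$key"
                or die "Can't open key_value.$key: $!";
            $OFH{$key} = $OFH;
        }
        print {$OFH} "$line\n";
    }
    close $_ for values %OFH;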

    Note: It's rough, untested and needs some error handling and such. But the basic concept should work fine for you.
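
    For anyone who wants to run the idea end to end, here is a minimal self-contained sketch of the same hash-of-file-handles approach. The sub name split_by_column, its $prefix argument and the 0-based column convention are illustrative assumptions, not something from the snippet above. It also assumes the column values are safe to use as part of a file name (no slashes and the like).

```perl
use strict;
use warnings;

# Hypothetical helper: split a tab-separated file into one output file
# per distinct value found in column $key_column (0-based).
# Output files are named "$prefix<value>".
sub split_by_column {
    my ($in_path, $key_column, $prefix) = @_;

    open my $IFH, '<', $in_path or die "Can't read $in_path: $!";

    my %OFH;    # column value => output file handle
    while (my $line = <$IFH>) {
        chomp $line;
        my @fields = split /\t/, $line, -1;    # -1 keeps trailing empty fields
        my $key    = $fields[$key_column];
        die "Line $. has no column $key_column\n" unless defined $key;

        my $OFH = $OFH{$key};
        if (! defined $OFH) {
            # First time we see this value: open its file and remember the handle
            open $OFH, '>', "$prefix$key" or die "Can't write $prefix$key: $!";
            $OFH{$key} = $OFH;
        }
        print {$OFH} "$line\n";
    }
    close $IFH;

    # Close explicitly so that buffered write errors are not silently lost
    for my $key (keys %OFH) {
        close $OFH{$key} or die "Can't close $prefix$key: $!";
    }

    my @keys = sort keys %OFH;
    return @keys;    # the distinct values seen, sorted
}
```

    Something like split_by_column('big.tsv', 2, 'key_value.') would then write one key_value.<value> file per distinct value in the third column and return the list of values. The input is read once, line by line, so memory use stays small no matter how big the file is.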

    ...roboticus

    When your only tool is a hammer, all problems look like your thumb.
