PerlMonks  

Re: processing huge files

by fauria (Deacon)
on Aug 02, 2005 at 09:11 UTC [id://480166]



in reply to processing huge files

Well, if the file is that huge and cannot be handled in one piece, maybe a "divide and conquer" approach works:
    use strict;
    use warnings;

    my $file      = "filename";
    my $file_size = 30720;                         # file size in MB
    my $chunks    = 1226;                          # how many pieces
    my $size      = int($file_size / $chunks) + 1; # MB per chunk

    for my $counter (0 .. $chunks - 1) {           # 0..$chunks would create one extra, empty file
        my $skip = $size * $counter;
        system("dd if=$file of=$file.$counter bs=1M count=$size skip=$skip") == 0
            or die "dd failed for chunk $counter: $?";
    }

This will output 1226 files named filename.*. Note that you will need free disk space equal to the original file size, and when reading the parts, watch for the record separator (i.e. the newline character): a record may be split across two adjacent chunks. Also, be sure to close processed files! :D
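The boundary handling described above can be sketched in Perl. This is a minimal illustration, not part of the original post: the chunk naming (filename.0, filename.1, ...) follows the dd example, and the handler callback is a hypothetical stand-in for whatever per-record processing the reader needs. Any partial last line of a chunk is carried over and prepended to the first line of the next chunk.

```perl
use strict;
use warnings;

# Process chunk files $basename.0 .. $basename.(N-1) in order, calling
# $handler->($record) once per complete line. A line split across a chunk
# boundary is reassembled via $carry before the handler sees it.
sub process_chunks {
    my ($basename, $n_chunks, $handler) = @_;
    my $carry = '';                          # partial line from previous chunk
    for my $i (0 .. $n_chunks - 1) {
        open my $fh, '<', "$basename.$i" or die "open $basename.$i: $!";
        while (my $line = <$fh>) {
            if ($line =~ /\n\z/) {           # complete record
                $handler->($carry . $line);
                $carry = '';
            }
            else {                           # chunk ended mid-record
                $carry .= $line;
            }
        }
        close $fh or die "close $basename.$i: $!";   # close each processed file
    }
    $handler->($carry . "\n") if length $carry;      # trailing record, if any
}
```

Because only one chunk is open at a time and only one partial line is buffered, memory use stays constant no matter how large the original file was.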

Replies are listed 'Best First'.
Re^2: processing huge files
by jhourcle (Prior) on Aug 02, 2005 at 09:35 UTC

    If you're going to hand off the work to dd, you might want to use split instead, as it can act on full lines (so it won't break in the middle of a record, given the logic the OP was using).

    You also don't need to call it repeatedly in a loop, as the equivalent of your dd example would be:

    split -b 26m -a 3 $INFILE
      I used dd because it can directly access a position in a file using skip, and then sequentially read its content, without needing to read the whole file just to reach a location.
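The same direct-access pattern is available from within Perl itself, without shelling out to dd. A minimal sketch, assuming a byte offset and length are known in advance (the function name and arguments are illustrative, not from the thread): seek() jumps straight to the offset and read() pulls out just that slice, so the rest of the file is never loaded.

```perl
use strict;
use warnings;
use Fcntl qw(SEEK_SET);

# Read $length bytes starting at byte $offset of $path, without reading
# anything before or after that slice.
sub read_slice {
    my ($path, $offset, $length) = @_;
    open my $fh, '<:raw', $path or die "open $path: $!";
    seek $fh, $offset, SEEK_SET or die "seek: $!";
    my $n = read $fh, my ($buf), $length;
    defined $n or die "read: $!";
    close $fh;
    return $buf;
}
```

This mirrors dd's skip=/count= semantics at byte granularity, which is handy when the chunk boundaries have already been computed.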
