PerlMonks  

Re: Split file into 4 smaller ones

by davido (Archbishop)
on Feb 07, 2013 at 22:32 UTC (#1017731=note)


in reply to Split file into 4 smaller ones

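Use the -s operator to determine the file's size, and set the $/ special variable to a reference to an integer equal to one quarter of that size, rounded up (for example, $/ = \$quarter). Then read in a while() loop, just like you always would: each read returns a quarter of the file instead of a line. The reference matters; a plain number in $/ is treated as a separator string, not as a record length.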
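A minimal sketch of the record-reading approach, assuming binary-safe I/O (`:raw`) and output files named after the input with `.1` to `.4` appended. The helper name `split_by_records` and the 10-byte demo file are hypothetical, for illustration only. Rounding the record length up with `POSIX::ceil` keeps a few leftover bytes from spilling into a fifth record.

```perl
use strict;
use warnings;
use POSIX qw(ceil);
use File::Temp qw(tempdir);

# Hypothetical helper: split $infn into at most $parts files ("$infn.1", "$infn.2", ...)
# by reading fixed-length records through $/.
sub split_by_records {
    my ($infn, $parts) = @_;
    my $size  = -s $infn or die "Cannot stat $infn, or it is empty\n";
    my $chunk = ceil($size / $parts);    # round up so no fifth sliver is left over

    open my $in, '<:raw', $infn or die "Cannot open $infn: $!";
    local $/ = \$chunk;                  # reference to an integer: read $chunk bytes per record
    my @written;
    while (defined(my $record = <$in>)) {
        my $outfn = "$infn." . (@written + 1);
        open my $out, '>:raw', $outfn or die "Cannot open $outfn: $!";
        print {$out} $record;
        close $out or die "Cannot close $outfn: $!";
        push @written, $outfn;
    }
    close $in;
    return @written;
}

# Demo on a hypothetical 10-byte file: expect pieces of 3, 3, 3 and 1 bytes.
my $dir  = tempdir(CLEANUP => 1);
my $demo = "$dir/input.dat";
open my $fh, '>:raw', $demo or die "Cannot create $demo: $!";
print {$fh} '0123456789';
close $fh or die "Cannot close $demo: $!";

my @parts = split_by_records($demo, 4);
printf "%s: %d bytes\n", $_, -s $_ for @parts;
```

Note that the whole quarter is still held in memory for each read, which is the trade-off discussed further down.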

See perlvar for an explanation of how to set $/, and perlfunc -X to learn about -s.
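A short illustration of why the reference matters, using an in-memory filehandle. The data string and the `records_with` helper are made up for this example: a plain number in `$/` is taken as a separator string, while a reference to a number asks for fixed-length records.

```perl
use strict;
use warnings;

my $data = '0123456789';

# Hypothetical helper: read all records from an in-memory file, using $sep as $/.
sub records_with {
    my ($sep) = @_;
    open my $fh, '<', \$data or die "Cannot open in-memory file: $!";
    local $/ = $sep;
    my @records = <$fh>;
    close $fh;
    return @records;
}

my @by_string = records_with(4);     # 4 becomes the separator string "4"
my @by_length = records_with(\4);    # \4 means "records of 4 bytes"

print join('|', @by_string), "\n";   # prints 01234|56789
print join('|', @by_length), "\n";   # prints 0123|4567|89
```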

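...or, again using -s, seek to each quarter's starting offset and sysread one quarter of the total. Use sysseek rather than seek for this, because perlfunc warns against mixing sysread with buffered operations such as seek, and check sysread's return value, since it can return fewer bytes than requested. Beware of off-by-one errors, too: piece n (counting from 0) starts at byte n times the quarter length, and the last piece may be shorter.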
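A minimal sketch of the seek-based variant, using `sysseek` with `sysread` as described above. The helper name `split_by_seek`, the `.1` to `.4` output naming, and the 10-byte demo file are hypothetical. The inner loop keeps calling `sysread` until the whole piece has arrived, appending at the current end of the buffer via sysread's OFFSET argument.

```perl
use strict;
use warnings;
use POSIX qw(ceil);
use Fcntl qw(:seek);
use File::Temp qw(tempdir);

# Hypothetical helper: write each piece ("$infn.1", "$infn.2", ...) by seeking
# straight to its offset with sysseek and reading it with sysread.
sub split_by_seek {
    my ($infn, $parts) = @_;
    my $size  = -s $infn or die "Cannot stat $infn, or it is empty\n";
    my $chunk = ceil($size / $parts);

    open my $in, '<:raw', $infn or die "Cannot open $infn: $!";
    my @written;
    for my $i (0 .. $parts - 1) {
        my $offset = $i * $chunk;             # piece $i (0-based) starts here
        last if $offset >= $size;             # very small files yield fewer pieces
        my $want = $size - $offset < $chunk ? $size - $offset : $chunk;   # last piece may be short

        sysseek($in, $offset, SEEK_SET) or die "Cannot seek in $infn: $!";
        my $buf = '';
        while (length($buf) < $want) {        # sysread may return fewer bytes than asked for
            my $got = sysread($in, $buf, $want - length($buf), length($buf));
            die "Cannot read $infn: $!" unless defined $got;
            die "Unexpected end of $infn\n" if $got == 0;
        }

        my $outfn = "$infn." . ($i + 1);
        open my $out, '>:raw', $outfn or die "Cannot open $outfn: $!";
        print {$out} $buf;
        close $out or die "Cannot close $outfn: $!";
        push @written, $outfn;
    }
    close $in;
    return @written;
}

# Demo on a hypothetical 10-byte file: expect pieces of 3, 3, 3 and 1 bytes.
my $dir  = tempdir(CLEANUP => 1);
my $demo = "$dir/input.dat";
open my $fh, '>:raw', $demo or die "Cannot create $demo: $!";
print {$fh} '0123456789';
close $fh or die "Cannot close $demo: $!";

my @parts = split_by_seek($demo, 4);
printf "%s: %d bytes\n", $_, -s $_ for @parts;
```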

Of course, with any of these methods you're going to run into memory constraints: reading 25% of a 10GB file will consume 2.5GB. It might be better to set $/ to a reference to 1/16th of the file size, and then read and write four times for each output file.
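A minimal sketch of the smaller-reads idea, assuming the same `.1` to `.4` output naming: records are 1/16th of the file, and every four consecutive records go to the same output file, so only one record is held in memory at a time. The helper name `split_in_small_reads`, its `$reads` parameter, and the 160-byte demo file are hypothetical.

```perl
use strict;
use warnings;
use POSIX qw(ceil);
use File::Temp qw(tempdir);

# Hypothetical helper: read records of 1/($parts * $reads) of the file and write
# $reads consecutive records to each output file ("$infn.1", "$infn.2", ...).
sub split_in_small_reads {
    my ($infn, $parts, $reads) = @_;
    my $size   = -s $infn or die "Cannot stat $infn, or it is empty\n";
    my $record = ceil($size / ($parts * $reads));   # 1/16th of the file for 4 x 4

    open my $in, '<:raw', $infn or die "Cannot open $infn: $!";
    local $/ = \$record;                  # only one record is held in memory at a time
    my ($out, @written);
    my $count = 0;
    while (defined(my $chunk = <$in>)) {
        if ($count % $reads == 0) {       # every $reads records, move on to the next file
            if ($out) { close $out or die "Cannot close $written[-1]: $!" }
            my $outfn = "$infn." . (@written + 1);
            open $out, '>:raw', $outfn or die "Cannot open $outfn: $!";
            push @written, $outfn;
        }
        print {$out} $chunk;
        $count++;
    }
    if ($out) { close $out or die "Cannot close $written[-1]: $!" }
    close $in;
    return @written;
}

# Demo on a hypothetical 160-byte file: 16 records of 10 bytes, 4 per output file.
my $dir  = tempdir(CLEANUP => 1);
my $demo = "$dir/input.dat";
my $data = join '', map { chr(65 + $_ % 26) } 0 .. 159;
open my $fh, '>:raw', $demo or die "Cannot create $demo: $!";
print {$fh} $data;
close $fh or die "Cannot close $demo: $!";

my @parts = split_in_small_reads($demo, 4, 4);
printf "%s: %d bytes\n", $_, -s $_ for @parts;
```

Because the record length is rounded up, the file never needs more than 16 records, so at most four output files are created; the last one may be shorter.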


Dave


Re^2: Split file into 4 smaller ones
by Anonymous Monk on Feb 08, 2013 at 08:43 UTC

    I always find it a bit silly when people suggest reading such large amounts of data into memory. Yes, I guess you can do that nowadays, but I'm not sure you always should. I prefer to conserve memory where possible.

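        use strict;
        use warnings;
        use POSIX 'ceil';

        # Assumed: input file name and output prefix come from the command line.
        my ($infn, $prefix) = @ARGV;
        my $size         = -s $infn or die "Cannot stat $infn, or it is empty\n";
        my $buffer_size  = 64 * 1024;
        my $bytes_wanted = ceil($size / 4);

        # Copy at most $bytes bytes from $in to $out; return the count actually copied.
        sub copy {
            my ($in, $out, $bytes) = @_;
            my $bytes_read = sysread($in, my $buffer, $bytes);
            die "Read error: $!" unless defined $bytes_read;
            print {$out} $buffer;
            return $bytes_read;
        }

        # :raw keeps binary data intact (no CRLF translation on Windows).
        open my $in, '<:raw', $infn or die "Cannot open $infn: $!";
        for my $outfn (1 .. 4) {
            open my $out, '>:raw', $prefix . $outfn or die "Cannot open $prefix$outfn: $!";
            my $bytes = 0;
            while ($bytes < $bytes_wanted) {
                my $want = $bytes_wanted - $bytes;
                $want = $buffer_size if $want > $buffer_size;
                my $got = copy($in, $out, $want);
                last unless $got;    # end of input: the last piece may be short
                $bytes += $got;
            }
            close $out or die "Cannot close $prefix$outfn: $!";
        }
        close $in;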
