http://www.perlmonks.org?node_id=1017731


in reply to Split file into 4 smaller ones

Use the -s operator to determine the file's size, and set the $/ special variable to a reference to an integer equal to 1/4th of the size returned by -s (a scalar reference puts readline into fixed-length record mode; assigning a plain number would instead be treated as a line-terminator string). Then read in a while() loop, just like you always would.

See perlvar for an explanation of how to set $/, and perlfunc -X to learn about -s.
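
A minimal sketch of that approach (the file name big.dat and the part output prefix are hypothetical, not from the thread):

    use strict;
    use warnings;
    use POSIX 'ceil';

    my $infn   = 'big.dat';   # hypothetical input file
    my $prefix = 'part';      # hypothetical output prefix

    my $size       = -s $infn;            # file size in bytes
    my $chunk_size = ceil($size / 4);     # 1/4th of the file
    local $/ = \$chunk_size;              # scalar ref => fixed-length record mode

    open my $in, '<', $infn or die $!;
    binmode $in;
    my $n = 0;
    while (my $record = <$in>) {          # each read returns up to $chunk_size bytes
        open my $out, '>', $prefix . ++$n or die $!;
        binmode $out;
        print {$out} $record;
        close $out;
    }
    close $in;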

...or, again, use -s, then sysseek to the appropriate offsets and use sysread with a length of 1/4th of the total (sysseek is the companion to sysread; mixing buffered seek with sysread invites confusion). But then beware of off-by-one errors.
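
Here's a sketch of the sysseek/sysread variant, under the same hypothetical names; the last piece is simply whatever remains, which is exactly where the off-by-one risk lives:

    use strict;
    use warnings;
    use Fcntl 'SEEK_SET';
    use POSIX 'ceil';

    my $infn   = 'big.dat';   # hypothetical input file
    my $prefix = 'part';      # hypothetical output prefix

    my $size  = -s $infn;
    my $chunk = ceil($size / 4);

    open my $in, '<', $infn or die $!;
    binmode $in;
    for my $i (0 .. 3) {
        my $offset = $i * $chunk;
        last if $offset >= $size;                       # nothing left (tiny files)
        my $want = $size - $offset;
        $want = $chunk if $want > $chunk;               # last piece may be shorter
        sysseek $in, $offset, SEEK_SET or die $!;
        defined sysread($in, my $buf, $want) or die $!; # one big read per piece
        open my $out, '>', $prefix . ($i + 1) or die $!;
        binmode $out;
        print {$out} $buf;
        close $out;
    }
    close $in;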

Of course, with any of these methods you're going to run into memory constraints: reading 25% of a 10GB file consumes 2.5GB. It might be better to set the record size to 1/16th of the file size, and then read/write four times for each output file.
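
A sketch of that variant (same hypothetical names): sixteen reads, four per output file, so peak memory is roughly 1/16th of the file instead of 1/4th:

    use strict;
    use warnings;
    use POSIX 'ceil';

    my $infn   = 'big.dat';   # hypothetical input file
    my $prefix = 'part';      # hypothetical output prefix

    my $size       = -s $infn;
    my $chunk_size = ceil($size / 16);    # 1/16th of the file per read
    local $/ = \$chunk_size;              # fixed-length record mode

    open my $in, '<', $infn or die $!;
    binmode $in;
    for my $n (1 .. 4) {
        open my $out, '>', $prefix . $n or die $!;
        binmode $out;
        for (1 .. 4) {                    # four records per output file
            my $record = <$in>;
            last unless defined $record;
            print {$out} $record;
        }
        close $out;
    }
    close $in;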


Dave

Re^2: Split file into 4 smaller ones
by Anonymous Monk on Feb 08, 2013 at 08:43 UTC

    I always find it a bit silly when people suggest reading such large amounts of data into memory. Yes, I guess you can do that nowadays, but I'm not sure you always should. I prefer to conserve memory where possible.

    use strict;
    use warnings;
    use POSIX 'ceil';

    my ($infn, $prefix) = @ARGV;          # input file and output name prefix
    my $size         = -s $infn;          # total size in bytes
    my $buffer_size  = 64 * 1024;
    my $bytes_wanted = ceil($size / 4);

    # Copy up to $bytes bytes from $in to $out; returns the number actually read.
    sub copy {
        my ($in, $out, $bytes) = @_;
        my $bytes_read = sysread($in, my $buffer, $bytes);
        die "sysread: $!" unless defined $bytes_read;
        print {$out} $buffer;
        return $bytes_read;
    }

    open my $in, '<', $infn or die $!;
    binmode $in;
    for my $outfn (1 .. 4) {
        open my $out, '>', $prefix . $outfn or die $!;
        binmode $out;
        my $bytes = 0;
        while ($bytes + $buffer_size < $bytes_wanted) {
            my $read = copy($in, $out, $buffer_size);
            last unless $read;            # stop at end of file
            $bytes += $read;
        }
        copy($in, $out, $bytes_wanted - $bytes);
        close $out;
    }
    close $in;