PerlMonks  

Re: Split file into 4 smaller ones

by davido (Archbishop)
on Feb 07, 2013 at 22:32 UTC (#1017731)


in reply to Split file into 4 smaller ones

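Use the -s operator to determine the file's size, and set the $/ special variable to a reference to an integer holding 1/4th of that size, rounded up (for example, $/ = \$quarter). A plain number in $/ would be treated as a separator string; the reference is what makes readline return fixed-size records. Then read in a while() loop, just like you always would.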

See perlvar for an explanation of how to set $/, and perlfunc -X to learn about -s.
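A minimal sketch of the record-size approach, assuming the input fits the "quarter at a time" memory budget. The sub name split_by_record and the output naming scheme (prefix plus 1..4) are made up for illustration; only -s, $/ and readline come from the advice above.

```perl
use strict;
use warnings;
use POSIX 'ceil';

# Hypothetical helper: split $infn into $parts files named "${prefix}1", "${prefix}2", ...
sub split_by_record {
    my ($infn, $prefix, $parts) = @_;
    $parts ||= 4;

    open my $in, '<:raw', $infn or die "Can't open $infn: $!";
    my $size = -s $in;

    # $/ must hold a REFERENCE to an integer to get fixed-size records.
    # Round up so no leftover bytes spill into an extra record; avoid a size of 0.
    my $record_size = ceil($size / $parts) || 1;
    local $/ = \$record_size;

    for my $n (1 .. $parts) {
        my $chunk = <$in>;    # one record, or undef once the input is exhausted
        open my $out, '>:raw', "$prefix$n" or die "Can't open $prefix$n: $!";
        print {$out} $chunk if defined $chunk;
        close $out or die "Can't close $prefix$n: $!";
    }
    close $in;
    return;
}
```

Called as split_by_record('big.dat', 'big.part', 4), this would write big.part1 through big.part4 (file names are placeholders). Because the record size is rounded up, the last file can be a few bytes shorter than the others, or empty for very small inputs.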

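...or, again use -s, and then seek to specific locations and use sysread with a length of 1/4th of the total. (If you go the sysread route, pair it with sysseek rather than seek, since sysread bypasses Perl's buffered I/O.) But then beware of off-by-one errors at the chunk boundaries, especially when the size isn't evenly divisible by four.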
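Here is a rough sketch of the seek-and-read variant. It uses sysseek with sysread so that both calls bypass Perl's buffering. The sub name read_part and its 0-based part index are assumptions of this sketch, not anything from the original question. The length is clamped so the last part stops at end of file, which is where the off-by-one errors tend to hide.

```perl
use strict;
use warnings;
use Fcntl 'SEEK_SET';
use POSIX 'ceil';

# Hypothetical helper: return the bytes of part $index (0-based) out of $parts.
sub read_part {
    my ($infn, $index, $parts) = @_;

    open my $in, '<:raw', $infn or die "Can't open $infn: $!";
    my $size  = -s $in;
    my $chunk = ceil($size / $parts);    # nominal part length, rounded up
    my $start = $index * $chunk;

    # Clamp the length so the final part never reads past end of file.
    my $want = $size - $start;
    $want = $chunk if $want > $chunk;
    return '' if $want <= 0;

    # sysseek returns "0 but true" for offset 0, so a plain truth test is safe.
    sysseek($in, $start, SEEK_SET) or die "Can't sysseek in $infn: $!";

    # sysread may return fewer bytes than requested, so loop until done.
    my $data = '';
    while ($want > 0) {
        my $got = sysread($in, my $buf, $want);
        die "Read error on $infn: $!" unless defined $got;
        last if $got == 0;               # unexpected end of file
        $data .= $buf;
        $want -= $got;
    }
    close $in;
    return $data;
}
```

Note that this still holds a whole part in memory at once, so for very large files you would want to read and write in smaller pieces instead.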

Of course, with any of these methods you're going to run into memory constraints: reading 25% of a 10GB file will consume 2.5GB. It might be better to set $/ to a reference to 1/16th of the file size, and then read/write four times for each output file.
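The 1/16th idea could look something like the following sketch. The sub name split_in_sixteenths and the output names are invented for illustration. One caveat of this particular sketch: because the record size is rounded up to a whole number of bytes, each output file is four records long rather than exactly a quarter of the input, so the sizes can differ from a strict four-way split by a few bytes.

```perl
use strict;
use warnings;
use POSIX 'ceil';

# Hypothetical helper: write four output files, each from four 1/16th-size records.
sub split_in_sixteenths {
    my ($infn, $prefix) = @_;

    open my $in, '<:raw', $infn or die "Can't open $infn: $!";
    my $size = -s $in;

    # Reference to an integer => fixed-size records; never use a size of 0.
    my $record_size = ceil($size / 16) || 1;
    local $/ = \$record_size;

    for my $n (1 .. 4) {
        open my $out, '>:raw', "$prefix$n" or die "Can't open $prefix$n: $!";
        for (1 .. 4) {
            my $chunk = <$in>;           # only one record in memory at a time
            last unless defined $chunk;  # input exhausted
            print {$out} $chunk;
        }
        close $out or die "Can't close $prefix$n: $!";
    }
    close $in;
    return;
}
```

For the 10GB example, this keeps peak memory around 625MB per read instead of 2.5GB.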


Dave


Re^2: Split file into 4 smaller ones
by Anonymous Monk on Feb 08, 2013 at 08:43 UTC

    I always find it a bit silly when people suggest reading such large amounts of data into memory. Yes, I guess you can do that nowadays, but I'm not sure you always should. I prefer to conserve memory where possible.

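    use strict;
    use warnings;
    use POSIX 'ceil';

    # Assumption: the post never defines $infn, $prefix or $size;
    # here they come from the command line and from -s.
    my ($infn, $prefix) = @ARGV;

    my $buffer_size = 64 * 1024;

    # Copy up to $bytes bytes from $in to $out; return the number actually read.
    sub copy {
        my ($in, $out, $bytes) = @_;
        my $bytes_read = sysread($in, my $buffer, $bytes);
        die "Read failed: $!" unless defined $bytes_read;
        print $out $buffer;
        return $bytes_read;
    }

    open my $in, '<', $infn or die $!;
    binmode $in;
    my $size         = -s $in;
    my $bytes_wanted = ceil($size / 4);

    for my $outfn (1..4) {
        open my $out, '>', $prefix . $outfn or die $!;
        binmode $out;
        my $bytes = 0;
        # Copy full buffers while more than one buffer's worth remains.
        while ($bytes + $buffer_size < $bytes_wanted) {
            my $n = copy($in, $out, $buffer_size) or last;  # stop at unexpected EOF
            $bytes += $n;
        }
        # Copy the remainder of this quarter.
        copy($in, $out, $bytes_wanted - $bytes);
        close $out;
    }
    close $in;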
