
Re: Large file split into smaller using sysread()

by kejohm (Hermit)
on Mar 28, 2012 at 02:10 UTC (#962048)

in reply to Large file split into smaller using sysread()

If I am understanding your question correctly, you want to split a 3GB text file into 200MB parts, but some lines are being split between files.

The functions read() and sysread() read a fixed amount of data (bytes, or characters in the case of read() on a handle with an encoding layer) and have no concept of lines. So if you read exactly 200MB at a time, the boundary will almost certainly fall in the middle of a line.
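To see the effect on a small scale, here is a minimal sketch that reads an in-memory file in fixed 16-byte chunks. The sample data and the chunk size are made up for illustration; they stand in for the 3GB file and the 200MB limit.

```perl
use strict;
use warnings;

# In-memory file standing in for the big input (illustrative data only).
my $data = "first line\nsecond line\nthird line\n";
open my $in, '<', \$data or die "Can't open in-memory file: $!";

# Read fixed-size chunks: 16 bytes here instead of 200MB.
my @chunks;
my $buf;
while ( read( $in, $buf, 16 ) ) {
    push @chunks, $buf;
}
close $in;

# Show each chunk with its newlines made visible.
for my $chunk (@chunks) {
    ( my $shown = $chunk ) =~ s/\n/\\n/g;
    print "[$shown]\n";
}
```

The first chunk ends partway through "second line", because read() stops at the byte count it was given, not at a newline.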

One way to do it is to read lines, instead of fixed-size blocks, from the big file and print them to the current file part, keeping track of how much has been written so far. (With no encoding layer on the handle, length() counts bytes.) When the count reaches 200MB, you close the current file part and open the next one. Here is an example:

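#!perl
# Untested
use 5.012;
use warnings;

my $partsize = 200 * 1024 * 1024;    # target size of each part: 200MB

my $file = shift or die 'no file';
open my $in, '<', $file or die "Can't open '$file' for reading: $!";

my $part = 1;
my $size = 0;
open my $out, '>', "$file.part$part"
    or die "Can't open '$file.part$part' for writing: $!";

while (<$in>) {
    print $out $_;
    $size += length $_;

    # Once the part reaches the target size, start the next one.
    # Only whole lines are written, so a part may run slightly over.
    # The eof check avoids creating an empty part at the end of the input.
    if ( $size >= $partsize and not eof($in) ) {
        close $out;
        $part++;
        open $out, '>', "$file.part$part"
            or die "Can't open '$file.part$part' for writing: $!";
        $size = 0;
    }
}
close $out;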
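If reading line by line turns out to be too slow for a file this size, a variation is to copy large blocks with read() and then use readline to finish the line that the last block cut off. This is only a sketch: split_on_lines and its 1MB default block size are names and values invented here, not part of the original question. It uses the buffered read() rather than sysread(), since sysread() bypasses Perl's buffering and should not be mixed with readline on the same handle.

```perl
use strict;
use warnings;

# Hypothetical helper: split $file into parts of roughly $partsize bytes,
# copying in blocks of $blocksize and never cutting a line in two.
# Returns the names of the parts it wrote.
sub split_on_lines {
    my ( $file, $partsize, $blocksize ) = @_;
    $blocksize ||= 1024 * 1024;    # assumed default: 1MB per read

    open my $in, '<:raw', $file
        or die "Can't open '$file' for reading: $!";

    my @parts;
    until ( eof($in) ) {
        my $name = sprintf '%s.part%d', $file, scalar(@parts) + 1;
        open my $out, '>:raw', $name
            or die "Can't open '$name' for writing: $!";
        push @parts, $name;

        # Copy blocks until the part holds $partsize bytes or input ends.
        my ( $size, $last ) = ( 0, "\n" );
        while ( $size < $partsize ) {
            my $want = $partsize - $size;
            $want = $blocksize if $want > $blocksize;

            my $got = read( $in, my $buf, $want );
            die "Read from '$file' failed: $!" unless defined $got;
            last if $got == 0;    # end of input

            print {$out} $buf or die "Can't write to '$name': $!";
            $size += $got;
            $last = substr $buf, -1;
        }

        # If the last block stopped mid-line, append the rest of that line.
        if ( $last ne "\n" ) {
            my $rest = <$in>;
            print {$out} $rest if defined $rest;
        }
        close $out or die "Can't close '$name': $!";
    }
    close $in;
    return @parts;
}

# Usage: perl split.pl bigfile.txt
split_on_lines( $ARGV[0], 200 * 1024 * 1024 ) if @ARGV;
```

As with the line-by-line version, each part can exceed the target size by up to one line.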

Replies are listed 'Best First'.
Re^2: Large file split into smaller using sysread()
by rkshyam (Acolyte) on Mar 30, 2012 at 08:06 UTC

    Hi kejohm, This great solution worked very well for me. Thanks a lot !!!
