PerlMonks  

Re: Large file split into smaller using sysread()

by kejohm (Hermit)
on Mar 28, 2012 at 02:10 UTC (#962048)


in reply to Large file split into smaller using sysread()

If I am understanding your question correctly, you want to split a 3GB text file into 200MB parts, but some lines are being split between files.

The functions read() and sysread() read a fixed amount of data (bytes by default, or characters when the filehandle has an encoding layer) and have no concept of lines. So, if you read exactly 200MB of data, the boundary will almost certainly fall in the middle of a line.
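To see the problem on a small scale, here is a minimal sketch that reads fixed-size chunks from an in-memory file. The helper name fixed_chunks and the 8-byte chunk size are made up for illustration, and read() is used rather than sysread() because sysread() does not work on in-memory filehandles.

```perl
use strict;
use warnings;

# Hypothetical helper: read fixed-size chunks from a filehandle and
# return them as a list, the way a naive fixed-size splitter would.
sub fixed_chunks {
    my ( $fh, $chunk_size ) = @_;
    my @chunks;
    # read() returns 0 at end of file, which ends the loop.
    while ( read( $fh, my $buf, $chunk_size ) ) {
        push @chunks, $buf;
    }
    return @chunks;
}

# Three short lines held in memory stand in for the big file.
my $data = "alpha\nbravo\ncharlie\n";
open my $fh, '<', \$data or die "Can't open in-memory file: $!";

my @chunks = fixed_chunks( $fh, 8 );
close $fh;

# Show each chunk with its newlines made visible.
for my $chunk (@chunks) {
    ( my $visible = $chunk ) =~ s/\n/\\n/g;
    print "[$visible]\n";
}
```

This prints [alpha\nbr], [avo\nchar] and [lie\n]: the first two chunks both end partway through a line, which is the same effect you are seeing at each 200MB boundary.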

One way to do it would be to read in lines, instead of fixed-size chunks, from the big file and print them to a new file part, keeping track of the number of characters written to that part so far. When the count reaches 200MB, you close the current file part and open the next one. Each part can end up slightly larger than 200MB, because the line that crosses the limit is kept whole. Here is an example:

    #!perl
    # Untested
    use 5.012;

    my $partsize = 200 * 1024 * 1024;    # target size of each part
    my $file = shift or die 'no file';

    open my $in, '<', $file
        or die "Can't open '$file' for reading: $!";

    my $part = 1;    # number of the part being written
    my $size = 0;    # characters written to the current part

    open my $out, '>', "$file.part$part"
        or die "Can't open '$file.part$part' for writing: $!";

    while (<$in>) {
        print $out $_;
        $size += length $_;

        # Once the part reaches the target size, start the next one.
        if ( $size >= $partsize ) {
            close $out;
            $part++;
            open $out, '>', "$file.part$part"
                or die "Can't open '$file.part$part' for writing: $!";
            $size = 0;
        }
    }

    close $out;
    close $in;
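If you want to try the idea on a small file first, the same loop can be wrapped in a subroutine that takes the part size as a parameter. This is only a sketch: the name split_by_lines is made up, and it differs from the script above in one respect, namely that it opens each part only when there is a line to write, so it does not leave an empty part behind when the input ends exactly on a part boundary.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the script above): split $file into
# parts of roughly $partsize characters without breaking a line.
# Returns the names of the part files written.
sub split_by_lines {
    my ( $file, $partsize ) = @_;

    open my $in, '<', $file
        or die "Can't open '$file' for reading: $!";

    my ( $out, @parts );
    my $size = 0;

    while ( my $line = <$in> ) {
        # Open the next part only when there is a line to put in it.
        if ( !$out ) {
            my $name = "$file.part" . ( @parts + 1 );
            open $out, '>', $name
                or die "Can't open '$name' for writing: $!";
            push @parts, $name;
            $size = 0;
        }

        print {$out} $line;
        $size += length $line;

        # The line that crosses the limit stays whole in this part.
        if ( $size >= $partsize ) {
            close $out or die "Can't close '$parts[-1]': $!";
            undef $out;
        }
    }

    if ($out) {
        close $out or die "Can't close '$parts[-1]': $!";
    }
    close $in;
    return @parts;
}
```

For the file in the question, the call would be something like split_by_lines( $file, 200 * 1024 * 1024 ); with a tiny limit such as 20 you can check on a test file that every part ends on a line boundary.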


Re^2: Large file split into smaller using sysread()
by rkshyam (Acolyte) on Mar 30, 2012 at 08:06 UTC

    Hi kejohm, This great solution worked very well for me. Thanks a lot !!!
