http://www.perlmonks.org?node_id=470234


in reply to Handling A File In Human-Compatible Chunks

Here you are:
use ChunkReader;

# create reader; threshold is optional (default 2048)
my $reader = ChunkReader->new( threshold => 1024 );

# open file
$reader->open("c:/test2.txt");

# iterate
while ( my $chunk = $reader->chunk ) {
    print "$chunk\n****************\n";
}
Put this in a module "ChunkReader.pm"
package ChunkReader;

use strict;
use warnings;

sub new {
    my $class = shift;
    my %args  = @_;
    $args{threshold} = 2048 unless defined $args{threshold};
    return bless {%args}, $class;
}

sub open {
    my $self = shift;
    my $file = shift || $self->{file};
    # CORE::open avoids ambiguity with this package's own open method
    CORE::open( my $handle, "<", $file )
        or die "cannot open '$file': $!\n";
    $self->{handle} = $handle;
}

sub chunk {
    my $self   = shift;
    my $handle = $self->{handle} || die "File not open!\n";

    # start with the line left over from the previous call, if any
    my $chunk = defined $self->{lastline} ? $self->{lastline} : '';
    # reset last line
    $self->{lastline} = undef;

    while ( my $line = <$handle> ) {
        if ( ( length($chunk) + length($line) ) > $self->{threshold} ) {
            # unless we already read a chunk
            # (when a single line is bigger than the threshold)
            unless ( length $chunk ) {
                # return the line
                return $line;
            }
            else {
                # save the last line for further use
                # and return the chunk
                $self->{lastline} = $line;
                return $chunk;
            }
        }
        else {
            # append line and keep going
            $chunk .= $line;
        }
    }
    # end of file
    return $chunk;
}

1;
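To see how the chunking loop in the module behaves without touching a file on disk, here is a minimal standalone sketch. The `read_chunks` helper, the sample text and the threshold of 10 are made up for illustration and are not part of ChunkReader; the sketch only mirrors the same rule (collect whole lines until the next line would exceed the threshold) and reads from an in-memory filehandle.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of ChunkReader): collect whole lines from
# $fh into chunks of at most $threshold characters, never splitting a line.
# A single line longer than the threshold becomes a chunk of its own.
sub read_chunks {
    my ( $fh, $threshold ) = @_;
    my @chunks;
    my $chunk = '';
    while ( my $line = <$fh> ) {
        # Close the current chunk if this line would push it over the limit.
        if ( length($chunk) && length($chunk) + length($line) > $threshold ) {
            push @chunks, $chunk;
            $chunk = '';
        }
        $chunk .= $line;
    }
    push @chunks, $chunk if length $chunk;
    return @chunks;
}

# In-memory filehandle, so the sketch needs no file on disk.
my $text = "aaaa\nbbbb\ncccc\ndddddddddddd\nee\n";
open my $fh, '<', \$text or die "cannot open in-memory file: $!";
my @chunks = read_chunks( $fh, 10 );
close $fh;

print "$_****************\n" for @chunks;
```

Unlike the module, this returns all chunks at once instead of one per call, which is simpler to test but gives up the low memory use of the iterator style.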


Update:
Taking TheDamian's advice in "sub that sets $_", I updated the module so you can now do this:
use ChunkReader;

# create reader; threshold is optional (default 2048)
my $reader = ChunkReader->new();

# open file
$reader->open("c:/test2.txt");

while ( $reader->chunk ) {
    print "$_****************\n";
}

package ChunkReader;

use strict;
use warnings;

sub new {
    my $class = shift;
    my %args  = @_;
    $args{threshold} = 2048 unless defined $args{threshold};
    return bless {%args}, $class;
}

sub open {
    my $self = shift;
    my $file = shift || $self->{file};
    # CORE::open avoids ambiguity with this package's own open method
    CORE::open( my $handle, "<", $file )
        or die "cannot open '$file': $!\n";
    $self->{handle} = $handle;
}

sub chunk {
    my $self   = shift;
    my $handle = $self->{handle} || die "File not open!\n";

    # start with the line left over from the previous call, if any
    my $chunk = defined $self->{lastline} ? $self->{lastline} : '';
    # reset last line
    $self->{lastline} = undef;

    while ( my $line = <$handle> ) {
        if ( ( length($chunk) + length($line) ) > $self->{threshold} ) {
            # unless we already read a chunk
            # (when a single line is bigger than the threshold)
            unless ( length $chunk ) {
                # return the line
                return bless \$line, 'ChunkReader::Chunk';
            }
            else {
                # save the last line for further use
                # and return the chunk
                $self->{lastline} = $line;
                return bless \$chunk, 'ChunkReader::Chunk';
            }
        }
        else {
            # append line and keep going
            $chunk .= $line;
        }
    }
    # end of file
    return bless \$chunk, 'ChunkReader::Chunk';
}

package ChunkReader::Chunk;

use overload (
    q{bool}  => sub { return ${ $_[0] } },
    q{""}    => sub { return ${ $_[0] } },
    q{0+}    => sub { 0 + ${ $_[0] } },
    fallback => 1,
);

# when the chunk object is destroyed, copy its text into $_
sub DESTROY {
    $_ = ${ $_[0] };
}

1;

__END__
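The updated module relies on two mechanisms: an overloaded object that behaves like a plain string, and a destructor that copies the text into `$_`. The following is a minimal sketch of just those two pieces in isolation. `Demo::Chunk` and its `new` constructor are hypothetical names used only here, and the sketch drops the last reference explicitly with `undef` rather than relying on when a temporary inside a `while` condition is freed.

```perl
use strict;
use warnings;

package Demo::Chunk;    # hypothetical class, for illustration only

use overload
    q{""}    => sub { ${ $_[0] } },
    q{bool}  => sub { length( ${ $_[0] } ) ? 1 : 0 },
    fallback => 1;

sub new {
    my ( $class, $text ) = @_;
    return bless \$text, $class;
}

# Runs when the last reference goes away; copies the text into the global $_.
sub DESTROY { $_ = ${ $_[0] } }

package main;

my $obj = Demo::Chunk->new("first chunk\n");
print "stringifies to: $obj";
print "true in boolean context\n" if $obj;

undef $obj;             # last reference dropped, so DESTROY fires here
print "\$_ is now: $_";
```

Note that the `bool` handler here tests the length of the text, so a chunk consisting of the single character "0" would still count as true; the module above returns the string itself, which treats such a chunk as false.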


holli, /regexed monk/

Re^2: Handling A File In Human-Compatible Chunks
by Cody Pendant (Prior) on Jun 27, 2005 at 13:35 UTC
    Thanks, everyone, for your contributions. I may have confused people by using the word "paragraph"; that wasn't my main concern so much as the sub itself.

    Holli's contribution looks like exactly what I wanted.



    ($_='kkvvttuu bbooppuuiiffss qqffssmm iibbddllffss')
    =~y~b-v~a-z~s; print
      Well, all I did was take your code and add some OO sugar around it.


      holli, /regexed monk/
        Very nice sugar, thank you. Posting to say, I got extra-lazy and decided I wanted it even easier.

        Holli's code has two methods: one to open the file and one to return it in chunks.

        $reader->open("/path/to/file.txt");

        # iterate
        while ( my $chunk = $reader->chunk ) {
            print "$chunk\n****************\n";
        }
        Whereas I'm so lazy I wanted to do just this:
        while ( my $chunk = $reader->chunk("/path/to/file.txt") ) {
            print "$chunk\n****************\n";
        }

        So I moved some code around and got this, which works just fine:

        package ChunkReader;

        sub new {
            my $class = shift;
            my %args  = @_;
            $args{threshold} = 2048 unless defined $args{threshold};
            return bless {%args}, $class;
        }

        sub chunk {
            my $self = shift;

            # open the file on the first call only; later calls reuse the handle
            unless ( $self->{handle} ) {
                my $file = shift || $self->{file};
                my $handle;
                open( $handle, "<", $file )
                    or die "cannot open '$file': $!\n";
                $self->{handle} = $handle;
            }

            my $handle = $self->{handle} || die "File not open!\n";
            my $chunk  = $self->{lastline};
            # reset last line
            $self->{lastline} = '';

            while ( my $line = <$handle> ) {
                if ( ( length($chunk) + length($line) ) > $self->{threshold} ) {
                    # unless we already read a chunk
                    # (when a single line is bigger than the threshold)
                    unless ($chunk) {
                        # return the line
                        return $line;
                    }
                    else {
                        # save the last line for further use
                        # and return the chunk
                        $self->{lastline} = $line;
                        return $chunk;
                    }
                }
                else {
                    # append line and keep going
                    $chunk .= $line;
                }
            }
            # end of file
            return $chunk;
        }

        1;


        ($_='kkvvttuu bbooppuuiiffss qqffssmm iibbddllffss')
        =~y~b-v~a-z~s; print