Re: mem usage


If you don't mind reading the source file twice, with a whole lot of random seeks the second time through, you can build a list of the byte offsets where each line starts, shuffle that list, and then loop through a seek, read, write cycle:

#!/usr/bin/perl

use strict;
use warnings;
use List::Util qw(shuffle);

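my $file = shift @ARGV;
die "Usage: $0 FILE\n" unless defined $file;

my @offsets;

print STDERR "Scanning...";

# Pass 1: record the byte offset at which each line starts.
open(my $in, '<', $file) or die "Cannot open $file: $!\n";
do { push @offsets, tell($in) } while (<$in>);
pop @offsets;    # the last offset recorded is EOF, not a line start

print STDERR "Done. (" . scalar(@offsets) . " lines)\nScrambling...";

@offsets = shuffle(@offsets);

print STDERR "Done.\nWriting scrambled...";

# Pass 2: seek to each offset in shuffled order and copy that line to STDOUT.
for (@offsets) {
    seek($in, $_, 0) or die "Cannot seek in $file: $!\n";
    my $line = <$in>;
    # The final line of the file may lack a trailing newline.
    $line .= "\n" unless $line =~ /\n\z/;
    print $line;
}
close($in);

print STDERR "Done.\n";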

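This scrambled an 800k-line, 60 MB file I had handy in eight seconds. Memory usage in process space was minimal, since the script holds only one offset per line rather than the lines themselves. I assume the kernel kept the entire file cached in memory, which would explain why the random seeks were so cheap.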
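In case it isn't obvious what the scanning pass collects, here is a minimal sketch run against a tiny in-memory "file". The helper name line_offsets is made up for this illustration, and it uses an explicit while loop rather than the do/while-and-pop idiom from the script above, but it should produce the same list of line-start offsets.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the script above): return the byte
# offset at which each line of the given filehandle starts.
sub line_offsets {
    my ($fh) = @_;
    my @offsets;
    my $pos = tell($fh);                 # start of the first line
    while (defined(my $line = <$fh>)) {
        push @offsets, $pos;             # this line started at $pos
        $pos = tell($fh);                # the next line starts here
    }
    return @offsets;
}

# An in-memory filehandle stands in for a real file here.
my $data = "a\nbb\nccc";                 # three lines, last one unterminated
open(my $fh, '<', \$data) or die "Cannot open in-memory file: $!";
my @offsets = line_offsets($fh);
close($fh);

print "@offsets\n";                      # prints "0 2 5"
```

Each offset is just an integer, so the list stays small compared with the file; the price is one seek per line on the second pass.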
