PerlMonks  

Re: mem usage

by crag (Novice)
on May 26, 2010 at 22:13 UTC (#841833)


in reply to mem usage

If you don't mind reading the source file twice, with a whole lot of random seeks on the second pass, you can build a list of line-start offsets, shuffle it, and then loop through a seek, read, write cycle:

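#!/usr/bin/perl
use strict;
use warnings;
use List::Util qw(shuffle);

my $file = shift or die "usage: $0 FILE\n";

# Pass 1: record the byte offset at which each line starts.
my @offsets;
print STDERR "Scanning...";
open(my $in, '<', $file) or die "Can't open $file: $!\n";
push @offsets, tell($in);                # start of the first line
push @offsets, tell($in) while <$in>;    # position after each line read
pop @offsets;                            # the final position is EOF, not a line start
print STDERR "Done. (", scalar(@offsets), " lines)\nScrambling...";

@offsets = shuffle(@offsets);
print STDERR "Done.\nWriting scrambled...";

# Pass 2: seek to each shuffled offset and copy that one line to STDOUT.
# seek() also clears the EOF condition left over from pass 1.
for my $offset (@offsets) {
    seek($in, $offset, 0) or die "Can't seek in $file: $!\n";
    chomp(my $line = <$in>);    # the last line may lack a trailing newline
    print $line, $/;
}
close($in);
print STDERR "Done.\n";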
This scrambled an 800k-line, 60 MB file I had handy in eight seconds, with minimal memory usage in process space. I assume the kernel kept the entire file cached in memory, so the random seeks of the second pass never had to go to disk.
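The only per-line data the script keeps in process space is @offsets, one full Perl scalar per line. If even that is too much, one possible refinement (a sketch under assumptions, not tested against the 60 MB file above) packs the offsets as native unsigned integers (pack 'J') into a single string and shuffles the fixed-width records in place with a Fisher-Yates shuffle. The helper names packed_offsets, shuffle_packed and write_scrambled are hypothetical.

```perl
use strict;
use warnings;

# Width in bytes of one packed offset (native unsigned integer, usually 8).
use constant OFFSET_WIDTH => length pack('J', 0);

# Hypothetical helper: return every line-start offset of $fh, packed
# back to back into one string instead of one Perl scalar per line.
sub packed_offsets {
    my ($fh) = @_;
    my $packed = '';
    my $pos = tell($fh);
    while (defined(my $line = <$fh>)) {
        $packed .= pack('J', $pos);    # start of the line just read
        $pos = tell($fh);
    }
    return $packed;
}

# Hypothetical helper: in-place Fisher-Yates shuffle of the
# fixed-width records stored in the string referenced by $ref.
sub shuffle_packed {
    my ($ref) = @_;
    my $w = OFFSET_WIDTH;
    my $n = length($$ref) / $w;
    for (my $i = $n - 1; $i > 0; $i--) {
        my $j = int(rand($i + 1));
        next if $i == $j;
        my $tmp = substr($$ref, $i * $w, $w);
        substr($$ref, $i * $w, $w, substr($$ref, $j * $w, $w));
        substr($$ref, $j * $w, $w, $tmp);
    }
}

# Hypothetical helper: seek to each packed offset in order and copy
# that line to $out, adding a newline if the line lacked one.
sub write_scrambled {
    my ($in, $out, $packed) = @_;
    my $w = OFFSET_WIDTH;
    for my $k (0 .. length($packed) / $w - 1) {
        my $offset = unpack('J', substr($packed, $k * $w, $w));
        seek($in, $offset, 0) or die "Can't seek: $!\n";
        chomp(my $line = <$in>);
        print {$out} $line, "\n";
    }
}
```

The driver is the same as before: open the file, call packed_offsets, then shuffle_packed(\$packed), then write_scrambled($in, \*STDOUT, $packed). The trade-off is a little more code in exchange for a single string buffer instead of hundreds of thousands of scalars.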

