Beefy Boxes and Bandwidth Generously Provided by pair Networks
laziness, impatience, and hubris
 
PerlMonks  

Re: Strategy for randomizing large files via sysseek

by TilRMan (Friar)
on Sep 10, 2004 at 03:39 UTC ( #389952=note: print w/ replies, xml ) Need Help??


in reply to Strategy for randomizing large files via sysseek

Here's my divide-and-conquer approach: Split incoming lines randomly among 50 tempfiles. Recurse on the tempfiles. When the tempfiles are smaller than 1000 lines, shuffle in memory. Spit out the shuffled tempfiles sequentially.

Presumably, adjusting the knobs (higher) to match your system will improve performance.

I believe that this is a fair randomness (assuming fairness of shuffle() and rand), uses constant memory, runs in O(n log n) time, and uses "O(n)" disk. (I may be wrong about these -- comments welcome.) The big advantage is that there is no random seeking, so you get the benefit of cachesbuffers.

#!/usr/bin/perl use strict; use warnings; # Shuffle unlimited number of lines from ARGV to STDOUT # using tempfiles my $SPREAD = 50; my $TERMINATE = 1000; use File::Temp qw( tempfile ); use IO::File; use List::Util qw( shuffle ); sub shuffle_lines { my ( $infh, $outfh ) = @_; my @temp = map { scalar tempfile } 1 .. $SPREAD; my @count = (0) x $SPREAD; while (<$infh>) { my $i = int rand $SPREAD; $temp[$i]->print($_); $count[$i]++; } for my $i ( 0 .. $#count ) { my $fh = $temp[$i]; seek $fh, 0, 0; if ( $count[$i] <= $TERMINATE ) { print $outfh shuffle <$fh>; } else { shuffle_lines( $fh, $outfh ); } } } shuffle_lines( \*ARGV, \*STDOUT );


Comment on Re: Strategy for randomizing large files via sysseek
Select or Download Code

Log In?
Username:
Password:

What's my password?
Create A New User
Node Status?
node history
Node Type: note [id://389952]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others browsing the Monastery: (3)
As of 2014-08-21 03:57 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    The best computer themed movie is:











    Results (127 votes), past polls