...or use mine:)

This is one of those cases where rolling your own has benefits. The FAQ, the Cookbook and the List::Util versions of the Fisher-Yates shuffle all use copy semantics. This means that you need over double the space required to store the data in order to shuffle it.
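To make "copy semantics" concrete, here is a minimal sketch (the variable names are only for illustration). List::Util's shuffle returns a new list and leaves its input alone, so the original and the shuffled copy are both alive at the same time:

```perl
use strict;
use warnings;
use List::Util qw(shuffle);

my @original = ( 1 .. 10 );

# shuffle() returns a new list; @original itself is not modified.
my @shuffled = shuffle @original;

# Both arrays now hold all ten elements, so element storage is at least doubled.
print "original: @original\n";
print "shuffled: @shuffled\n";
```

With ten elements the duplication is harmless; with millions it is the difference measured in the results that follow.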

My version does an in-place shuffle, the benefits of which really show up when you start shuffling huge arrays. The following results compare my pure-Perl, in-place shuffle with the List::Util XS version.

    P:\test>test -N=2000000
    Pre-allocation         2696 kb   (Memory use noted manually)
    Pre-inplace shuffle   43068 kb
    Post-inplace shuffle  43076 kb
    Post-XS_copy shuffle  91192 kb
    1 trial of Inplace (41.760s total)
    1 trial of Copied (48.400s total)

The results show that my in-place version consumes just 8 kb of extra RAM to perform the shuffle (43068 kb before, 43076 kb after), and takes about 14% less time than the XS version (41.8s against 48.4s). The XS version only takes around 15 seconds to actually perform the shuffle, but its copy semantics throw that advantage away: it has to allocate double the space, and ends up slower overall.

It wouldn't be that hard (if you are an accomplished XS programmer) to re-cast the List::Util version to detect that it was being given an array reference and was being called in a void context, and to switch to an in-place algorithm in that case. Some crude tests seem to show that this would not only halve the memory usage but, as a result, would cut the overall shuffle time to less than a third.
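As a rough illustration of that dispatch, here is a pure-Perl sketch rather than the XS change itself. The name shuffle_smart is hypothetical, and List::Util does not behave this way; the sketch only shows how the "single array reference, void context" case could be detected with wantarray and routed to an in-place Fisher-Yates shuffle:

```perl
use strict;
use warnings;
use List::Util qw(shuffle);

# Hypothetical wrapper, not part of List::Util.
sub shuffle_smart {
    # One array reference, called in void context: shuffle it in place.
    if ( @_ == 1 && ref $_[0] eq 'ARRAY' && !defined wantarray ) {
        my $aref = $_[0];
        # C-style loop, so no index list is built for huge arrays.
        for ( my $i = $#{ $aref }; $i > 0; $i-- ) {
            my $j = int rand( $i + 1 );            # 0 <= $j <= $i
            @$aref[ $i, $j ] = @$aref[ $j, $i ];   # swap the two elements
        }
        return;
    }
    # Any other call keeps the usual copy semantics.
    return shuffle(@_);
}

my @data = ( 1 .. 20 );
shuffle_smart( \@data );              # void context: @data is shuffled in place
my @copy = shuffle_smart(@data);      # list context: returns a shuffled copy
print "in place: @data\n";
print "copy:     @copy\n";
```

Being pure Perl, this gives the memory saving but not the speed an XS implementation would have.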

The benchmark (you'll need an external tool to measure the memory usage):

    #! perl -slw
    use strict;
    use List::Util qw[ shuffle ];
    use Benchmark::Timer;

    our $N ||= 1_000_000;    # override with -N=... (parsed by the -s switch)

    # In-place Fisher-Yates shuffle; the (\@) prototype passes the array by reference.
    sub my_shuffle (\@) {
        my( $aref, $x ) = shift;
        for my $y ( 0 .. $#{ $aref } ) {
            # rand() returns a float; it is truncated to an integer when used as an index.
            $x = $y + rand( @{ $aref } - $y );
            @$aref[ $y, $x ] = @$aref[ $x, $y ];
        }
    }

    my $timer = Benchmark::Timer->new;
    my @array;

    # Each <STDIN> pauses the script so memory use can be read off externally.
    print 'Pre-allocation'; <STDIN>;
    push @array, $_ for 1 .. $N;

    print 'Pre-inplace shuffle'; <STDIN>;
    $timer->start('Inplace');
    my_shuffle @array;
    $timer->stop('Inplace');

    print 'Post-inplace shuffle'; <STDIN>;
    $timer->start('Copied');
    my @shuffled = shuffle @array;
    $timer->stop('Copied');

    print 'Post-XS_copy shuffle'; <STDIN>;
    $timer->report;

    __END__
    P:\test>test -N=2000000
    Pre-allocation
    Pre-inplace shuffle
    Post-inplace shuffle
    Post-XS_copy shuffle
    1 trial of Inplace (41.760s total)
    1 trial of Copied (48.400s total)

Examine what is said, not who speaks.
"Efficiency is intelligent laziness." -David Dunham
"Think for yourself!" - Abigail

In reply to Re: Randomize lines with limited memory (Roll your own...) by BrowserUk
in thread Randomize lines with limited memory by natch
