Beefy Boxes and Bandwidth Generously Provided by pair Networks
go ahead... be a heretic
 
PerlMonks  

Comment on

( #3333=superdoc: print w/ replies, xml ) Need Help??
blazar++. Just checking if anyone is still paying attention :)

Still paying attention and also advertising in clpmisc (link @ GG). So BrowserUk++ for the additional insight and the many details provided in this reply of yours.

Since bukNew() doesn't need to copy the input list--because it shuffles, and therefore mutates, a list of indices generated internally, rather than mutating a copy of the input--it seemes silly to pass arrays in by value forcing their duplication. Most uses of shuffle() are applied to pre-existing arrays. That's why we make a copy of them, or create a list of aliases to them in most versions--simply to avoid the shuffle from modifying the external arrays.

Well, but then that is somewhat obvious, and even the naive implementation would would be better apt at shuffling an array in place, and would thus better be rewritten like this:

sub naive (@) { for (reverse 1..$#_) { my $r=int rand($_+1); @_[$_,$r]=@_[$r,$_]; } }

(Not that this would make it really faster - I checked and it's still considerably slower than the alternatives presented here.)

But then I think that in any case you would be comparing apples and bananas: for it was somewhat clear that the interface was in terms of "accept a list, return a (shuffled) list"... unless the interface itself was also part of the choice to take. OTOH your tests themselves show that blokhead's solution performs best on large datasets, which is exactly were speed is more likely to matter (well, not strictly, since one may have 10^6 shuffles of 10 elements long lists to do...) - and it has a more intuitive interface, so I would go for it. More precisely I have "decided" that my favourite version is the following modification of blokhead() that can be considered a "hybrid" with another version of yours:

sub bzr (@) { my @a = 0..$#_; my $i = @_; my $n; map +($_[$a[$n=rand($i--)]], $a[$n]=$a[$i])[0], @_; }

I must say that I've included it in the benchmarks too and while it doesn't perform well compared to blokhead(), buk() and bukNew() in the short list tests, it is comparable with the best on the 1000 strings one, and in two of them it even performed best of all, although I'd rather consider this to be a fluctuation. Why this is so, anyway, is beyond my comprehension.

In the meanwhile, I received a followup from Peter J. Holzer in clpmisc, which addresses my own post (a copy of the root node here) and I'm also reporting herafter in its entirey for completeness.


Peter J. Holzer's reply

The main performance improvement comes from avoiding hash slices. Just replacing @l[$_,$r]=@l[$r,$_]; in your code with a conventional swap via a temporary variable:

my $x = $l[$r]; $l[$r] = $l[$_]; $l[$_] = $x;

gives me a 10 % performance boost.

The rest of the improvement comes from not swapping in-place at all, but building a new array from scratch:

sub naive2 (@) { my @l=@_; my @l2; for (reverse 0..$#l) { my $r=int rand($_+1); push @l2, $l[$r]; $l[$r] = $l[$_]; } @l2; }

is just as fast as the map (and it does the same thing, really).

Fascinating. I would have expected both changes to make the performance worse, not better. Just shows again that intuition is an unreliable guide on perl performance.


In reply to Re^6: About List::Util's pure Perl shuffle() by blazar
in thread About List::Util's pure Perl shuffle() by blazar

Title:
Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post; it's "PerlMonks-approved HTML":



  • Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!
  • Read Where should I post X? if you're not absolutely sure you're posting in the right place.
  • Please read these before you post! —
  • Posts may use any of the Perl Monks Approved HTML tags:
    a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr
  • Outside of code tags, you may need to use entities for some characters:
            For:     Use:
    & &amp;
    < &lt;
    > &gt;
    [ &#91;
    ] &#93;
  • Link using PerlMonks shortcuts! What shortcuts can I use for linking?
  • See Writeup Formatting Tips and other pages linked from there for more info.
  • Log In?
    Username:
    Password:

    What's my password?
    Create A New User
    Chatterbox?
    and the web crawler heard nothing...

    How do I use this? | Other CB clients
    Other Users?
    Others rifling through the Monastery: (7)
    As of 2014-12-25 02:45 GMT
    Sections?
    Information?
    Find Nodes?
    Leftovers?
      Voting Booth?

      Is guessing a good strategy for surviving in the IT business?





      Results (159 votes), past polls