Beefy Boxes and Bandwidth Generously Provided by pair Networks
Your skill will accomplish
what the force of many cannot
 
PerlMonks  

Re: When the Best Solution Isn't

by thraxil (Prior)
on Sep 23, 2002 at 03:44 UTC ( #199996=note: print w/replies, xml ) Need Help??


in reply to When the Best Solution Isn't

interesting. the incorrect solution that your toolsmith provided has inspired me to come up with a variation which i think is correct:

my @shuffled = map {$_->[0]} sort { $a->[1] <=> $b->[1]} map {[$_, rand(1)]} @array;

kind of a 'randomized schwartzian transform' approach.

it benchmarks slightly faster than the standard Fisher-Yates shuffle but not quite as fast as the broken version. mine and Fisher-Yates are both going to be linear with the size of the array and you really can't do any better than that and still have a correct solution.

i'm honestly baffled as to why the broken one runs so quickly. my best guess is that getting different values on each comparison in some cases confuses qsort to the point where it thinks it's done before it really is. alternatively though it seems like if you get the wrong series of random numbers, it could make qsort take even longer than normal.

anders pearson

Replies are listed 'Best First'.
Re^2: When the Best Solution Isn't
by Aristotle (Chancellor) on Sep 23, 2002 at 04:59 UTC
    Caveat: both our solutions take memory corresponding to the size of the input array.
    #!/usr/bin/perl -w use strict; use Benchmark qw(cmpthese); sub xform { map {$_->[0]} sort { $a->[1] <=> $b->[1]} map {[$_, rand(1)]} @_; } sub slice { my @random; push @random, rand 1 for 0 .. $#_; @_[ sort { $random[$a] <=> $random[$b] } 0 .. $#_ ]; } my @array = 1 .. 1000; cmpthese(1000, { slice => sub { slice @array }, xform => sub { xform @array }, }); __END__ Benchmark: timing 1000 iterations of slice, xform... slice: 17 wallclock secs (16.96 usr + 0.08 sys = 17.04 CPU) @ 58 +.69/s (n=1000) xform: 28 wallclock secs (27.76 usr + 0.12 sys = 27.88 CPU) @ 35 +.87/s (n=1000) Rate xform slice xform 35.9/s -- -39% slice 58.7/s 64% --

    Makeshifts last the longest.

      A slightly faster solution (on my machine) uses in-place assignment to @_:
      sub shuffle { my @r = map rand, 0 .. $#_; @_[sort { $r[$a] <=> $r[$b] } 0 .. $#_] = @_; }

      _____________________________________________________
      Jeff[japhy]Pinyan: Perl, regex, and perl hacker, who'd like a job (NYC-area)
      s++=END;++y(;-P)}y js++=;shajsj<++y(p-q)}?print:??;

        There was a recent thread on the ActiveState list which was never quite solved. In general it came down to assigning to @_ was causing some weirdness to occur and ActiveState perl 5.6.1 would slow down to a crawl after a few thousand iterations. Other perls like my OpenBSD 5.6.1 and Cygwin 5.8.0 behaved nicely.

        I'm not sure what the issue was but there may be lurking bug in assigning to @_. I'm warning against it. (or at least be aware so if you do run into trouble later...)

        Update: fixed some spelling

        You know (leaving the in-place assignment issue aside), it occured to me (duh) that this is a general purpose but faster rephrasing of the Schwartzian Transform:
        sub sxsorta (&;) { my $sort_key = shift; my @rank = map &$sort_key, @_; @_[ sort { $rank[$a] cmp $rank[$b] } 0 .. $#_ ]; } sub sxsortn (&;) { my $sort_key = shift; my @rank = map &$sort_key, @_; @_[ sort { $rank[$a] <=> $rank[$b] } 0 .. $#_ ]; }

        The gains over the regular ST comes from the fact that it doesn't create hundreds of tiny anonymous arrays, and that the sort callback is slightly faster because it only has to do an array lookup but no dereferencing.

        The drawback is that with a lexically scoped (as it should be) @rank, there's no way to pass a userspecified callback to the routines without inflicting a massive performance hit. To do more complex sorts, one therefor has to separately spell this idiom out in their own code.

        Makeshifts last the longest.

Re: Re: When the Best Solution Isn't
by thraxil (Prior) on Sep 23, 2002 at 14:28 UTC

    actually, something occurred to me in the shower this morning. my approach isn't linear, it should be O(NlogN) because of the sort. but Fisher-Yates is linear, so it should pretty much beat anything else, at least for large inputs. but it doesn't seem to. now i'm really confused.

    anders pearson

Log In?
Username:
Password:

What's my password?
Create A New User
Node Status?
node history
Node Type: note [id://199996]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others taking refuge in the Monastery: (6)
As of 2020-01-18 20:35 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?
    Notices?