Beefy Boxes and Bandwidth Generously Provided by pair Networks
"be consistent"
 
PerlMonks  

Algorithm RFC: fast (pseudo-)random shuffle with no repetition

by Anonymous Monk
on Sep 22, 2023 at 21:20 UTC ( #11154597=perlquestion: print w/replies, xml ) Need Help??

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

The was the SO question recently, and as it sometimes happens, when I think "oh, this can be fun to play with, for better algorithm", it brewed for itself somewhere in subconscious, until the eureka moment a couple days ago (I'm in no hurry :-)): "Of course! Be greedy, demand twice as needed!"

There was advice to use brute force, at least as accepted answer in linked question (from 2021) there. Not sure if I got it right, I don't read Ruby, and didn't try other answers. Both brute subroutines below aren't actually used: they are totally unusable for lists with ~15 unique strings or more, plus any decent amount of duplicates. I wrote 2nd one because I couldn't believe it at first (accepted solution??). They are left just in case anyone wants to try (or point at my mistakes in implementation?), and can be ignored.

Back to answers at SO, there are 2 Perl solutions. One (a) doesn't compile; (b) if fixed, emits a warning for un-initialized value; (c) if fixed (or ignored), it seems to work OK. But for (corner-case, of course) input of (b,a,a), it gives answer (b,a). I didn't look further.

Another solution (by esteemed monk) fails randomly for e.g. corner-case (a,a,a,b,b) -- the only answer can be (a,b,a,b,a), of course. Why does it fail? Output list is initialized to e.g. (a,b). If 1st key to iterate is "a", then one of 2 remaining "a"'s is added to give (a,b,a) with no place for 2nd remaining "a". So, easy fix would be kind of "breadth-first" hash consumption. I'm sorry if code I had to add looks ugly to the author.

This fixed version will serve as reference to compare my solution to, it generates truly random lists.

With algorithm I suggest -- ask for twice as many random indexes from remaining pool, then simply reject (comb out) half of them. It guarantees there will be no consecutive dupes (and of course doesn't mean "only odd or even indexes for this value").

One obvious compromise on randomness will be "dupes are never placed at both head and tail" -- except corner-cases such as 'aba' or 'abaca', of course. There are actually 3 cases, depending on size of remaining pool. Cases "2" and "3" restrict randomness further. E.g., for 'aaaabbbcc', the 'c' is never placed at indexes 0 or 1 -- unlike the "reference SO implementation with true randomness".

However, lines with "die" in them can be un-commented (and they were un-commented during benchmarking) if input is not an artificial corner-case -- this code is never reached with realistic data. I mean, other than corner-cases and head/tail restriction, my algorithm seems to produce random enough result.

(In fact, one of "requests" of "RFC" is how to estimate randomness (entropy) for multiple runs of subroutine. Didn't look into that yet.)

Further "requests" are: can it be improved? Both List::MoreUtils::samples and e.g. (unused) Math::Prime::Util::randperm return their result shuffled, which I don't need and have to sort back to order! And more, e.g. samples takes random samples and therefore should know which items were unselected, but I have no better way to find out "which" except with more work using singleton. It feels like huge amount of unnecessary work I do (though it's still much faster than "SO reference solution"). Or maybe, perhaps, someone would suggest even faster solution?

(+ I understand there's sloppiness on my side in e.g. $uniq variable name doesn't actually mean number of unique items which fake_data returns. I hope this (and similar) can be forgiven.)

use strict; use warnings; use feature 'say'; use List::Util qw/ shuffle /; use List::MoreUtils qw/ part samples singleton /; use ntheory qw/ forperm lastfor /; use Algorithm::Combinatorics qw/ permutations /; use Benchmark 'cmpthese'; my @input = shuffle( qw( a a a a b b b c c )); # corner-cases @input = shuffle( qw( a a a b b )); # srand 123; @input = fake_data( 555, 55 ); #say scalar @input; # 2096 sub fake_data { my ( $uniq, $pivot ) = @_; my @tmp = map { sprintf '< %06d >', rand 1e9 } 0 ... $uniq; my @out; push @out, @tmp[ 0 .. $_ ] for 0 .. $pivot; @out = shuffle( @out, @tmp[ $pivot + 1 .. $uniq ]); return @out } cmpthese 10, { SO_fixed => sub { die unless SO_fixed( \@input )}, my_shuffle => sub { die unless my_shuffle( \@input )}, }; sub brute { my $input_ref = shift; my @output; forperm { my $prev = ''; for ( @_ ) { return if $prev eq $input_ref-> [ $_ ]; $prev = $input_ref-> [ $_ ] } @output = @{ $input_ref }[ @_ ]; lastfor, return } @$input_ref; return \@output } sub brute2 { my $input_ref = shift; my @output; my $iter = permutations( $input_ref ); PERM: while ( my $p = $iter-> next ) { my $prev = ''; for ( @$p ) { next PERM if $prev eq $_; $prev = $_ } @output = @$p; last PERM } return \@output } sub SO_fixed { my $input_ref = shift; my %counts; ++$counts{ $_ } for @$input_ref; my @strings = shuffle keys %counts; LOOP: { my $any = 0; for my $string ( keys( %counts ) ) { next if $counts{ $string } == 1; $counts{ $string } --; $any = 1; my @safe = grep { $_ == 0 || $strings[ $_ - 1 ] ne $string + } grep { $_ == @strings || $strings[ $_ ] ne $string + } 0 .. @strings; return undef unless @safe; my $pick = $safe[ rand( @safe ) ]; splice( @strings, $pick, 0, $string ); } redo LOOP if $any } return \@strings } sub my_shuffle { my $input_ref = shift; my @output; my %counts; $counts{ $_ } ++ for @$input_ref; my ( $single, $multi ) = part { $counts{ $_ } > 1 } keys %counts; my @multi = sort { $counts{ $b } <=> $counts{ $a }} @$multi; my @pool = ( 0 .. $#$input_ref ); for my $str ( @multi ) { my $count = $counts{ $str }; my @take; if ( $count <= @pool / 2 ) { # case 1 my @excess = sort { $a <=> $b } samples( 2 * $count, @pool + ); my $n = int rand 2; my @idx = grep { $n ^ $_ % 2 } 0 .. $#excess; @take = @excess[ @idx ]; } elsif ( 2 * $count - 1 == @pool ) { # case 2 #die 'This code is unreachable for realistic input'; my @idx = grep { not $_ % 2 } 0 .. $#pool; @take = @pool[ @idx ]; } else { # case 3 #die 'This code is unreachable for realistic input'; my $prev = -2; my @ok = grep { my $res = $_ - $prev; $prev = $_; $res > 1 } @pool; return undef if $count > @ok; @take = samples( $count, @ok ); } @pool = singleton @pool, @take; @output[ $_ ] = $str for @take; } @output[ @pool ] = @$single if @pool; return \@output; } __END__ (warning: too few iterations for a reliable count) Rate SO_fixed my_shuffle SO_fixed 2.29/s -- -95% my_shuffle 42.7/s 1763% --

Replies are listed 'Best First'.
Re: Algorithm RFC: fast (pseudo-)random shuffle with no repetition
by tybalt89 (Monsignor) on Sep 23, 2023 at 04:47 UTC

    Second pass at it - Now for something completely different...

    #!/usr/bin/perl use strict; # https://www.perlmonks.org/?node_id=11154597 use warnings; use List::AllUtils qw( first shuffle ); my @in = shuffle <DATA>; my @out = shift @in; LOOP: while( @in ) { for my $in ( 0 .. $#in ) { if( $in[$in] ne $out[0] ) # put at start { unshift @out, splice @in, $in, 1; next LOOP; } elsif( $in[$in] ne $out[-1] ) # put at end { push @out, splice @in, $in, 1; next LOOP; } else # put in first middle place it fits { if( my $pick = first { $out[$_ - 1] ne $in[$in] and $out[$_] ne $in[$in] } 1 .. $#out + ) { splice @out, $pick, 0, splice @in, $in, 1; next LOOP; } } } die "FAILED\nin\n @in\nout\n @out"; } print @out; __DATA__ Line 1 Line 2 Line 3 Line 3 Line 3 Line 4 Line 8

    BTW: It only found 240 different orders in 1e7 runs.

      Either golfed or just shorter, your choice :)

      #!/usr/bin/perl use strict; # https://www.perlmonks.org/?node_id=11154597 use warnings; use List::AllUtils qw( first shuffle ); my @in = shuffle <DATA>; my @out = shift @in; LOOP: while( @in ) { for my $in ( 0 .. $#in ) { if( defined( my $place = first { $_ == 0 ? $in[$in] ne $out[0] : # at start? $_ == @out ? $in[$in] ne $out[-1] : # at end? $out[$_-1] ne $in[$in] && $out[$_] ne $in[$in] # in middle? } 0 .. @out ) ) { splice @out, $place, 0, splice @in, $in, 1; next LOOP; } } die "FAILED\nin\n @in\nout\n @out"; } print @out; __DATA__ Line 1 Line 2 Line 3 Line 3 Line 3 Line 4 Line 8
      > It only found 240 different orders in 1e7 runs.

      yes, there are only 240 = 10 * 4! distinct solutions for this one. qw/1 2 3 3 3 4 8/

      EDIT

      There are 24= 4! possible permutations for lines 1,2,4 and 8

      There are 10 possibilities to partition them into 4 groups to fill around the line 3s

      (.)3(.)3(.)3(.)

      where only the first and last partition is allowed to be empty, otherwise the 3 would collide.

      (1)(2)(4)(8) ()(12)(4)(8) ()(1)(24)(8) ()(1)(2)(48) (12)(4)(8)() (1)(24)(8)() (1)(2)(48)() ()(1)(248)() ()(12)(48)() ()(124)(8)()

      NB: since 1,2,4 and 8 are distinct they can't collide inside the fillers.

      update

      I think this constructive approach is the best way to allow all possible solutions to appear and with the same likelihood.

      Because one only needs two random numbers rnd(24) and rnd(10) as "coordinates" to construct one.

      Cheers Rolf
      (addicted to the Perl Programming Language :)
      see Wikisyntax for the Monastery

      Thanks, looks like I overcomplicated it a lot, this is much faster and simpler. Perhaps because there was no answer like yours at SO? :-) FWIW, if compressibility (Compress::Zlib::compress) can serve as measure of entropy, then deflating joined output from 100 runs for the same 2096 items array input -- produces string of practically same length as my solution. So, yeah, one initial shuffle and then unshift/push/first seem OK to ensure enough randomness (and speed).

        if compressibility can serve as measure of entropy...
        Very-pseudo...

        Call it mad science, but I was quite amused with what follows. Let's consider 2 solutions to OP task:

        • supf i.e. "shuffle/unshift/push/first"
        • mrcs i.e. "matrix rows/columns shuffle"

        The former by tybalt98, code almost verbatim from 11154602, and the idea for the latter suggested by LanX. Other than "(almost) equidistant most frequent key(s)", I had no solid basis to say "very-pseudo", but gut feeling only. The mrcs is, as I said, quite visual and therefore e.g. OK as a demo to teach students. I didn't add error checking, neither optimized for speed (it won't become faster than supf, but fast enough already). Here's "result predictability" measurement:

        Looks like "supf" provides 100% of possible entropy
        Looks like "mrcs" provides 67% of possible entropy
        

        This is for entertainment only, I don't think I'm qualified for "what's randomness?" discussion.

        use strict; use warnings; use feature 'say'; use List::Util qw/ first shuffle max /; use Compress::Zlib 'compress'; use POSIX 'ceil'; use constant ITEM => qr(< \d+ >); my @input = fake_data( 555, 55 ); #say scalar @input; # 2096 my $low_mark = gauge(( \@input ) x 100 ); my $high_mark = gauge( map [ shuffle @input ], 0 .. 99 ); my $supf_mark = gauge( map verify( supf( \@input )), 0 .. 99 ); my $mrcs_mark = gauge( map verify( mrcs( \@input )), 0 .. 99 ); printf "Looks like \"supf\" provides %.0f%% of possible entropy\n", 100 * ( $supf_mark - $low_mark ) / ( $high_mark - $low_mark ); printf "Looks like \"mrcs\" provides %.0f%% of possible entropy\n", 100 * ( $mrcs_mark - $low_mark ) / ( $high_mark - $low_mark ); sub gauge { length compress join '', map @$_, @_ } sub verify { my $r = shift; my $s = join '', @$r; die if $s =~ /(ITEM)\1/; return $r } sub fake_data { my ( $uniq, $pivot ) = @_; my @tmp = map { sprintf '< %09d >', rand 1e9 } 0 ... $uniq; my @out; push @out, @tmp[ 0 .. $_ ] for 0 .. $pivot; @out = shuffle( @out, @tmp[ $pivot + 1 .. $uniq ]); return @out } sub supf { # shuffle/unshift/push/first my $input_ref = shift; my @in = shuffle @$input_ref; my @out = shift @in; LOOP: while( @in ) { for my $in ( 0 .. $#in ) { if( $in[$in] ne $out[0] ) # put at start { unshift @out, splice @in, $in, 1; next LOOP; } elsif( $in[$in] ne $out[-1] ) # put at end { push @out, splice @in, $in, 1; next LOOP; } else # put in first middle place it fits { if( my $pick = first { $out[$_ - 1] ne $in[$in] and $out[$_] ne $in[$in] } 1 .. $ +#out ) { splice @out, $pick, 0, splice @in, $in, 1; next LOOP; } } } die "FAILED\nin\n @in\nout\n @out"; } return \@out; } sub mrcs { # matrix rows/columns shuffle my $input_ref = shift; my %counts; $counts{ $_ } ++ for @$input_ref; my $nrows = max values %counts; my $ncols = ceil @$input_ref / $nrows; my @frequent_keys = grep { $counts{ $_ } == $nrows } keys %counts; delete @counts{ @frequent_keys }; my @arranged = (( map {( $_ ) x $nrows } @frequent_keys ), ( map {( $_ ) x $counts{ $_ }} keys %counts )); $#arranged = $nrows * $ncols - 1; my @matrix = map [ splice @arranged, 0, $nrows ], 0 .. $ncols - 1; my @row_perm = shuffle 1 .. $ncols - 1; # exclude 0th column my @col_perm = shuffle 0 .. $nrows - 1; for my $col ( @matrix ) { splice @$col, 0, $nrows, @$col[ @col_perm ]; } splice @matrix, 1, $ncols - 1, @matrix[ @row_perm ]; my @ret = grep defined, map { # transpose & flatten my $row_i = $_; map { $matrix[ $_ ][ $row_i ]} 0 .. $ncols - 1 } 0 .. $nrows - 1; return \@ret; }
Re: Algorithm RFC: fast (pseudo-)random shuffle with no repetition
by tybalt89 (Monsignor) on Sep 23, 2023 at 00:00 UTC

    First pass at it...

    #!/usr/bin/perl use strict; # https://www.perlmonks.org/?node_id=11154597 use warnings; my %uniq; my $n = 0; while( <DATA> ) { $uniq{ $_ }++; $n++; } grep { $n + 1 < $_ * 2 } values %uniq and die "Cannot be done!\n"; my @norep; while( %uniq ) { my @allowed = grep { @norep == 0 || $_ ne $norep[-1] } keys %uniq; my @choices = grep { $uniq{$_} * 2 >= $n + 1 } @allowed; # required +choices @choices or @choices = @allowed or die "FAILED at n = $n\n", @norep; @choices = map { ($_) x $uniq{$_} } @choices; # weighting choices push @norep, my $pick = $choices[ rand @choices ]; --$uniq{ $pick } or delete $uniq{ $pick }; $n--; } print @norep; __DATA__ Line 1 Line 2 Line 3 Line 3 Line 3 Line 4 Line 8

    Sample Output:

    Line 3 Line 2 Line 1 Line 3 Line 8 Line 4 Line 3
Re: Algorithm RFC: fast (pseudo-)random shuffle with no repetition
by LanX (Saint) on Sep 22, 2023 at 23:36 UTC
    Hmm ... the complexity of this question depends on your definition of "random".

    How random is random enough???

    You can always quickly construct a solution - iff possible - but it's kind of predictable (symmetrical) then.

    Sort your groups by length and organize them in columns of length of the biggest group than read the matrix from left to right.

    Demo:

    qw( a a a a b b b c c )

    ==>

    a b c a b a b a c

    ==>

    qw( a b c a b a b a c )

    update

    and I think based on this you could create more solutions by randomly shuffling complete rows and columns of the matrix and reading from left to right again

    1 2 0 3 c a 2 b a 0 b c a 1 b a

    ==>

    qw( c a b a b c a b a )

    Hence plenty of legal solutions° (not all unique) to pick from.

    Super fast and easy. But is this "random" enough???

    Cheers Rolf
    (addicted to the Perl Programming Language :)
    see Wikisyntax for the Monastery

    °) 3!*4!/2 = 72 unique solutions in this case

      Thanks. (Very-pseudo) random, but hopefully very fast and quite visual demo. Perhaps excluding outermost rows/columns from shuffle is required, otherwise illegal combinations can happen, such as:

      0 1 2 0 1 2 2 0 1 0 a b c 1 a b 1 a b 1 a b ==> 2 a b ==> 2 a b ==> ababaccab 2 a b 3 a c 3 a c 3 a c 0 a b c 0 c a b

      Or, simply, no shuffle for 'aab' input.

        yes, thanks I figured that out yesterday, but wanted to wait for a reply first.

        If one allows to shuffle the 0 column , collisions are possible and one needs to check.

        If it stays fixed it's always fine.

        > (Very-pseudo) random,

        well "randomness" is not self defining, there are plenty of paradoxes in math were people had different concepts of "random". °

        For instance if you said you want

        • All legal permutations to be possible
        • And to be picked with the same likelihood

        that's a pretty hard problem to be done fast.

        I suppose many solutions here will produce certain permutations with a bigger probability. So please be explicit what kind of randomness you want.

        Cheers Rolf
        (addicted to the Perl Programming Language :)
        see Wikisyntax for the Monastery

        °) classic example is roulette, the likelihood of a red or black number is always the same, even after a row of hundreds of reds. It's the likelihood of the red sequence which is low.

Re: Algorithm RFC: fast (pseudo-)random shuffle with no repetition
by karlgoethebier (Abbot) on Sep 23, 2023 at 15:55 UTC

    What is actually against calculating the valid permutations in advance? A try:

    #!/usr/bin/env perl use strict; use warnings; use Algorithm::Permute; use feature qw(say); use Data::Dump; my $n = [ 1, 2, 3, 3, 3, 4, 8 ]; my $p = Algorithm::Permute->new($n, 7); my $m3 = qr/(.)\1\1/; my $m2 = qr/(.)\1/; my @v; while ( my @r = $p->next ) { my $s = pack( "(A*)*", @r ); next if $s =~ /$m3/; next if $s =~ /$m2/; push( @v, $s ); } dd \@v; __END__

    «The Crux of the Biscuit is the Apostrophe»

      > What is actually against calculating the valid permutations in advance? A try:

      Runtime ...

      from the OP

      > > Both brute subroutines below aren't actually used: they are totally unusable for lists with ~15 unique strings or more, plus any decent amount of duplicates.

      Your approach is even worse, instead of brute-forcing one random result, you try to brute-force all possible solutions in advance.

      NB 15! = 1.3e12

      Cheers Rolf
      (addicted to the Perl Programming Language :)
      see Wikisyntax for the Monastery

        It seems like that is actually a bit unwieldy. Another try:

        #!/usr/bin/env perl use strict; use warnings; use Algorithm::Permute; use Data::Dump; use List::Util qw(shuffle); use feature qw(say); my @n = shuffle( 1, 1, 1, 2, 3, 3, 3, 4, 5, 5, 5, 6, 7, 8, 11, 12, 13, + 14, 14, 15 ); my $p = Algorithm::Permute->new( \@n, 20 ); my $m3 = qr/(.)\1\1/; my $m2 = qr/(.)\1/; PERMUTE: { my @r = $p->next; dd \@r; my $s = pack( "(A*)*", @r ); if ( $s =~ /($m3)/ ) { # say $1; # sleep 1; goto PERMUTE; } if ( $s =~ /($m2)/ ) { # say $1; # sleep 1; goto PERMUTE; } say join " ", @r; } __END__

        Minor update: Fixed dead code.

        «The Crux of the Biscuit is the Apostrophe»

Re: Algorithm RFC: fast (pseudo-)random shuffle with no repetition
by perlfan (Vicar) on Sep 22, 2023 at 23:05 UTC
    Is this related in any way to the shuffle operator, under which regular languages are closed? FLAT can be used to interate over all shuffles. Sub::Genius uses it to create sequentially consistent subroutine run schedules.

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: perlquestion [id://11154597]
Approved by choroba
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others musing on the Monastery: (3)
As of 2023-11-30 16:55 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found

    Notices?