PerlMonks  

Re: Sampling from Combination Space (lcd)

by tye (Sage)
on Jul 18, 2005 at 16:04 UTC ( [id://475796]=note )


in reply to Sampling from Combination Space

Repeatedly randomly sampling without replacement is equivalent to shuffling the list and just splitting the result into groups of the desired size. So do that but don't start over after that...

First, shuffle the list.
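For the shuffle itself, here is a minimal sketch of an in-place Fisher-Yates shuffle. It reuses the name and calling convention (an array reference) of the PermuteList stub in the program further down; the sample list 100..115 is only an illustration, and List::Util's shuffle would do the job as well.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# In-place Fisher-Yates shuffle. Takes an array reference, the same
# calling convention as the PermuteList stub in the program below.
sub PermuteList {
    my $av = shift @_;
    # Walk from the last index down to 1, swapping each element with a
    # randomly chosen element at or before it.
    for( my $i = $#$av; 0 < $i; $i-- ) {
        my $j = int rand( $i + 1 );
        @$av[ $i, $j ] = @$av[ $j, $i ];
    }
    return;
}

# Illustrative data only (an assumption, not part of the original post).
my @list = ( 100 .. 115 );
PermuteList( \@list );
print "( @list )\n";
```

The output order is random, so no expected output is shown.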

Now you can group the list into groups of 5 (or 8 or whatever) items and you've got int(158/5) unique samples. Pick a step size that is relatively prime to your list size, walk the shuffled list with that stride (wrapping around at the end), group the items in the order visited, and you've got another int(158/5) samples, none of which match any of your other samples. Repeat as needed with further step sizes. If you run out of relatively prime step sizes before you get enough samples, then you can reshuffle the list at that point (risking repeating a sample, but with even lower odds than those you theorized would be acceptable).
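To see why the step must be relatively prime to the list size, here is a small sketch that prints the order in which indices are visited for each usable stride. The helper names Gcd and StrideOrder, the list size of 16, and the fixed start at index 0 are assumptions chosen for illustration.

```perl
use strict;
use warnings;

# Greatest common divisor by Euclid's algorithm.
sub Gcd {
    my( $x, $y ) = @_;
    ( $x, $y ) = ( $y % $x, $x ) while $x;
    return $y;
}

# Indices visited when walking a list of $n items from $start with
# stride $step, wrapping around, until the walk returns to $start.
sub StrideOrder {
    my( $n, $step, $start ) = @_;
    my @order;
    my $offset = $start;
    do {
        push @order, $offset;
        $offset = ( $offset + $step ) % $n;
    } while( $offset != $start );
    return @order;
}

# Hypothetical list size; strides are kept below half the list size.
my $n = 16;
for my $step ( 1 .. $n/2 - 1 ) {
    next if 1 != Gcd( $step, $n );
    my @order = StrideOrder( $n, $step, 0 );
    print "step $step: @order\n";
}
```

With a coprime stride the walk touches every index exactly once before returning to its start, so cutting it into groups uses each item at most once per pass. A stride that shares a factor with the list size (4 with 16, say) returns to the start after visiting only some of the indices.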

#!/usr/bin/perl -w
use strict;

# Here are some stubs so you can test my code.

# Replace this with a Fisher Yates shuffle:
sub PermuteList {
    my $av= shift @_;
    print "Not shuffling ( $av->[0] .. $av->[-1] )\n";
}

# Replace these with your own code:
sub GetPopulation {
    return( 100..(99+$ARGV[0]) );
}
sub ProcessSample {
    print "( @_ )\n";
}

sub Lcd {
    my( $x, $y )= @_;
    while( 1 ) {
        if( $y < $x ) {
            ( $x, $y )= ( $y, $x );
        }
        if( $x == 0 ) {
            return $y;
        }
        $y %= $x;
    }
}

sub GenNextSample {
    my( $size, @list )= @_;
    my $step = @list;
    my( $start, $offset );
    return sub {
        while( 1 ) {
            if( @list/2 <= $step ) {
                $step= 1;
                PermuteList( \@list );
                $offset= $start= int rand @list;
            }
            if( 1 == Lcd( $step, 0+@list ) ) {
                my @sample;
                do {
                    return @sample if $size == @sample;
                    push @sample, $list[$offset];
                    $offset= ( $offset + $step ) % @list;
                } while( $offset != $start );
            }
            $step++;
        }
    };
}

my $size= $ARGV[1];
my $iter= GenNextSample( $size, GetPopulation() );
my $samples = 0;
while( $samples++ < 64000 ) {
    my @sample = $iter->();
    ProcessSample( @sample );
}

You can test the code like this:

> perl lcd.pl 16 3 | more
Not shuffling ( 100 .. 115 )
( 112 113 114 )
( 115 100 101 )
( 102 103 104 )
( 105 106 107 )
( 108 109 110 )
( 112 115 102 )
( 105 108 111 )
( 114 101 104 )
( 107 110 113 )
( 100 103 106 )
( 112 101 106 )
( 111 100 105 )
( 110 115 104 )
( 109 114 103 )
( 108 113 102 )
( 112 103 110 )
( 101 108 115 )
( 106 113 104 )
( 111 102 109 )
( 100 107 114 )
Not shuffling ( 100 .. 115 )
( 102 103 104 )
...

- tye        

Replies are listed 'Best First'.
Re^2: Sampling from Combination Space (lcd)
by abell (Chaplain) on Jul 19, 2005 at 08:48 UTC
    Repeatedly randomly sampling without replacement is equivalent to shuffling the list and just splitting the result into groups of the desired size. ...

    I am afraid such a method would seriously alter the outcome of any statistical analysis. For instance, if the number of elements in each combination is a divisor of the number of elements you pick from (in this case 158), then all (158) elements would appear exactly the same number of times in your sample.


    Cheers

    Antonio


    The stupider the astronaut, the easier it is to win the trip to Vega - A. Tucket
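abell's divisibility point can be checked by counting draws. The sketch below is an idealized version of the scheme as described in prose (every coprime stride below half the list size, complete groups only, no shuffling, start fixed at index 0), not a trace of tye's program; Gcd and CountDraws are hypothetical helper names.

```perl
use strict;
use warnings;

# Greatest common divisor by Euclid's algorithm.
sub Gcd {
    my( $x, $y ) = @_;
    ( $x, $y ) = ( $y % $x, $x ) while $x;
    return $y;
}

# Count how often each index is drawn when a list of $n items is cut
# into groups of $size along every stride that is coprime to $n and
# smaller than $n/2. Leftover items that do not fill a group are dropped.
sub CountDraws {
    my( $n, $size ) = @_;
    my @count = (0) x $n;
    my $usable = $size * int( $n / $size );
    for my $step ( 1 .. $n - 1 ) {
        last if $n/2 <= $step;
        next if 1 != Gcd( $step, $n );
        my $offset = 0;    # assumption: every pass starts at index 0
        for( 1 .. $usable ) {
            $count[$offset]++;
            $offset = ( $offset + $step ) % $n;
        }
    }
    return @count;
}

# 2 divides 158; 5 does not.
for my $size ( 2, 5 ) {
    my @count = CountDraws( 158, $size );
    my( $min, $max ) = ( sort { $a <=> $b } @count )[ 0, -1 ];
    print "size $size: each item drawn between $min and $max times\n";
}
```

For size 2, which divides 158, the minimum and maximum coincide: every item is drawn once per stride, so the counts are perfectly flat, which independent random sampling would not produce. For size 5 the leftover items of each pass are dropped, so the counts are no longer identical.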
