Beefy Boxes and Bandwidth Generously Provided by pair Networks
Clear questions and runnable code
get the best and fastest answer
 
PerlMonks  

Re^2: Split range 0 to M into N non-overlapping (roughly equal) ranges.

by ELISHEVA (Prior)
on Mar 13, 2011 at 05:22 UTC ( #892900=note: print w/ replies, xml ) Need Help??


in reply to Re: Split range 0 to M into N non-overlapping (roughly equal) ranges.
in thread Split range 0 to M into N non-overlapping (roughly equal) ranges.

FYI, there are occasional one-off rounding error issues with this algorithm (possibly platform dependent), 2.5% of the time partitioning ranges of 1..100 or less. Depending on your goals that may or may not matter, but here are some examples :

$M,$N final array element (should be $M) (14,11) ==> final=13 (14,13) ==> final=13 (29,11) ==> final=28 (29,13) ==> final=28 (29,22) ==> final=28 (29,26) ==> final=28 (59,11) ==> final=58 (59,13) ==> final=58 (59,22) ==> final=58 (59,26) ==> final=58 (59,44) ==> final=58 (59,52) ==> final=58 (59,55) ==> final=58 (100,49) ==> final=99 (100,98) ==> final=99 (100,99) ==> final=99

Test script

use strict; use warnings; #======================================== # Algorithms #======================================== sub ranges_javafan { my ($M, $N) = @_; return [ map {[int($_*($M+1)/$N) , int(($_+1)*(($M+1)/$N))-1] } 0..$N - 1]; } #---------------------------------------- sub ranges_javafan_buk { my ($M, $N) = @_; my $STEP = ( $M + 1 ) / $N; return [ map [int( $_ * $STEP ), int( ( $_+1 ) * $STEP ) -1] , 0 .. $N - 1]; } #---------------------------------------- sub ranges_elisheva_ikegami { my ($M, $N) = @_; my $r = ($M+1)%$N; my $q = ($M+1-$r)/$N; my $iEnd=0; return [ map { my $iStart=$iEnd; [$iStart, ($iEnd+=$_) - 1] } (($q+1) x $r, ($q) x ($N-$r)) ]; } #======================================== # Check of final element of range #======================================== sub roundCheck { my ($cr, $M) = @_; my $iTotal=0; my $iErrors=0; for my $m (1..$M) { for my $n (1..$m) { $iTotal++; my $aRanges = $cr->($m, $n); my $iFinal = $aRanges->[-1][1]; if ($iFinal != $m) { $iErrors++; # Uncomment here to see individual errors # --------------------------------------- # print "($m,$n) ==> final=$iFinal\n"; } } } printf "Error rate: %s\n", ($iErrors*100/$iTotal); } #------------------------------------------------------- print "Javafan:\n"; roundCheck(\&ranges_javafan, 100); print "Javafan_buk:\n"; roundCheck(\&ranges_javafan_buk, 100); print "elisheva_ikegami\n"; roundCheck(\&ranges_elisheva_ikegami, 100); #outputs (Debian Linux - Lenny, perl 5.10.0, 32-bit) Javafan: Error rate: 2.47524752475248 Javafan_buk: Error rate: 2.47524752475248 elisheva_ikegami Error rate: 0

Update added test script and added parenthetical remark about platform dependency.


Comment on Re^2: Split range 0 to M into N non-overlapping (roughly equal) ranges.
Select or Download Code
Re^3: Split range 0 to M into N non-overlapping (roughly equal) ranges.
by BrowserUk (Pope) on Mar 13, 2011 at 06:19 UTC

    What?

    [ 6:16:11.35] C:\test>ranges -M=14 -N=11 0 : from 0 to 0 ( 1) 1 : from 1 to 1 ( 1) 2 : from 2 to 3 ( 2) 3 : from 4 to 4 ( 1) 4 : from 5 to 5 ( 1) 5 : from 6 to 7 ( 2) 6 : from 8 to 8 ( 1) 7 : from 9 to 9 ( 1) 8 : from 10 to 11 ( 2) 9 : from 12 to 12 ( 1) 10 : from 13 to 13 ( 1) ======================================== 14 [ 6:16:42.76] C:\test>ranges -M=14 -N=13 0 : from 0 to 0 ( 1) 1 : from 1 to 1 ( 1) 2 : from 2 to 2 ( 1) 3 : from 3 to 3 ( 1) 4 : from 4 to 4 ( 1) 5 : from 5 to 5 ( 1) 6 : from 6 to 7 ( 2) 7 : from 8 to 8 ( 1) 8 : from 9 to 9 ( 1) 9 : from 10 to 10 ( 1) 10 : from 11 to 11 ( 1) 11 : from 12 to 12 ( 1) 12 : from 13 to 13 ( 1) ======================================== 14 [ 6:17:01.29] C:\test>ranges -M=29 -N=11 0 : from 0 to 1 ( 2) 1 : from 2 to 4 ( 3) 2 : from 5 to 7 ( 3) 3 : from 8 to 9 ( 2) 4 : from 10 to 12 ( 3) 5 : from 13 to 15 ( 3) 6 : from 16 to 18 ( 3) 7 : from 19 to 20 ( 2) 8 : from 21 to 23 ( 3) 9 : from 24 to 26 ( 3) 10 : from 27 to 28 ( 2) ======================================== 29 [ 6:17:30.04] C:\test>ranges -M=100 -N=49 0 : from 0 to 1 ( 2) 1 : from 2 to 3 ( 2) 2 : from 4 to 5 ( 2) 3 : from 6 to 7 ( 2) 4 : from 8 to 9 ( 2) 5 : from 10 to 11 ( 2) 6 : from 12 to 13 ( 2) 7 : from 14 to 15 ( 2) 8 : from 16 to 17 ( 2) 9 : from 18 to 19 ( 2) 10 : from 20 to 21 ( 2) 11 : from 22 to 23 ( 2) 12 : from 24 to 25 ( 2) 13 : from 26 to 27 ( 2) 14 : from 28 to 29 ( 2) 15 : from 30 to 31 ( 2) 16 : from 32 to 34 ( 3) 17 : from 35 to 36 ( 2) 18 : from 37 to 38 ( 2) 19 : from 39 to 40 ( 2) 20 : from 41 to 42 ( 2) 21 : from 43 to 44 ( 2) 22 : from 45 to 46 ( 2) 23 : from 47 to 48 ( 2) 24 : from 49 to 50 ( 2) 25 : from 51 to 52 ( 2) 26 : from 53 to 54 ( 2) 27 : from 55 to 56 ( 2) 28 : from 57 to 58 ( 2) 29 : from 59 to 60 ( 2) 30 : from 61 to 62 ( 2) 31 : from 63 to 64 ( 2) 32 : from 65 to 67 ( 3) 33 : from 68 to 69 ( 2) 34 : from 70 to 71 ( 2) 35 : from 72 to 73 ( 2) 36 : from 74 to 75 ( 2) 37 : from 76 to 77 ( 2) 38 : from 78 to 79 ( 2) 39 : from 80 to 81 ( 2) 40 : from 82 to 83 ( 2) 41 : from 84 to 85 ( 2) 42 : from 86 to 87 ( 2) 43 : from 88 to 89 ( 2) 44 : from 90 to 91 ( 2) 45 : from 92 to 93 ( 2) 46 : from 94 to 95 ( 2) 47 : from 96 to 97 ( 2) 48 : from 98 to 99 ( 2) ======================================== 100

    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.
      I'm seeing the same numbers as you. But what is your output for: ranges -M=20 -N=3 ?

      I get: 0-6, 7-13, 14-20.

      I expect the last index to be 19.
        I expect the last index to be 19.

        Why? The range is 0 .. 20 inclusive:

        [ 6:17:58.02] C:\test>ranges -M=20 -N=3 0 : from 0 to 6 ( 7) 1 : from 7 to 13 ( 7) 2 : from 14 to 20 ( 7) ======================================== 21 [ 7:10:34.53] C:\test>ranges -M=20 -N=2 0 : from 0 to 9 ( 10) 1 : from 10 to 20 ( 11) ======================================== 21 [ 7:11:36.69] C:\test>ranges -M=20 -N=1 0 : from 0 to 20 ( 21) ======================================== 21

        Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
        "Science is about questioning the status quo. Questioning authority".
        In the absence of evidence, opinion is indistinguishable from prejudice.
Re^3: Split range 0 to M into N non-overlapping (roughly equal) ranges.
by JavaFan (Canon) on Mar 13, 2011 at 11:08 UTC
    Yeah, they are rounding errors. Rearranging the calculation fixes it (at least, on my platform):
    sub ranges_javafan { my ($M, $N) = @_; return [ map {[int($_/$N*($M+1)) , int((($_+1)/$N)*($M+1))-1] } 0..$N - 1]; }
    The most important is the final ($_+1)/$N*($M+1) - that should return $M+1 for $_ == $N-1. The impact of other rounding errors is that a range is 1 shorter than expected (but making the next one one larger - it shouldn't skip numbers).
Re^3: Split range 0 to M into N non-overlapping (roughly equal) ranges.
by BrowserUk (Pope) on Mar 13, 2011 at 15:42 UTC

    Good catch! As originally coded, the error rate across all M/N combinations (< 2**32), seems to come out at ~ 1 in 15 (6.66%).

    #! perl -slw use strict; use List::Util qw[ min ]; use Math::Random::MT qw[ rand ]; $|++; sub check { my( $m, $n ) = @_; my $step = ( $m +1 ) / $n; my $f = int( $n * $step ) -1; return if $f != $m; return 1; } my $trials = 0; my $fails = 0; for ( 1 .. 1e6 ) { my $m = int( rand 2**32 ); for ( 1 .. min( $m, 1000 ) ) { ++$trials; my $n = 1+int( rand $m ); check( $m, $n ) or ++$fails; } printf "\r$_ : %f%%", $fails *100 / $trials; } __END__ C:\test>ranges 76977645 : 6.620262%

    However, a simple fudge floating point rounding correction factor of 0.000001 seems to sort things out nicely:

    #! perl -slw use strict; use List::Util qw[ min ]; use Math::Random::MT qw[ rand ]; $|++; sub check { my( $m, $n ) = @_; my $step = ( $m +1.000001 ) / $n; my $f = int( $n * $step ) -1; # warn( "m:$m n:$n f:$f\n" ) return if $f != $m; return 1; } my $trials = 0; my $fails = 0; for ( 1 .. 1e6 ) { my $m = int( rand 2**32 ); for ( 1 .. min( $m, 1000 ) ) { ++$trials; my $n = 1+int( rand $m ); check( $m, $n ) or ++$fails; } printf "\r$_ : %f%%", $fails *100 / $trials; } __END__ C:\test>ranges 6783635 : 0.000000%

    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.

      For the record, although developed independently my version at Re: Split range 0 to M into N non-overlapping (roughly equal) ranges. has the same sort of bug, triggered by at least one of the same input pairs. I guess when two people hit the same flaw, we at least know we're close to the right solution.

      Since the fix is already discussed for JavaFan's map version, I'm sure the translation of that fix to the for loop can be left as an exercise. Personally, my fix would probably be to pull in bignum, which works well since it's a rounding error.

      Broken:

      Fixed:

      [chris@pineapple ranges-892828]$ perl -Mbignum ranges -M=14 -N=11 0 : from 0 to 0 ( 1) 1 : from 1 to 1 ( 1) 2 : from 2 to 3 ( 2) 3 : from 4 to 4 ( 1) 4 : from 5 to 5 ( 1) 5 : from 6 to 7 ( 2) 6 : from 8 to 8 ( 1) 7 : from 9 to 9 ( 1) 8 : from 10 to 11 ( 2) 9 : from 12 to 12 ( 1) 10 : from 13 to 14 ( 2) [chris@pineapple ranges-892828]$ perl -Mbignum ranges -M=13 -N=12 0 : from 0 to 0 ( 1) 1 : from 1 to 1 ( 1) 2 : from 2 to 2 ( 1) 3 : from 3 to 3 ( 1) 4 : from 4 to 4 ( 1) 5 : from 5 to 6 ( 2) 6 : from 7 to 7 ( 1) 7 : from 8 to 8 ( 1) 8 : from 9 to 9 ( 1) 9 : from 10 to 10 ( 1) 10 : from 11 to 11 ( 1) 11 : from 12 to 13 ( 2)

      It's maybe slower than fudging, but I have more faith in it. It's easy enough to add a line at the top of the program to let someone else deal with the intricacies since this isn't going to be the resource-hungry part of your code anyway.

        It's maybe slower than fudging, but I have more faith in it.

        Having called it "fudging", there is actually very sound math sitting behind it. I'm not competent to explain the math, but I can explain my testing. I've run all permutations of M & N from 1 through 10,000; and 500 million random selections of M & N = 1 .. 2**32 without seeing a single error.

        Given that in the original form, there was a 1 in 15 error rate, the odds that if there were any errors with the "fudged" algorithm, that I have not detected 1 of them after 600 million trials, are so vanishingly small as to be ... well, very, very unlikely.


        Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
        "Science is about questioning the status quo. Questioning authority".
        In the absence of evidence, opinion is indistinguishable from prejudice.

Log In?
Username:
Password:

What's my password?
Create A New User
Node Status?
node history
Node Type: note [id://892900]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others drinking their drinks and smoking their pipes about the Monastery: (6)
As of 2014-10-22 12:12 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    For retirement, I am banking on:










    Results (117 votes), past polls