Beefy Boxes and Bandwidth Generously Provided by pair Networks
We don't bite newbies here... much
 
PerlMonks  

Split range 0 to M into N non-overlapping (roughly equal) ranges.

by BrowserUk (Pope)
on Mar 12, 2011 at 15:45 UTC ( #892828=perlquestion: print w/replies, xml ) Need Help??

BrowserUk has asked for the wisdom of the Perl Monks concerning the following question:

I seem to find myself having to do this more an more frequently, but I've never really come up with a clean way to code it. My current method is:

#! perl -slw use strict; our $N //= 2; our $M //= 1e6; my( $step, $s, @ranges ) = ( int $M / $N, 0 ); push( @ranges, [ $s, $s+$step -1] ), $s+=$step for 1 ..$N; $ranges[ -1][1] = $M; printf "%2d : from %7d to %7d (%7d)\n", $_, @{ $ranges[ $_ ] }, $ranges[ $_ ][ 1 ] - $ranges[ $_ ][ 0 ] + 1 for 0 .. $#ranges; __END__ [15:43:18.51] C:\test>ranges -N=1 -M=9999997 0 : from 0 to 9999997 (9999998) [15:43:29.03] C:\test>ranges -N=2 -M=9999997 0 : from 0 to 4999997 (4999998) 1 : from 4999998 to 9999997 (5000000) [15:43:34.61] C:\test>ranges -N=3 -M=9999997 0 : from 0 to 3333331 (3333332) 1 : from 3333332 to 6666663 (3333332) 2 : from 6666664 to 9999997 (3333334) [15:43:47.64] C:\test>ranges -N=4 -M=9999997 0 : from 0 to 2499998 (2499999) 1 : from 2499999 to 4999997 (2499999) 2 : from 4999998 to 7499996 (2499999) 3 : from 7499997 to 9999997 (2500001) [15:43:57.34] C:\test>ranges -N=5 -M=9999997 0 : from 0 to 1999998 (1999999) 1 : from 1999999 to 3999997 (1999999) 2 : from 3999998 to 5999996 (1999999) 3 : from 5999997 to 7999995 (1999999) 4 : from 7999996 to 9999997 (2000002)

How would you do this?


Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.

Replies are listed 'Best First'.
Re: Split range 0 to M into N non-overlapping (roughly equal) ranges.
by Corion (Pope) on Mar 12, 2011 at 18:12 UTC

    My immediate solution was quite similar to your approach. To look for something different, I thought about how it would work if we were counting down from $M to zero. My current approach puts all the leftovers from repeatedly subtracting $step from $M into the last range:

    ($N,$M)=@ARGV; ($step,$s)=(int$M/$N,0); puhs @ranges, [$M-$step,$M],$M-=$step while($M and ($M > 2*$step or $M%$step==0)); print 'Last step oversized by ',$M-$step

    This approach seems somewhat shorter, but it certainly is not clearer. Especially the test when to continue is ugly and wrong - if $M is 2*$step -1, you will only get one highly oversized range instead of one undersized and one ideal range. I'm not sure how to fix this.

    But this approach reminds me of Bresenham's line algorithm, which more or less has the same problem, to segment a line into roughly equal-sized segments, except that it distributes these segments in a 2-dimensional way. This algorithm likely has no nicer formulation than what you already have though, but it will distribute the oversized ranges much nicer, and the size difference between two ranges will never be larger than 1.

      ++ insightful

      Searching CPAN for "Bresenham's line algorithm" returns Algorithm::Line::Bresenham, and an XS version. It looks like it would be feasible to re-purpose the algorithm to split a range a the OP wants.

        The problem with Bresenham is that it requires you to iterate the entire range (segment) one at a time.

        For 360 degrees (or just 45 if you use the 1/8th mirroring optimisation), that isn't a problem, but for ranges above 100,000 or so (and many of mine are in the 10s or 100s of millions), it would be horribly slow.


        Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
        "Science is about questioning the status quo. Questioning authority".
        In the absence of evidence, opinion is indistinguishable from prejudice.
Re: Split range 0 to M into N non-overlapping (roughly equal) ranges.
by roboticus (Chancellor) on Mar 12, 2011 at 16:58 UTC

    BrowserUk:

    Hmmm ... I can make it differently, but no cleaner, unfortunately:

    $ cat 892828_a.pl #!/usr/bin/perl use strict; use warnings; my ($M, $N) = (9999997, 3); my @ranges = map { $_ * int($M/$N) } 0 .. $N-1; @ranges = map { [ $ranges[$_], $ranges[$_+1]//$M ] } 0 .. $N-1; print join(", ", map { "(".join("-",@{$_}).")" } @ranges), "\n"; $ perl 892828_a.pl (0-3333332), (3333332-6666664), (6666664-9999997)

    That one gets the second value wrong by one. I tried to find a way to stack the maps together, but couldn't figure out how to do it nicely.

    This one gets the values right, but uses an uglier map statement, and still requires the "$ranges[-1][1]=$M" line, which I don't particularly care for:

    $ cat 892828_b.pl #!/usr/bin/perl use strict; use warnings; my ($M, $N) = (9999997, 3); my $step=0; my @ranges = map { my ($c,$d)=($step,$_*int($M/$N)); $step=$d; [ $c, $ +d-1 ] } 1 .. $N; $ranges[-1][1]=$M; print join(", ", map { "(".join("-",@{$_}).")" } @ranges), "\n"; $ perl 892828_b.pl (0-3333331), (3333332-6666663), (6666664-9999997)

    Perhaps one of these will be close enough to inspire a better solution for you?

    ...roboticus

    When your only tool is a hammer, all problems look like your thumb.

    Update: Added code tags to a bit of text so it renders correctly.

      I can make it differently, but no cleaner

      It always bugs me when something notionally so simple doesn't seem to have a clean solution. I do hope someone will prove me wrong.

      In use, it doesn't matter if the ranges vary by more than the minimum. Up to 1% variability, smallest to largest, would have no appreciable affect. But it still feels like I'm missing some trick that would allow a simple and direct implementation.


      Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
      "Science is about questioning the status quo. Questioning authority".
      In the absence of evidence, opinion is indistinguishable from prejudice.

        BrowserUk:

        I know what you mean, that's what prompted yesterday's question, in fact. I took a bit of a break, and then I thought to myself "too bad reduce (from List::Util) doesn't have a version that returns an array, and then double-checked myself. It can get pretty close. Here's yet another version, which is far too ugly, but interesting in its own way:

        #!/usr/bin/perl use strict; use warnings; use List::Util qw(reduce); my ($M, $N) = (9999997,5); my $ranges = reduce { ref($a) eq "ARRAY" ? [ @$a, [ $$a[-1][1], $b] ] : [ [ $a, $b ] ] } map { $_*int($M/$N) } 0 .. $N; print join(", ", map { "(" . join("-",@{$_}).")" } @$ranges), "\n"; $ perl 892828_c.pl (0-1999999), (1999999-3999998), (3999998-5999997), (5999997-7999996), +(7999996-9999995)

        I like where it's a single statement, but the special-casing for the first element just sucks. Perhaps I should remove the special case and give it a specially-constructed list:

        #!/usr/bin/perl use strict; use warnings; use List::Util qw(reduce); my ($M, $N) = (9999997,5); my $ranges = reduce { [ @$a, [ $$a[-1][1], $b] ] } [ [ 0, 1 ] ], map { $_*int($M/$N) } 0 .. $N; print join(", ", map { "(" . join("-",@{$_}).")" } @$ranges), "\n"; $ perl 892828_d.pl Name "main::b" used only once: possible typo at 892828_d.pl line 10. (0-1), (1-0), (0-1999999), (1999999-3999998), (3999998-5999997), (5999 +997-7999996), (7999996-9999995)

        Well, it's getting a little better in that it removes the special case code, but it's broken (the first two ranges need to be removed from the result) and it's not very easy to read.

        I think I'll give up on it and move on to my electronics project. Thanks for the interesting diversion!

        ...roboticus

        When your only tool is a hammer, all problems look like your thumb.

Re: Split range 0 to M into N non-overlapping (roughly equal) ranges.
by ELISHEVA (Prior) on Mar 12, 2011 at 19:40 UTC

    Why not adapt the solution ikegami suggested a few weeks back on the Near equal partitions. thread?

    sub ranges_beth2 { my ($M, $N) = @_; my $r = ($M+1)%$N; my $q = ($M+1-$r)/$N; my $iEnd=0; return [ map { my $iStart=$iEnd; [$iStart, ($iEnd+=$_) - 1] } (($q+1) x $r, ($q) x ($N-$r)) ]; }

    Sample output

    --------------------------- 0 : from 0 to 9999997 (9999998) --------------------------- 0 : from 0 to 4999998 (4999999) 1 : from 4999999 to 9999997 (4999999) --------------------------- 0 : from 0 to 3333332 (3333333) 1 : from 3333333 to 6666665 (3333333) 2 : from 6666666 to 9999997 (3333332) --------------------------- 0 : from 0 to 2499999 (2500000) 1 : from 2500000 to 4999999 (2500000) 2 : from 5000000 to 7499998 (2499999) 3 : from 7499999 to 9999997 (2499999)

    It ensures that partitions never differ by more than 1 in size and avoids rounding errors. On benchmark results it is also marginally faster (4-8%) than the code in the OP (actual result varies between runs)

    s/iter buk eli_ik buk 1.01 -- -8% eli_ik 0.923 9% -- Rate buk eli_ik buk 1.02/s -- -4% eli_ik 1.06/s 4% --

    Benchmark code

    Update: added that it avoids rounding errors.

    Update: minor revision to account for OP desiring range to go from 0..$M ($M+1 values) not 0..($M-1), i.e. $M values.

      I don't find that to be particularly clean. It uses more intermediate variables and terms.

      Efficiency isn't a consideration here as this usually precedes the spawning of N threads that each iterate over $M/$N elements of data.


      Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
      "Science is about questioning the status quo. Questioning authority".
      In the absence of evidence, opinion is indistinguishable from prejudice.
Re: Split range 0 to M into N non-overlapping (roughly equal) ranges.
by salva (Abbot) on Mar 12, 2011 at 18:09 UTC
    $M++; my $step = int ($M / $N); my @s = (map($step * $_, 0..$N-1), $M); my @ranges = map [$s[$_ - 1], $s[$_] - 1], 1..$#s;
Re: Split range 0 to M into N non-overlapping (roughly equal) ranges.
by eye (Chaplain) on Mar 12, 2011 at 18:39 UTC
    Not sure that what bugs you was what bugged me, but...

    #!/usr/bin/perl use strict; use warnings; # Faked input my $N = 5; my $M = 9999997; # Compute the sizes of the ranges $M += 1; my $step = int $M / $N; my $extra = $M - $N * $step; my(@sizes) = ($step) x $N; @sizes = map { ($extra-- > 0) ? $_ + 1 : $_ } @sizes; # Generate the ranges my @ranges = ([undef, -1]); for my $size (@sizes) { my $m = $ranges[-1][1]; push( @ranges, [ $m + 1, $m + $size ] ); } shift @ranges; # Output printf "%2d : from %7d to %7d (%7d)\n", $_, @{ $ranges[ $_ ] }, $ranges[ $_ ][ 1 ] - $ranges[ $_ ][ 0 ] + 1 for 0 .. $#ranges;

    This produces:

    0 : from 0 to 1999999 (2000000) 1 : from 2000000 to 3999999 (2000000) 2 : from 4000000 to 5999999 (2000000) 3 : from 6000000 to 7999998 (1999999) 4 : from 7999999 to 9999997 (1999999)
Re: Split range 0 to M into N non-overlapping (roughly equal) ranges.
by JavaFan (Canon) on Mar 12, 2011 at 19:41 UTC
    Without additional variables, or special casing the last range:
    my @ranges = map {[int($_*($M+1)/$N), int(($_+1)*(($M+1)/$N))-1]} 0..$ +N - 1;
    All the ranges should differ at most 1 in size.

      With the addition of a little whitespace and factoring out the step size as that is often required for other parts of the code, that's the one I think:

      #! perl -slw use strict; our $N //= 2; our $M //= 1e6; my $STEP = ( $M + 1 ) / $N; my @ranges = map [ int( $_ * $STEP ), int( ( $_+1 ) * $STEP ) -1 ], 0 .. $N - 1; printf "%2d : from %7d to %7d (%7d)\n", $_, @{ $ranges[ $_ ] }, $ranges[ $_ ][ 1 ] - $ranges[ $_ ][ 0 ] + 1 for 0 .. $#ranges;

      Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
      "Science is about questioning the status quo. Questioning authority".
      In the absence of evidence, opinion is indistinguishable from prejudice.

      FYI, there are occasional one-off rounding error issues with this algorithm (possibly platform dependent), 2.5% of the time partitioning ranges of 1..100 or less. Depending on your goals that may or may not matter, but here are some examples :

      $M,$N final array element (should be $M) (14,11) ==> final=13 (14,13) ==> final=13 (29,11) ==> final=28 (29,13) ==> final=28 (29,22) ==> final=28 (29,26) ==> final=28 (59,11) ==> final=58 (59,13) ==> final=58 (59,22) ==> final=58 (59,26) ==> final=58 (59,44) ==> final=58 (59,52) ==> final=58 (59,55) ==> final=58 (100,49) ==> final=99 (100,98) ==> final=99 (100,99) ==> final=99

      Test script

      Update added test script and added parenthetical remark about platform dependency.

        Good catch! As originally coded, the error rate across all M/N combinations (< 2**32), seems to come out at ~ 1 in 15 (6.66%).

        #! perl -slw use strict; use List::Util qw[ min ]; use Math::Random::MT qw[ rand ]; $|++; sub check { my( $m, $n ) = @_; my $step = ( $m +1 ) / $n; my $f = int( $n * $step ) -1; return if $f != $m; return 1; } my $trials = 0; my $fails = 0; for ( 1 .. 1e6 ) { my $m = int( rand 2**32 ); for ( 1 .. min( $m, 1000 ) ) { ++$trials; my $n = 1+int( rand $m ); check( $m, $n ) or ++$fails; } printf "\r$_ : %f%%", $fails *100 / $trials; } __END__ C:\test>ranges 76977645 : 6.620262%

        However, a simple fudge floating point rounding correction factor of 0.000001 seems to sort things out nicely:

        #! perl -slw use strict; use List::Util qw[ min ]; use Math::Random::MT qw[ rand ]; $|++; sub check { my( $m, $n ) = @_; my $step = ( $m +1.000001 ) / $n; my $f = int( $n * $step ) -1; # warn( "m:$m n:$n f:$f\n" ) return if $f != $m; return 1; } my $trials = 0; my $fails = 0; for ( 1 .. 1e6 ) { my $m = int( rand 2**32 ); for ( 1 .. min( $m, 1000 ) ) { ++$trials; my $n = 1+int( rand $m ); check( $m, $n ) or ++$fails; } printf "\r$_ : %f%%", $fails *100 / $trials; } __END__ C:\test>ranges 6783635 : 0.000000%

        Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
        "Science is about questioning the status quo. Questioning authority".
        In the absence of evidence, opinion is indistinguishable from prejudice.
        Yeah, they are rounding errors. Rearranging the calculation fixes it (at least, on my platform):
        sub ranges_javafan { my ($M, $N) = @_; return [ map {[int($_/$N*($M+1)) , int((($_+1)/$N)*($M+1))-1] } 0..$N - 1]; }
        The most important is the final ($_+1)/$N*($M+1) - that should return $M+1 for $_ == $N-1. The impact of other rounding errors is that a range is 1 shorter than expected (but making the next one one larger - it shouldn't skip numbers).

        What?

        [ 6:16:11.35] C:\test>ranges -M=14 -N=11 0 : from 0 to 0 ( 1) 1 : from 1 to 1 ( 1) 2 : from 2 to 3 ( 2) 3 : from 4 to 4 ( 1) 4 : from 5 to 5 ( 1) 5 : from 6 to 7 ( 2) 6 : from 8 to 8 ( 1) 7 : from 9 to 9 ( 1) 8 : from 10 to 11 ( 2) 9 : from 12 to 12 ( 1) 10 : from 13 to 13 ( 1) ======================================== 14 [ 6:16:42.76] C:\test>ranges -M=14 -N=13 0 : from 0 to 0 ( 1) 1 : from 1 to 1 ( 1) 2 : from 2 to 2 ( 1) 3 : from 3 to 3 ( 1) 4 : from 4 to 4 ( 1) 5 : from 5 to 5 ( 1) 6 : from 6 to 7 ( 2) 7 : from 8 to 8 ( 1) 8 : from 9 to 9 ( 1) 9 : from 10 to 10 ( 1) 10 : from 11 to 11 ( 1) 11 : from 12 to 12 ( 1) 12 : from 13 to 13 ( 1) ======================================== 14 [ 6:17:01.29] C:\test>ranges -M=29 -N=11 0 : from 0 to 1 ( 2) 1 : from 2 to 4 ( 3) 2 : from 5 to 7 ( 3) 3 : from 8 to 9 ( 2) 4 : from 10 to 12 ( 3) 5 : from 13 to 15 ( 3) 6 : from 16 to 18 ( 3) 7 : from 19 to 20 ( 2) 8 : from 21 to 23 ( 3) 9 : from 24 to 26 ( 3) 10 : from 27 to 28 ( 2) ======================================== 29 [ 6:17:30.04] C:\test>ranges -M=100 -N=49 0 : from 0 to 1 ( 2) 1 : from 2 to 3 ( 2) 2 : from 4 to 5 ( 2) 3 : from 6 to 7 ( 2) 4 : from 8 to 9 ( 2) 5 : from 10 to 11 ( 2) 6 : from 12 to 13 ( 2) 7 : from 14 to 15 ( 2) 8 : from 16 to 17 ( 2) 9 : from 18 to 19 ( 2) 10 : from 20 to 21 ( 2) 11 : from 22 to 23 ( 2) 12 : from 24 to 25 ( 2) 13 : from 26 to 27 ( 2) 14 : from 28 to 29 ( 2) 15 : from 30 to 31 ( 2) 16 : from 32 to 34 ( 3) 17 : from 35 to 36 ( 2) 18 : from 37 to 38 ( 2) 19 : from 39 to 40 ( 2) 20 : from 41 to 42 ( 2) 21 : from 43 to 44 ( 2) 22 : from 45 to 46 ( 2) 23 : from 47 to 48 ( 2) 24 : from 49 to 50 ( 2) 25 : from 51 to 52 ( 2) 26 : from 53 to 54 ( 2) 27 : from 55 to 56 ( 2) 28 : from 57 to 58 ( 2) 29 : from 59 to 60 ( 2) 30 : from 61 to 62 ( 2) 31 : from 63 to 64 ( 2) 32 : from 65 to 67 ( 3) 33 : from 68 to 69 ( 2) 34 : from 70 to 71 ( 2) 35 : from 72 to 73 ( 2) 36 : from 74 to 75 ( 2) 37 : from 76 to 77 ( 2) 38 : from 78 to 79 ( 2) 39 : from 80 to 81 ( 2) 40 : from 82 to 83 ( 2) 41 : from 84 to 85 ( 2) 42 : from 86 to 87 ( 2) 43 : from 88 to 89 ( 2) 44 : from 90 to 91 ( 2) 45 : from 92 to 93 ( 2) 46 : from 94 to 95 ( 2) 47 : from 96 to 97 ( 2) 48 : from 98 to 99 ( 2) ======================================== 100

        Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
        "Science is about questioning the status quo. Questioning authority".
        In the absence of evidence, opinion is indistinguishable from prejudice.
Re: Split range 0 to M into N non-overlapping (roughly equal) ranges.
by mr_mischief (Monsignor) on Mar 12, 2011 at 20:09 UTC

    I'm not sure if you care where the larger portions end up. My way defers taking the integer portion to avoid the errors. It equally spaces the larger and smaller bins along the ranges array. That makes sense compared to sticking them at the end or the beginning if your intent is to spread out utilization of resources evenly anyway.

    Here's some output:

    [chris@pineapple ranges-892828]$ perl ranges -M=9999997 -N=1 0 : from 0 to 9999997 (9999998) [chris@pineapple ranges-892828]$ perl ranges -M=9999997 -N=2 0 : from 0 to 4999998 (4999999) 1 : from 4999999 to 9999997 (4999999) [chris@pineapple ranges-892828]$ perl ranges -M=9999997 -N=3 0 : from 0 to 3333331 (3333332) 1 : from 3333332 to 6666664 (3333333) 2 : from 6666665 to 9999997 (3333333) [chris@pineapple ranges-892828]$ perl ranges -M=9999997 -N=4 0 : from 0 to 2499998 (2499999) 1 : from 2499999 to 4999998 (2500000) 2 : from 4999999 to 7499997 (2499999) 3 : from 7499998 to 9999997 (2500000) [chris@pineapple ranges-892828]$ perl ranges -M=9999997 -N=5 0 : from 0 to 1999998 (1999999) 1 : from 1999999 to 3999998 (2000000) 2 : from 3999999 to 5999997 (1999999) 3 : from 5999998 to 7999997 (2000000) 4 : from 7999998 to 9999997 (2000000) [chris@pineapple ranges-892828]$ perl ranges -M=9999997 -N=6 0 : from 0 to 1666665 (1666666) 1 : from 1666666 to 3333331 (1666666) 2 : from 3333332 to 4999998 (1666667) 3 : from 4999999 to 6666664 (1666666) 4 : from 6666665 to 8333330 (1666666) 5 : from 8333331 to 9999997 (1666667) [chris@pineapple ranges-892828]$ perl ranges -M=9999997 -N=7 0 : from 0 to 1428570 (1428571) 1 : from 1428571 to 2857141 (1428571) 2 : from 2857142 to 4285712 (1428571) 3 : from 4285713 to 5714283 (1428571) 4 : from 5714284 to 7142854 (1428571) 5 : from 7142855 to 8571425 (1428571) 6 : from 8571426 to 9999997 (1428572) [chris@pineapple ranges-892828]$ perl ranges -M=9999997 -N=8 0 : from 0 to 1249998 (1249999) 1 : from 1249999 to 2499998 (1250000) 2 : from 2499999 to 3749998 (1250000) 3 : from 3749999 to 4999998 (1250000) 4 : from 4999999 to 6249997 (1249999) 5 : from 6249998 to 7499997 (1250000) 6 : from 7499998 to 8749997 (1250000) 7 : from 8749998 to 9999997 (1250000) [chris@pineapple ranges-892828]$ perl ranges -M=9999997 -N=9 0 : from 0 to 1111109 (1111110) 1 : from 1111110 to 2222220 (1111111) 2 : from 2222221 to 3333331 (1111111) 3 : from 3333332 to 4444442 (1111111) 4 : from 4444443 to 5555553 (1111111) 5 : from 5555554 to 6666664 (1111111) 6 : from 6666665 to 7777775 (1111111) 7 : from 7777776 to 8888886 (1111111) 8 : from 8888887 to 9999997 (1111111)

    Here's some output from smaller inputs for sanity checking:

    [chris@pineapple ranges-892828]$ perl ranges -M=9 -N=1 0 : from 0 to 9 ( 10) [chris@pineapple ranges-892828]$ perl ranges -M=9 -N=2 0 : from 0 to 4 ( 5) 1 : from 5 to 9 ( 5) [chris@pineapple ranges-892828]$ perl ranges -M=9 -N=3 0 : from 0 to 2 ( 3) 1 : from 3 to 5 ( 3) 2 : from 6 to 9 ( 4) [chris@pineapple ranges-892828]$ perl ranges -M=9 -N=4 0 : from 0 to 1 ( 2) 1 : from 2 to 4 ( 3) 2 : from 5 to 6 ( 2) 3 : from 7 to 9 ( 3) [chris@pineapple ranges-892828]$ perl ranges -M=9 -N=5 0 : from 0 to 1 ( 2) 1 : from 2 to 3 ( 2) 2 : from 4 to 5 ( 2) 3 : from 6 to 7 ( 2) 4 : from 8 to 9 ( 2) [chris@pineapple ranges-892828]$ perl ranges -M=9 -N=6 0 : from 0 to 0 ( 1) 1 : from 1 to 2 ( 2) 2 : from 3 to 4 ( 2) 3 : from 5 to 5 ( 1) 4 : from 6 to 7 ( 2) 5 : from 8 to 9 ( 2) [chris@pineapple ranges-892828]$ perl ranges -M=4 -N=0 N < 1 makes no sense. [chris@pineapple ranges-892828]$ perl ranges -M=4 -N=1 0 : from 0 to 4 ( 5) [chris@pineapple ranges-892828]$ perl ranges -M=4 -N=2 0 : from 0 to 1 ( 2) 1 : from 2 to 4 ( 3) [chris@pineapple ranges-892828]$ perl ranges -M=4 -N=3 0 : from 0 to 0 ( 1) 1 : from 1 to 2 ( 2) 2 : from 3 to 4 ( 2) [chris@pineapple ranges-892828]$ perl ranges -M=4 -N=4 0 : from 0 to 0 ( 1) 1 : from 1 to 1 ( 1) 2 : from 2 to 2 ( 1) 3 : from 3 to 4 ( 2) [chris@pineapple ranges-892828]$ perl ranges -M=4 -N=5 M must be equal to N or larger than N.

    I'm not sure you'd call the code clean or elegant, but it does seem to work.

    #! perl -slw use strict; our $N //= 2; our $M //= 1e6; die "N < 1 makes no sense.\n\n" if $N < 1; die "M must be equal to N or larger than N.\n\n" if $M < $N; my $size = ( $M + 1 ) / $N; my @ranges; for ( my $i = 0; $i < $M; $i += $size ) { my ( $j, $k ) = ( int $i, int ($i + $size) ); push @ranges, [ $j, $k - 1, $k - $j ]; } ( printf "%2d : from %7d to %7d (%7d)\n", $_, @{ $ranges[ $_ ] } ) for 0 .. $#ranges; __END__

    I chose to use a for loop and lexical variables with small scopes rather than maps. I realize this may translate a little less cleanly to some languages that parallelize maps automatically and eschew iteration. Oh well. It's a small enough task anyway. Do the parallelization when using the ranges to determine targets later. Besides, if making full use of implementation details not present in the language at hand is a necessary consideration, it should have been in the spec. ;-)

Re: Split range 0 to M into N non-overlapping ranges.
by duelafn (Vicar) on Mar 12, 2011 at 16:01 UTC

    <smart-butt response>

    perl -slwE 'say $_ for 0..$N-2; printf "from %d to %d\n", $N-1, $M' -- -N=5 -M=9999997

    Sorry, couldn't help myself.

    Good Day,
        Dean

Re: Split range 0 to M into N non-overlapping (roughly equal) ranges.
by salva (Abbot) on Mar 12, 2011 at 18:17 UTC
    [15:43:29.03] C:\test>ranges -N=2 -M=9999997 0 : from 0 to 4999997 (4999998) 1 : from 4999998 to 9999997 (5000000)
    Is this a bug? I guess it should be...
    0 : from 0 to 4999998 (4999999) 1 : from 4999999 to 9999997 (4999999)
Re: Split range 0 to M into N non-overlapping (roughly equal) ranges.
by wind (Priest) on Mar 13, 2011 at 17:04 UTC

    Hi BrowserUk,

    I agree that there really isn't a clean way to do this. I'm not really sure what your primary motivation is though, whether it's efficiency or readability.

    One way to reform the problem though is to say that you don't want the range [0,M] inclusive, but actually [0,M+1) without including the end point. This way you can avoid having to treat the last or first segment as special case.

    Here is how I would redo your code by creating an intermediate data structure to contain the points

    #! perl -slw use strict; our $N //= 2; our $M //= 1e6; $M++; # No segment contains the upper end point. my @points = map {int $M * $_ / $N} (0..$N); my @ranges = map {[$points[$_], $points[$_+1]-1]} (0..$#points-1); printf "%2d : from %7d to %7d (%7d)\n", $_, @{ $ranges[ $_ ] }, $ranges[ $_ ][ 1 ] - $ranges[ $_ ][ 0 ] + 1 for 0 .. $#ranges; __END__

    - Miller

      I'm not really sure what your primary motivation is though, whether it's efficiency or readability.

      As I said elsewhere, efficiency isn't too much of a consideration for the intended purpose, though given the size of the numbers typically involved it's obviously best to avoid full-iteration schemes:

      @r= (); $M = 1e6; $N = 4; ++$r[ $M % $N ] while $M--; print "@r";; 250000 250000 250000 250000

      I've used the code I posted in the OP several times, and it works, in as much as it is good enough for the purpose of distributing M items of work across N threads relatively evenly. Within the constraints of the problem whereby it isn't worth doing for less than M > several 10s or 100s of thousands; and N will be in the 10s or low 100s for the foreseeable future. Small variability within the size of the ranges is insignificant.

      But each time I have the need to do this anew, I encounter a pregnant pause in my coding. The solution will never 'flow from my fingers' And when I go off and look up how I did it last time--ie.look at the code in the OP--it takes me several minutes of staring at the code I know worked last time, and even a few quick trial runs, before I'm convinced that I understand how it works.

      I am what I would call an 'instinctual' programmer; and each time, my instincts tell me the OP code is not right.

      So, rather than the only alternatively (to efficiency) being "readability", I'd say my goal was 'understandability'. Not so much of the individual terms and constructs of the code--anything I've forgotten there is easy to look up, try, and and refresh my mind. It is the understandability of the algorithm that I favour.

      All too often I see people promoting the idea of spreading code over 5x or 10x as many lines of code as I typically use for the same purpose, in the name of readability. Whilst this may make the individual steps of the implementation instantly recognisable to even the most casual Perler; I feel (strongly) that it has the detrimental affect of concealing the algorithm in the detail of the implementation. Your typical 'can't see the wood for the trees' scenario.

      Just as an important idea can rarely be said better with 100 words when the right 10 will do; so an algorithm is rarely expressed better by using more terms than are required. The art is in finding the right 10 words. I'm not very good at that in English, but have my moments in code.

      With regard to your implementation. The M+1 idea is quite interesting and your implementation more so. But, I would argue that you are still making a special case of the last element, in as much as running the second loop to $#points - 1 is effectively doing a pop on the results of the first loop. And that makes it necessary to code it as two loops rather than one; and requires the intermediate results array and prevents the chaining of the maps.

      I'm now very convinced that javafan/mr_mischief' insight to defer the int of the step value is the right approach, even with the requirement for a FP fudge factor--something I've got used to using in the past for other algorithms. My instinct is that the next time I go to look at the implementation weeks or months from now, I'll understand what is going on there straight away. Indeed, I think it'll probably 'flow from my fingers' next time, even if that is months from now.


      Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
      "Science is about questioning the status quo. Questioning authority".
      In the absence of evidence, opinion is indistinguishable from prejudice.
Re: Split range 0 to M into N non-overlapping (roughly equal) ranges.
by salva (Abbot) on Mar 14, 2011 at 11:12 UTC
    IMO, this is conceptually simpler and generates nice ranges:
    #! perl -slw use strict; our $N //= 2; our $M //= 1e6; my $size = $M + 1; my $step0 = int ($size / $N); my $big_ranges = $size % $N; my $start = 0; my @ranges; for (1..$N) { my $step = $step0 + ($_ <= $big_ranges ? 1 : 0); push @ranges, [$start, $start + $step - 1]; $start += $step; } printf "%2d : from %7d to %7d (%7d)\n", $_, @{ $ranges[ $_ ] }, $ranges[ $_ ][ 1 ] - $ranges[ $_ ][ 0 ] + 1 for 0 .. $#ranges;
Re: Split range 0 to M into N non-overlapping (roughly equal) ranges.
by jdrago999 (Pilgrim) on Mar 15, 2011 at 04:34 UTC
    My favorite way to do this:
    #!/usr/bin/perl -slw use strict; use warnings 'all'; use POSIX 'ceil'; our $N //= 3; our $M //= 1e6; my $each = ceil($M / $N); my $last = 0; my @ranges = ( ); for( 1..$N ) { last unless $last <= $M; my $part = [ $last => $_ == $N ? $M : $last + ( $_ == $N ? $each : $ +each - 1 ) ]; $part->[1] = $M if $_ == $N; $last += $each; push @ranges, $part; }# end for() printf "%2d : from %7d to %7d (%7d)\n", $_, @{ $ranges[ $_ ] }, $ranges[ $_ ][ 1 ] - $ranges[ $_ ][ 0 ] + 1 for 0 .. $#ranges;

    Produces the following:

    john@e6510:~/Desktop$ ./ranges -N=1 -M=9999997 0 : from 0 to 9999997 (9999998) john@e6510:~/Desktop$ ./ranges -N=2 -M=9999997 0 : from 0 to 4999998 (4999999) 1 : from 4999999 to 9999997 (4999999) john@e6510:~/Desktop$ ./ranges -N=3 -M=9999997 0 : from 0 to 3333332 (3333333) 1 : from 3333333 to 6666665 (3333333) 2 : from 6666666 to 9999997 (3333332) john@e6510:~/Desktop$ ./ranges -N=4 -M=9999997 0 : from 0 to 2499999 (2500000) 1 : from 2500000 to 4999999 (2500000) 2 : from 5000000 to 7499999 (2500000) 3 : from 7500000 to 9999997 (2499998) john@e6510:~/Desktop$ ./ranges -N=5 -M=9999997 0 : from 0 to 1999999 (2000000) 1 : from 2000000 to 3999999 (2000000) 2 : from 4000000 to 5999999 (2000000) 3 : from 6000000 to 7999999 (2000000) 4 : from 8000000 to 9999997 (1999998)

    Update: Now we properly calculate the number of ranges to create.

    Update 2: This has a bug that I'm too tired and busy to figure out right now.

      Bug?
      $ ranges -N=7 -M=9 0 : from 0 to 1 ( 2) 1 : from 2 to 3 ( 2) 2 : from 4 to 5 ( 2) 3 : from 6 to 7 ( 2) 4 : from 8 to 9 ( 2) 5 : from 10 to 11 ( 2) 6 : from 12 to 9 ( -2)

        You found it - and it's fixed now.

        Thanks for trying that out :-)

Log In?
Username:
Password:

What's my password?
Create A New User
Node Status?
node history
Node Type: perlquestion [id://892828]
Approved by Corion
Front-paged by Corion
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others rifling through the Monastery: (3)
As of 2020-10-20 02:32 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?
    My favourite web site is:












    Results (208 votes). Check out past polls.

    Notices?