roibrodo has asked for the wisdom of the Perl Monks concerning the following question:

Hi again,

I would like to seek your help in optimizing a double loop which is in the heart of my program. I really need it to be as efficient as possible to cut on running times, since it is called many times.

The idea is this. I have a large array @arr of size up to 10^6. I then get "ranges" in this array e.g. {FROM=>25000 TO=>40000} and I need to ++ all the cells in the range. What I do now is something like:

my @arr= (0) x $length; while ( ... get range to process ...) { for (my $i=$range->{FROM}; $i<$range->{TO}; ++i) { $arr[$i]++; } }

Is there anything I can do to optimize this process? Assume "get range to process" is already optimized.


Replies are listed 'Best First'.
Re: Optimizing a double loop
by moritz (Cardinal) on Jun 03, 2010 at 12:17 UTC
    The best way to speed up this loop is not to execute it.

    Yes, I'm serious. Depending on your actual problem, there might be a way around it.

    If you read abut inversion lists, you find that this idea can be generalized to overlapping lists.

    Assume you have the ranges 10-50 and 20-100; You mark each start of a range with a positive number, and each end of range by a negative number. So 1-50 and 20-100 would translate to

    [1, 20, -50, -100]

    That way you only store two integers for a large range.

    If you want to know how many ranges overlap at a given point, say at 30, you just traverse the array from left to right up to abs($_) <= 30, and count how many positive and how many negative numbers you have encountered; the difference is the number of overlapping ranges.

    You can also cache such numbers as long as the array doesn't change, to avoid having it to iterate it from the beginning every time.

      Interesting idea. The thing is once I finished processing all the ranges, I must output the full array of counts (this is passed to another program which I don't have control on). This means I will have to translate this representation to an explicit full array with count in each cell.
        You can still generate the full array of counts afterwards and have a speed benefit, because you only have to iterate once over the big array (as opposed to once per overlap in your current implementation).


        Example code for obtaining the full array:

        use strict; use warnings; my @a = (10, 15, -20, -30); my $highest = abs($a[-1]); my @full = (0) x ($highest + 1); # decrement all negative values by 1, otherwise # we would exclude the end points: for (@a) { $_-- if $_ < 0; } my $count = 0; for (0..$highest) { if (abs $a[0] == $_) { my $c = shift @a; if ($c > 0) { $count++; } else { $count--; } } $full[$_] += $count; } print join('', @full), "\n";

        This assumes that there are never two values of same magnitude in the array, ie never exactly matching start- and endpoints. You can generalize it on your own :-)

        Second update: I've just realized that the extension is very simple:

        Instead of if (abs $a[0] == $_) write while (abs $a[0] == $_), and you're done.

        That does revise the spec, doesn't it?
Re: Optimizing a double loop
by bart (Canon) on Jun 03, 2010 at 14:33 UTC
    moritz's post, or rather his discussion of it on the ChatterBox, made me think, and it inspired me to come up with a related yet different solution.

    My solution is based on the idea of a step function, often spelled "" in math, and adding them together (function of x, edge at t):

        (x, t) = 0 if x < t
                = 1 if x > t
                = ?? if x == t (often 1/2 in continuous math functions)
    As this is about discrete values, I'd rather make that
        (x, t) = 0 if x < t
                = 1 if x > t
                = 1 if x == t (for discrete input)
    That way, the added value for a range, say [20, 50], can be
        (x, 20) - (x, 51)
    So, just as with moritz's solution, you can store the value for a range as 2 tuples for each edge:
    %step = ( 20 => 1, 50+1 => -1 );
    You can imagine this to represent a number of Dirac pulses; the combination of step functions is the integral of this set of pulses, or, because we're in the discrete case, the sum of amplitudes of every pulse on its left.

    For each range you add, in the same hash, you can then increment the value for the lower edge, and decrement for the upper edge + 1.

    my %step; foreach(@range) { $step{ $_->[0] }++; $step{ $_->[1]+1 }--; }
    Now, the next step is to convert this data structure to the final outcome. If you need a complete data structure for the whole spectrum, you'll have to loop through the full range once, on each edge, adding the value for $step{$edge} to the previous accumulated value and assigning that to the range:
    my @edge = sort { $a <=> $b } keys %step; my $step = 0; my $x = shift @edge; for(my $i = $x; defined $x; $i++) { if($i == $x) { $step += $step{$x}; $x = shift @edge; } $arr[$i] += $step; }
    Tested with
    @arr = (0) x 120; @range = ([20, 50], [30, 100]);
    The end result of print "@arr\n"; is:
    0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 +2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 + 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 +0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Re: Optimizing a double loop
by kyle (Abbot) on Jun 03, 2010 at 14:37 UTC

    This might be faster than your inner loop:

    splice @arr, $from, $to-$from+1, map 1+$_, @arr[$from..$to];

    ...but I haven't tested it. If we had a full runnable example, we could do some Benchmarking.

Re: Optimizing a double loop
by roibrodo (Sexton) on Jun 03, 2010 at 16:18 UTC
    A followup:
    Now I have about 1000 long arrays dumped in store files. All arrays are of the same length (~10^6), each cell contains some small positive int (0-500). No undef values.
    I would like to get the average and standard deviation for each array position. For this, I use two "aggregating" arrays @sum and @sum_squares initialed with zeros. I then retrieve from the disk one array at a time and add all of its values (loop...) to the corresponding places in the "aggregating" arrays. Finally, I do the calculations for each position
    Any suggestion on how to improve that? Perhaps there is a faster way to "sum up" arrays?

      The run time for any solution will be dictated by disk I/O time. At least some options are:

      • get faster disks - a raid array may help
      • store less data - use binary files rather than text files
      • reorganise the data to make retrieval more efficient - one file with 1000 numbers per line maybe? (I doubt this will help much in this case though.)

      Of course those options aren't exclusive so you may be able to apply some or all of them. The way the files are generated may exclude some options too.

      True laziness is hard work

      GrandFather's right as far as IO being the dominant cost here.

      So why not avoid the IO completely.

      Rather than writing the numbers to disk and then reading them back to accumulate the sums and sum of squares, do the accumulation in the original process?

      The cost of accumulating these in the original process will be far less than writing the values to disk.

        Let me describe the complete scenario and try to make things more clear.

        I start by generating 1000 sets of ranges. Generating the sets is done using some other data that needs parsing etc., but it is done only once. From now on, the sets are fixed and I do not change them.

        I chose to store each set of ranges as a hash with key= range start point, value = array of lengths of all ranges that start at that start point. For example, the set of ranges ((100,200),(50,70),(50,100)) will be stored as: {100=>[100], 50=>[20,50]}

        Each of these sets is stored to disk. I can then generate a "counts array" by retrieving the required set, converting it into an inversion list (as moritz so kindly suggested) and iterating the whole range once. This works great. I can also do the same for each of the 1000 sets and aggregate the results for each position, then derive the mean and standard deviation for each position in the whole range.

        Now, there is another kind of queries I need to answer. Given some condition on ranges, e.g. "start point < x && end point > y", I would like to generate the "counts array" after removing all ranges that answer the condition. So I load the original sets, remove all ranges that apply, and continue through the pipeline. Obviously, I do not store the filtered sets (the filtering was only done for the purpose of the query).

        One last type of query is a bit different - instead of generating a "counts array", I would like to tell how many ranges answer some condition (e.g. as the e.g. before). This is also why I think I have to keep the original set and not just the inversion list, which I, by the way, do not store but generate on the fly when "counts array" queries are run.

        To conclude, I think I do need to save the original sets on disk, since generating the sets from scratch takes some time and requires reading loading other data as I mentioned in the beginning. I think it makes sense to generate the sets once, then load + filter/answer queries.

        But perhaps I can store the sets more efficiently? Maybe storing them all in one file is better?

        I should note that some sets contain up to 80000 keys (start points) and take 1.5MB when stored (using nstore), so putting together 1000 sets like this might not be such a good idea.

        As always, your ideas are welcomed. I will be more than happy to supply more clarifications.

        Thank you all.