Beefy Boxes and Bandwidth Generously Provided by pair Networks
The stupid question is the question not asked
 
PerlMonks  

Re: sorting array of arrays reference

by Laurent_R (Canon)
on Oct 26, 2013 at 21:03 UTC ( [id://1059844]=note: print w/replies, xml ) Need Help??


in reply to sorting array of arrays reference

OK, now I have benchmarked the sort and the walk through only once solution (two variants):

use strict; use warnings; use Benchmark qw(timethese); my $nb_elements = 2e6; my @top3; my @startArray; push @startArray, ["", "", int rand (1e7)] for 1..$nb_elements; sub discard_sort{ my @masterArray = @startArray; my @top3 = sort {$b->[2] <=> $a->[2]} @masterArray[0..2]; my $min_top = $top3[2][2]; $nb_elements--; for my $sub_aref (@masterArray [3..$nb_elements]) { next if $sub_aref->[2] <= $min_top; @top3 = (sort {$b->[2] <=> $a->[2]} $sub_aref, @top3)[0..2]; $min_top = $top3[2][2]; } } sub discard_manual{ my @masterArray = @startArray; @top3 = sort {$b->[2] <=> $a->[2]} @masterArray[0..2]; my $min_top = $top3[2][2]; $nb_elements--; for my $sub_aref (@masterArray [3..$nb_elements]) { next if $sub_aref->[2] <= $min_top; if ($sub_aref->[2] > $top3[0][2]) { unshift @top3, $sub_aref; pop @top3; } elsif ( $sub_aref->[2] > $top3[1][2]) { $top3[2] = $top3[1]; $top3[1] = $sub_aref; } else { pop @top3; push @top3, $sub_aref; } $min_top = $top3[2][2]; } } sub sort_strategy { my @masterArray = @startArray; my @top3 = (sort {$b->[2] <=> $a->[2]} @masterArray)[0..2]; } timethese (5, { 'walk once with sort' => \&discard_sort, 'walk once manual insert' => \&discard_manual, 'sort' => \&sort_strategy});

The results show pretty clearly that the walk-through once solution which I originally proposed is much much faster than a sort on the full array:

Benchmark: timing 5 iterations of sort, walk once manual insert, walk +once with sort... sort: 60 wallclock secs (60.37 usr + 0.02 sys = 60.39 CPU) @ 0.08/s +(n=5) walk once manual insert: 3 wallclock secs ( 2.79 usr + 0.00 sys = 2 +.79 CPU) @ 1.79/s (n=5) walk once with sort: 3 wallclock secs ( 2.76 usr + 0.00 sys = 2.76 +CPU) @ 1.81/s (n=5)

On a dataset of 2 million records, the walk through once solution is about 21 times faster than the solution sorting the full array. For the walk through once solution, I tried two variants: using sort to maintain the auxiliary array and doing a manual insertion sort on this auxiliary array. It turns out there is very little difference between these two solutions, but the simpler sort function solution is actually slightly better, so it was not worth the effort of trying a manual insertion sort.

The larger the dataset, the larger the advantage of the walk through once strategy: with 5 million records, the walk-through once solution is 24.6 faster than the full array sort:

Benchmark: timing 5 iterations of sort, walk once manual insert, walk +once with sort... sort: 170 wallclock secs (169.96 usr + 0.03 sys = 169.99 CPU) @ 0.03 +/s (n=5) walk once manual insert: 7 wallclock secs ( 6.90 usr + 0.03 sys = 6 +.93 CPU) @ 0.72/s (n=5) walk once with sort: 7 wallclock secs ( 6.86 usr + 0.00 sys = 6.86 +CPU) @ 0.73/s (n=5)

Note that, in order to produce realistic benchmark figures for the sort solution, I had to make a copy of the original array, so that the sort would not work on an already sorted array. Part of the recorded durations is therefore wasted making a copy of the array. If we were to subtract this part from the recorded duration, the advantage of the walk through once solution would be even larger. The copying of the 5-million record array takes about 2 seconds, so that the corrected figures would be 168 sec. for the sort, and 5 sec. for the walk through once solutions, meaning that the later is actually about 33 times faster than the sort.

Replies are listed 'Best First'.
Re^2: sorting array of arrays reference
by kimlid2810 (Acolyte) on Oct 27, 2013 at 14:33 UTC
    hats off Laurent. I really thank you all for your replies and time. laurent, i ll suggest my supervisor to send you a check or bitcoins :) <3

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: note [id://1059844]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others exploiting the Monastery: (5)
As of 2024-04-24 08:40 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found