OK, I have a bit more time now. Here is one possible way of doing it:
use strict;
use warnings;
use Data::Dumper;
my @masterArray = (
    ["this", "that", 12563, "something", "else"],
    ["this", "that", 10, "something", "else"],
    ["this", "that", 1, "something", "else"],
    ["this", "that", 125638, "something", "else"],
    ["this", "that", 300000, "something", "else"],
);
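# Start with the first three sub-arrays, sorted in descending order on column 2
my @top3 = sort {$b->[2] <=> $a->[2]} @masterArray[0..2];
my $min_top = $top3[2][2];    # smallest value currently in the top 3
for my $sub_aref (@masterArray[3..$#masterArray]) {
    # Discard at once anything that cannot make it into the top 3
    next if $sub_aref->[2] <= $min_top;
    @top3 = (sort {$b->[2] <=> $a->[2]} @top3, $sub_aref)[0..2];
    $min_top = $top3[2][2];
}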
print Dumper @top3;
This yields the following result:
$ perl subdiscard.pl
$VAR1 = [
'this',
'that',
300000,
'something',
'else'
];
$VAR2 = [
'this',
'that',
125638,
'something',
'else'
];
$VAR3 = [
'this',
'that',
12563,
'something',
'else'
];
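If you need this in more than one place, the same loop can be wrapped in a subroutine. This is only a sketch: the name top_n and its parameters (number of elements to keep, column index, reference to the array) are my own choices for illustration, not part of the original problem.

```perl
use strict;
use warnings;

# Hypothetical helper: return the $n sub-arrays with the largest values
# in column $col, in descending order. $rows is a reference to the array.
sub top_n {
    my ($n, $col, $rows) = @_;
    return if $n < 1;

    # Fewer rows than requested: nothing to discard, just sort them all.
    if (@$rows <= $n) {
        my @all = sort { $b->[$col] <=> $a->[$col] } @$rows;
        return @all;
    }

    my $last = $n - 1;
    my @top = sort { $b->[$col] <=> $a->[$col] } @{$rows}[0 .. $last];
    my $min_top = $top[$last][$col];

    for my $row (@{$rows}[$n .. $#{$rows}]) {
        # Skip anything that cannot enter the current top $n
        next if $row->[$col] <= $min_top;
        @top = (sort { $b->[$col] <=> $a->[$col] } @top, $row)[0 .. $last];
        $min_top = $top[$last][$col];
    }
    return @top;
}

my @data = (
    ["this", "that", 12563,  "something", "else"],
    ["this", "that", 10,     "something", "else"],
    ["this", "that", 1,      "something", "else"],
    ["this", "that", 125638, "something", "else"],
    ["this", "that", 300000, "something", "else"],
);

my @top2 = top_n(2, 2, \@data);
print join(", ", map { $_->[2] } @top2), "\n";    # prints: 300000, 125638
```

Re-sorting the small top list on each hit is cheap as long as $n stays small; for a large $n a heap would probably be a better fit.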
A more general version, which takes the number of records from the command line and fills the array with random values, might look like this:
use strict;
use warnings;
use Data::Dumper;
# Number of records to generate, passed as the first command-line argument
my $nb_elements = shift or die "Usage: $0 number_of_records\n";
my @masterArray;
push @masterArray, ["", "", int rand (1e7), ""] for 1..$nb_elements;
# print Dumper \@masterArray;
my @top3 = sort {$b->[2] <=> $a->[2]} @masterArray[0..2];
my $min_top = $top3[2][2];
$nb_elements--;    # now the index of the last element
for my $sub_aref (@masterArray[3..$nb_elements]) {
    next if $sub_aref->[2] <= $min_top;
    @top3 = (sort {$b->[2] <=> $a->[2]} @top3, $sub_aref)[0..2];
    $min_top = $top3[2][2];
}
print Dumper \@top3;
With one million records, the execution time is about 2.5 seconds:
$ time perl subdiscard2.pl 1000000
$VAR1 = [
[
'',
'',
9999996,
''
],
[
'',
'',
9999993,
''
],
[
'',
'',
9999990,
''
]
];
real 0m2.497s
user 0m2.386s
sys 0m0.108s
Sorting the original array and taking the first 3 elements takes about 3 times longer:
$ time perl subdiscard3.pl 1000000
$VAR1 = [
[
'',
'',
9999980,
''
],
[
'',
'',
9999955,
''
],
[
'',
'',
9999944,
''
]
];
real 0m7.605s
user 0m7.518s
sys 0m0.093s
But in fact most of the 2.5 seconds taken by subdiscard2.pl (more than 2.2 seconds) goes into populating the array with random values. That leaves less than 0.3 seconds for the selection itself, against roughly 5 seconds for the full sort, so the real difference between the two approaches is much larger than the timings above suggest, probably at least a factor of 10. I'll do a real benchmark later if I can find the time.
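In the meantime, here is a sketch of how such a benchmark might be set up with the core Benchmark module, so that the array is populated only once and outside the timed code. The subroutine names top3_scan and top3_sort and the default of 100_000 records are my own assumptions for illustration; I am not quoting any timings for it, since they depend on the machine.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Assumption: default to 100_000 records so that the run stays short;
# pass another count as the first command-line argument.
my $nb_elements = @ARGV ? shift @ARGV : 100_000;
die "Need at least 3 records\n" if $nb_elements < 3;

# Populate the array once, outside the timed code
my @masterArray = map { ["", "", int rand(1e7), ""] } 1 .. $nb_elements;

# Single pass, keeping only the current top 3 (the approach shown above)
sub top3_scan {
    my @top3 = sort { $b->[2] <=> $a->[2] } @masterArray[0 .. 2];
    my $min_top = $top3[2][2];
    for my $sub_aref (@masterArray[3 .. $#masterArray]) {
        next if $sub_aref->[2] <= $min_top;
        @top3 = (sort { $b->[2] <=> $a->[2] } @top3, $sub_aref)[0 .. 2];
        $min_top = $top3[2][2];
    }
    return @top3;
}

# Full sort of the whole array, then take the first 3 elements
sub top3_sort {
    my @sorted = sort { $b->[2] <=> $a->[2] } @masterArray;
    return @sorted[0 .. 2];
}

# Sanity check: both approaches must find the same three values
my @scan = map { $_->[2] } top3_scan();
my @full = map { $_->[2] } top3_sort();
die "Results differ: @scan vs @full\n" unless "@scan" eq "@full";

# Run each candidate for at least 1 CPU second and print a comparison table
cmpthese(-1, {
    scan_top3 => sub { my @r = top3_scan() },
    full_sort => sub { my @r = top3_sort() },
});
```

A negative count tells cmpthese to run each candidate for at least that many CPU seconds rather than a fixed number of iterations, which is convenient when the two candidates differ a lot in speed.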