in reply to fastest way to compare numerical strings?
Provided that not too many numbers cluster around a few points, an alternative is to first form clusters of numbers that are at most 2 * $delta apart ($delta = 0.01 in your case) and then process only these candidate clusters to find numbers at most $delta apart.
The following code makes a single pass through the input array at the beginning and then sorts only small arrays ("small" provided the assumption made at the beginning holds).
#maximum distance we are looking for
$delta = 0.01;
#test array
@a = (1.02, 1.03, 6.01, 9, 1.04, 1.011, 1.025, 1.01, 0.005, 0.002);
#"discretize" points to neighboring points, scale by 1/$delta
#to simplify computation
for (@a) {
push @{$h{int($_/$delta)}}, $_;
push @{$h{int($_/$delta-1)}}, $_;
push @{$h{int($_/$delta+1)}}, $_;
}
#handle clusters
for (keys %h) {
#in case the corresponding array has more than one element,
#we know that it contains at least one pair not
#further apart than 2 * $delta, otherwise ignore it
if (@{$h{$_}} > 1) {
#sort numerically; a plain sort would compare the values as strings
@sorted = sort { $a <=> $b } @{$h{$_}};
for (0..@sorted-2) {
$r = $sorted[$_];
$s = $sorted[$_+1];
#filter out neighboring pairs, since we do not need to
#process the numbers further, we cram them into a
#string for the final output
$near{"$r, $s"} = 1 if ($r < $s && $s <= $r + $delta);
}
}
}
print "Not further than $delta apart are the following pairs:\n";
print "$_\n" for (keys %near);
Output:
Not further than 0.01 apart are the following pairs:
0.002, 0.005
1.02, 1.025
1.025, 1.03
1.011, 1.02
1.03, 1.04
1.01, 1.011
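For reuse (for example inside a benchmark), the same idea can be wrapped in a subroutine that runs under strict and warnings. This is only a minimal sketch: the name near_pairs and its argument order are my own assumptions and not part of the code above, and it computes the three bucket keys as $k - 1, $k, $k + 1, which is a small variation on the int(...) expressions used above.

```perl
use strict;
use warnings;

# Hypothetical helper (name and interface are assumptions for illustration):
# returns "r, s" strings for neighbouring values at most $delta apart.
sub near_pairs {
    my ($delta, @numbers) = @_;

    # Put each number into its own bucket and the two adjacent buckets,
    # so that two values at most $delta apart share at least one bucket.
    my %bucket;
    for my $n (@numbers) {
        my $k = int($n / $delta);
        push @{ $bucket{$_} }, $n for $k - 1, $k, $k + 1;
    }

    my %near;
    for my $group (values %bucket) {
        next if @$group < 2;    # a single value cannot form a pair
        my @sorted = sort { $a <=> $b } @$group;    # numeric sort
        for my $i (0 .. $#sorted - 1) {
            my ($r, $s) = @sorted[$i, $i + 1];
            # as above, only neighbours in the sorted cluster are compared
            $near{"$r, $s"} = 1 if $r < $s && $s <= $r + $delta;
        }
    }
    return sort keys %near;
}

my @pairs = near_pairs(0.01, 1.02, 1.03, 6.01, 9, 1.04,
                       1.011, 1.025, 1.01, 0.005, 0.002);
print "Not further than 0.01 apart are the following pairs:\n";
print "$_\n" for @pairs;
```

The pairs are collected as hash keys, as in the code above, so a pair found in several overlapping buckets is reported only once.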
Update: Tested with the Benchmark module on a fixed array of 300000 numbers randomly distributed between 0 and 300; the part that determines the near numbers took about 8 seconds on my reasonably modern machine. I.e., the benchmark test starts like this:
use strict;
use warnings;
use Benchmark;
#maximum distance we are looking for
my $delta = 0.01;
#test array
my @a;
for (1..300000) {
push @a, rand() * 300;
}
my ($r, $s);
timethis ( 10 =>
sub { ...
Update: Improved the description; it gave the impression that we first look for all numbers in the array not more than 2 * $delta apart, which is not the case.
