
### Re: fastest way to compare numerical strings?

by jds17 (Pilgrim)
on Jun 30, 2008 at 19:14 UTC (#694815)

in reply to fastest way to compare numerical strings?

Provided that not too many numbers cluster around a few points, an alternative is to first group the numbers into overlapping buckets, each spanning less than 3 * $delta ($delta = 0.01 in your case), and then process only those candidate buckets to find numbers at most $delta apart.

The following code makes a single pass through the input array at the start and then sorts only small arrays ("small" provided the assumption above holds).

```
use strict;
use warnings;

# maximum distance we are looking for
my $delta = 0.01;
# test array
my @a = (1.02, 1.03, 6.01, 9, 1.04, 1.011, 1.025, 1.01, 0.005, -0.002);

# "discretize" each point into its own bucket and the two neighboring
# buckets; scaling by 1/$delta simplifies the computation
my %h;
for (@a) {
    push @{$h{int($_/$delta)}}, $_;
    push @{$h{int($_/$delta-1)}}, $_;
    push @{$h{int($_/$delta+1)}}, $_;
}

# handle clusters
my %near;
for my $key (keys %h) {
    # a bucket with a single element cannot contain a pair, so skip it;
    # otherwise its members span less than 3 * $delta and are candidates
    next unless @{$h{$key}} > 1;
    # sort numerically (the default sort compares as strings)
    my @sorted = sort { $a <=> $b } @{$h{$key}};
    for my $i (0..$#sorted-1) {
        my ($r, $s) = @sorted[$i, $i+1];
        # keep neighboring pairs that are close enough; since we do not
        # need to process the numbers further, we cram them into a
        # string for the final output
        $near{"$r, $s"} = 1 if ($r < $s && $s <= $r + $delta);
    }
}

print "Not further than $delta apart are the following pairs:\n";
print "$_\n" for (keys %near);
```
Output:
```
Not further than 0.01 apart are the following pairs:
-0.002, 0.005
1.02, 1.025
1.025, 1.03
1.011, 1.02
1.03, 1.04
1.01, 1.011
```
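
The loop above compares only neighbors in sorted order, so a pair such as (1.000, 1.008) with 1.004 lying in between would not be listed. If every pair within $delta is wanted, one possible variant scans forward from each element until the gap exceeds $delta. This is only a sketch and not the code from the post: the helper name `all_near_pairs` is hypothetical, and it uses `floor` from POSIX for the bucket index (an assumption; the post uses `int`).

```perl
use strict;
use warnings;
use POSIX qw(floor);

# Hypothetical helper (not from the original post): returns every pair
# ($r, $s) with $r < $s and $s - $r <= $delta, formatted as "r, s".
sub all_near_pairs {
    my ($delta, @nums) = @_;

    # Each number goes into its own bucket and both neighboring buckets,
    # as in the post; floor() keeps the bucket widths uniform around zero.
    my %bucket;
    for my $x (@nums) {
        my $k = floor($x / $delta);
        push @{ $bucket{ $k + $_ } }, $x for -1 .. 1;
    }

    my %near;
    for my $members (values %bucket) {
        my @sorted = sort { $a <=> $b } @$members;
        for my $i (0 .. $#sorted - 1) {
            # Scan forward until the gap exceeds $delta.
            for my $j ($i + 1 .. $#sorted) {
                last if $sorted[$j] - $sorted[$i] > $delta;
                # The hash key removes duplicates found in several buckets.
                $near{"$sorted[$i], $sorted[$j]"} = 1
                    if $sorted[$i] < $sorted[$j];
            }
        }
    }
    return sort keys %near;
}

my @pairs = all_near_pairs(0.01, 1.000, 1.004, 1.008, 2.5);
print "$_\n" for @pairs;
```

The forward scan costs more when many numbers fall into the same bucket, which is the situation the opening assumption rules out.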
Update: Tested using the Benchmark module on a fixed array of 300000 numbers randomly distributed between 0 and 300; the part that determines the near pairs took about 8 seconds on my reasonably modern machine.

The benchmark test starts like this:

```
use strict;
use warnings;
use Benchmark;
# maximum distance we are looking for
my $delta = 0.01;
# test array
my @a;
for (1..300000) {
    push @a, rand() * 300;
}

my ($r, $s);

timethis ( 10 =>
sub { ...
```
Update: Improved the description; it gave the impression that we first look for all numbers in the array not more than 2 * $delta apart, which is not the case.
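
Because the benchmark listing is cut off at the sub body, a complete run might look something like the following sketch. Several things here are assumptions rather than the code from the post: the wrapper name `near_pairs` is hypothetical, the sizes are scaled down (`$n = 20_000` numbers and 3 iterations instead of 300000 and 10) so that it finishes quickly, and the sort is numeric.

```perl
use strict;
use warnings;
use Benchmark qw(timethis);

# Hypothetical wrapper around the near-pair search described in the post.
# Takes $delta and an array reference, returns a hash reference whose
# keys are "r, s" strings for neighboring pairs at most $delta apart.
sub near_pairs {
    my ($delta, $nums) = @_;

    # Put each number into its own bucket and the two neighboring ones.
    my %h;
    for my $x (@$nums) {
        push @{ $h{ int($x / $delta + $_) } }, $x for -1 .. 1;
    }

    my %near;
    for my $bucket (values %h) {
        next unless @$bucket > 1;
        my @sorted = sort { $a <=> $b } @$bucket;
        for my $i (0 .. $#sorted - 1) {
            my ($r, $s) = @sorted[$i, $i + 1];
            $near{"$r, $s"} = 1 if $r < $s && $s <= $r + $delta;
        }
    }
    return \%near;
}

my $delta = 0.01;
my $n     = 20_000;    # assumption: scaled down from 300000
my @a     = map { rand() * 300 } 1 .. $n;

# timethis prints the elapsed time for the given number of iterations.
timethis(3, sub { near_pairs($delta, \@a) });
```

With so few iterations Benchmark may warn that the count is too low to be reliable; raising `$n` and the iteration count back to the values in the post should give a measurement comparable to the one reported there.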
