
Re: fastest way to compare numerical strings?

by jds17 (Pilgrim)
on Jun 30, 2008 at 19:14 UTC (#694815)

in reply to fastest way to compare numerical strings?

Provided that not too many numbers cluster around a few points, an alternative is to first form clusters of numbers that are at most 2 * $delta apart ($delta = 0.01 in your case) and then process only those candidate clusters to find numbers at most $delta apart.

The following code makes a single pass through the input array at the beginning and then sorts only small arrays ("small" as long as the assumption above holds).

use strict;
use warnings;

# maximum distance we are looking for
my $delta = 0.01;

# test array
my @a = (1.02, 1.03, 6.01, 9, 1.04, 1.011, 1.025, 1.01, 0.005, -0.002);

# "discretize" points to neighboring points, scale by 1/$delta
# to simplify computation
my %h;
for (@a) {
    push @{$h{int($_/$delta)}}, $_;
    push @{$h{int($_/$delta-1)}}, $_;
    push @{$h{int($_/$delta+1)}}, $_;
}

# handle clusters
my %near;
for (keys %h) {
    # in case the corresponding array has more than one element,
    # we know that it contains at least one pair not
    # further apart than 2 * $delta, otherwise ignore it
    if (@{$h{$_}} > 1) {
        # sort numerically; a plain "sort" compares as strings,
        # which would put 10.001 before 9.995
        my @sorted = sort { $a <=> $b } @{$h{$_}};
        for (0..@sorted-2) {
            my $r = $sorted[$_];
            my $s = $sorted[$_+1];
            # filter out neighboring pairs; since we do not need to
            # process the numbers further, we cram them into a
            # string for the final output
            $near{"$r, $s"} = 1 if ($r < $s && $s <= $r + $delta);
        }
    }
}

print "Not further than $delta apart are the following pairs:\n";
print "$_\n" for (keys %near);
Not further than 0.01 apart are the following pairs:
-0.002, 0.005
1.02, 1.025
1.025, 1.03
1.011, 1.02
1.03, 1.04
1.01, 1.011

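The bucketing idea can also be packaged as a subroutine. The sketch below is a variant, not the code from this post: the name near_pairs is made up for illustration, it buckets with POSIX::floor instead of int, it stores each number in one bucket only, and it reports every pair within $delta rather than only neighbors in sorted order. Treat it as one possible way to arrange the idea under those assumptions.

```perl
use strict;
use warnings;
use POSIX qw(floor);

# Hypothetical helper (not from the original post): returns every pair
# [$x, $y] with $x < $y and $y - $x <= $delta, sorted by ($x, $y).
sub near_pairs {
    my ($delta, @nums) = @_;

    # Bucket each number by floor($n / $delta). Two numbers at most
    # $delta apart always land in the same bucket or in adjacent ones.
    my %bucket;
    push @{ $bucket{ floor($_ / $delta) } }, $_ for @nums;

    my @pairs;
    for my $k (keys %bucket) {
        my @own  = @{ $bucket{$k} };
        my @next = @{ $bucket{ $k + 1 } || [] };

        # Pairs inside this bucket.
        for my $i (0 .. $#own - 1) {
            for my $j ($i + 1 .. $#own) {
                my ($x, $y) = sort { $a <=> $b } @own[$i, $j];
                push @pairs, [$x, $y] if $x < $y && $y - $x <= $delta;
            }
        }

        # Pairs spanning this bucket and the next higher one.
        for my $x (@own) {
            for my $y (@next) {
                push @pairs, [$x, $y] if $y - $x <= $delta;
            }
        }
    }
    return sort { $a->[0] <=> $b->[0] || $a->[1] <=> $b->[1] } @pairs;
}
```

It could be called as: print "$_->[0], $_->[1]\n" for near_pairs(0.01, @a); Because only the own bucket and the next higher one are scanned, each number is stored once instead of three times, at the cost of a nested loop inside dense buckets.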
Update: Tested with the Benchmark module on a fixed array of 300000 numbers randomly distributed between 0 and 300; the part that determines the near numbers took about 8 seconds on my reasonably modern machine.

I.e. the benchmark test starts like this:

use strict;
use warnings;
use Benchmark;

# maximum distance we are looking for
my $delta = 0.01;

# test array
my @a;
for (1..300000) {
    push @a, rand() * 300;
}

my ($r, $s);
timethis ( 10 => sub {
...

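The timed body is elided above. As a rough, self-contained sketch of how such a benchmark might be completed, the following assumes the timed code is the clustering loop from the earlier listing, wrapped in a hypothetical subroutine find_near. The array size and iteration count are deliberately smaller than the 300000 and 10 used in the post so that it finishes quickly; timings from it are not comparable to the figure quoted above.

```perl
use strict;
use warnings;
use Benchmark qw(timethis);

# maximum distance we are looking for
my $delta = 0.01;

# Assumption: a smaller test array than in the post, to keep the run short.
my $n = 10_000;
my @a = map { rand() * 300 } 1 .. $n;

# Hypothetical wrapper around the clustering loop; returns a hash ref
# whose keys are "r, s" strings for neighboring pairs within $delta.
sub find_near {
    my ($delta, $nums) = @_;
    my (%h, %near);

    # Put each number into its own bucket and the two adjacent ones.
    for my $x (@$nums) {
        my $k = int($x / $delta);
        push @{ $h{$_} }, $x for $k - 1, $k, $k + 1;
    }

    # Sort each candidate cluster numerically and test neighbors.
    for my $cluster (values %h) {
        next unless @$cluster > 1;
        my @sorted = sort { $a <=> $b } @$cluster;
        for my $i (0 .. $#sorted - 1) {
            my ($r, $s) = @sorted[$i, $i + 1];
            $near{"$r, $s"} = 1 if $r < $s && $s <= $r + $delta;
        }
    }
    return \%near;
}

# timethis(COUNT, CODE) runs CODE COUNT times and prints the timing.
my $t = timethis(3, sub { find_near($delta, \@a) });
```

Raising $n and the iteration count toward the values in the post should give a measurement closer in spirit to the one reported, though the result will depend on the machine.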
Update: Improved the description. It gave the impression that we first look for all numbers in the array not more than 2 * $delta apart, which is not the case.
