Here's a way to do all that prefiltering people keep mentioning:
First, some helper procedures:
use constant EARTH_RADIUS => 3956;
use constant PI => 4 * atan2(1, 1);
sub make_quick_reject {
    # Args: lat (radians), long (radians), dist
    my ($lat, $long, $dist) = @_;
    my $distcos = cos($dist / EARTH_RADIUS);
    my $maxlatdiff = atan2(sqrt(1 - ($distcos * $distcos)), $distcos);
    if (abs($lat) + $maxlatdiff > PI/2) {
        # special case - near the poles
        return sub { abs($lat - $_[0]) > $maxlatdiff; };
    }
    my $longmult = 1 / cos(abs($lat) + $maxlatdiff);
    my $maxlongdiff = $maxlatdiff * $longmult;
    return sub {
        abs($lat - $_[0]) > $maxlatdiff
            or abs($long - $_[1]) > $maxlongdiff;
    };
}
sub make_quick_accept {
    # Args: lat (radians), long (radians), dist
    my ($lat, $long, $dist) = @_;
    my $distcos = cos($dist / EARTH_RADIUS);
    my $maxlatdiff =
        atan2(sqrt(1 - ($distcos * $distcos)), $distcos) / sqrt(2);
    my $longmult;
    if (abs($lat) < $maxlatdiff) {
        # special case - near the equator
        $longmult = 1;
    } else {
        $longmult = 1 / cos(abs($lat) - $maxlatdiff);
    }
    my $maxlongdiff = $maxlatdiff * $longmult;
    return sub {
        abs($lat - $_[0]) < $maxlatdiff
            and abs($long - $_[1]) < $maxlongdiff;
    };
}
Then, I assume that you've loaded your data into something like this:
# zipcode, lat, long
@data = (['00210', 43.00589, -71.0132], ['00211', 43.00589, -71.0132], ...);
And then done something like this to it:
for (@data) { $_->[1] *= PI / 180; $_->[2] *= PI / 180; }
Then you can do something like this:
$distance = 25;
for $i (0 .. $#data) {
    my $quick_reject_routine =
        make_quick_reject($data[$i]->[1], $data[$i]->[2], $distance);
    my $quick_accept_routine =
        make_quick_accept($data[$i]->[1], $data[$i]->[2], $distance);
    for $record (grep { !$quick_reject_routine->($_->[1], $_->[2]) }
                     @data[$i + 1 .. $#data]) {
        if ($quick_accept_routine->($record->[1], $record->[2])
            or long_acceptance_routine($record, $data[$i], $distance)) {
            # write it to some file or other
        }
    }
}
Where for "long_acceptance_routine", you substitute in essentially what you're doing now
(or, better yet, something analogous to the simpler C code suggested in the post I made
before logging in).
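For completeness, here is one sketch of what long_acceptance_routine could look like, using the haversine great-circle formula. The routine itself is my assumption, not part of the original code; the [zipcode, lat, long] record layout and the radians convention match the @data setup above:

```perl
use constant EARTH_RADIUS => 3956;    # miles, as above

# Exact great-circle test: true if the two records are within $dist miles.
# Records are [zipcode, lat, long] arrayrefs with lat/long in radians.
sub long_acceptance_routine {
    my ($rec_a, $rec_b, $dist) = @_;
    my ($lat1, $long1) = @{$rec_a}[1, 2];
    my ($lat2, $long2) = @{$rec_b}[1, 2];

    # Haversine formula - numerically stable for small separations.
    my $dlat  = $lat2 - $lat1;
    my $dlong = $long2 - $long1;
    my $a = sin($dlat / 2) ** 2
          + cos($lat1) * cos($lat2) * sin($dlong / 2) ** 2;
    my $angle = 2 * atan2(sqrt($a), sqrt(1 - $a));

    return EARTH_RADIUS * $angle <= $dist;
}
```

One degree of latitude is about 69 miles at this radius, which makes a handy sanity check when wiring it in.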
Extending this to the case of checking several distances at once shouldn't be hard - just
construct an array of quick reject and quick accept procedures, and then check through
them in order before trying the long acceptance routine.
Something like this:
@distances = qw(5 10 25 50 100);
@distfiles = map { "0$_.txt" } @distances;
for $i (0 .. $#data) {
    my @quick_reject_routines =
        map { make_quick_reject($data[$i]->[1], $data[$i]->[2], $_) }
            @distances;
    my @quick_accept_routines =
        map { make_quick_accept($data[$i]->[1], $data[$i]->[2], $_) }
            @distances;
  OTHERCITY:
    for $j ($i + 1 .. $#data) {
      DISTANCE:
        for $d (0 .. $#distances) {
            if ($quick_reject_routines[$d]->($data[$j][1], $data[$j][2])) {
                next DISTANCE;
            }
            if ($quick_accept_routines[$d]->($data[$j][1], $data[$j][2])
                or long_acceptance_routine($data[$j], $data[$i], $distances[$d])) {
                # within $distances[$d], so also within every larger distance
                for my $file (@distfiles[$d .. $#distfiles]) {
                    write_csv_line($data[$i], $data[$j], $file);
                    write_csv_line($data[$j], $data[$i], $file);
                }
                next OTHERCITY;
            }
        }
    }
}
Of course, this still contains extraneous calls to long_acceptance_routine, but my hope is
that the routines long_acceptance_routine depends on will be caching recent results somehow
(see the Memoize module), so there won't be a performance penalty.
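As a sketch of that caching idea (expensive_helper is a hypothetical stand-in for whatever costly computation your acceptance routine repeats; Memoize ships with core Perl):

```perl
use Memoize;

my $real_calls = 0;

# Hypothetical expensive helper of the kind long_acceptance_routine
# might call over and over with the same coordinates.
sub expensive_helper {
    my ($lat) = @_;
    $real_calls++;    # counts only genuine (uncached) invocations
    return cos($lat) * sin($lat);
}

# After this, repeat calls with the same arguments return the cached
# result instead of recomputing.
memoize('expensive_helper');
```

The default normalizer keys the cache on the stringified argument list, which is fine for plain numeric arguments like these.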
