Beefy Boxes and Bandwidth Generously Provided by pair Networks
Don't ask to ask, just ask
 
PerlMonks  

Re: Re: Zipcode Proximity script

by fizbin (Chaplain)
on Apr 01, 2003 at 04:33 UTC ( #247145=note: print w/ replies, xml ) Need Help??


in reply to Re: Zipcode Proximity script
in thread Zipcode Proximity script

Here's a way to do all that prefiltering people keep mentioning: First, some helper procedures:

use constant EARTH_RADIUS => 3956; use constant PI => 4 * atan2 1, 1; sub make_quick_reject { # Args: lat (radians), long (radians), dist my($lat,$long,$dist) = @_; my($distcos) = cos($dist / EARTH_RADIUS); my($maxlatdiff) = atan2(sqrt(1 - ($distcos * $distcos)), $distcos); if (abs($lat) + $maxlatdiff > PI/2) { # special case - near poles return sub { abs($lat - $_[0]) > $maxlatdiff; }; } my($longmult); $longmult = 1 / cos(abs($lat) + $maxlatdiff); my($maxlongdiff) = $maxlatdiff * $longmult; return sub { abs($lat - $_[0]) > $maxlatdiff or abs($long - $_[1]) > + $maxlongdiff; } } sub make_quick_accept { # Args: lat (radians), long (radians), dist my($lat,$long,$dist) = @_; my($distcos) = cos($dist / EARTH_RADIUS); my($maxlatdiff) = atan2(sqrt(1 - ($distcos * $distcos)), $distcos) / + sqrt(2); my($longmult); if (abs($lat) < $maxlatdiff ) { # special case - near equator $longmult = 1; } else { $longmult = 1 / cos(abs($lat) - $maxlatdiff); } my($maxlongdiff) = $maxlatdiff * $longmult; return sub { abs($lat - $_[0]) < $maxlatdiff and abs($long - $_[1]) < $maxlongdiff; } }
Then, I assume that you've loaded your data into something like this:
# zipcode, lat, long @data = (['00210', 43.00589, 71.0132], ['00211', 43.00589, 71.0132], . +..);
And then done something like this to it:
map { $_->[1] *= PI/180; $_->[2] *= PI / 180;} @data;
Then you can do something like this:
$distance = 25; for $i (0..$#data) { my $quick_reject_routine = make_quick_reject($data[$i]->[1], $data[$i]->[2], $distance); my $quick_accept_routine = make_quick_accept($data[$i]->[1], $data[$i]->[2], $distance); for $record (grep(! $quick_reject_routine->($_->[1],$_->[2]), @data[$i+1..$#data])) { if ($quick_accept_routine->($record->[1],$record->[2]) or long_acceptance_routine($record, $data[$i], $distance) { # write it to some file or other } } }
Where for "long_calculation_routine", you substitute in essentially what you're doing now. (or better yet, something analogous to the simpler C code suggested in the post I made before logging in) Extending this to the case of checking several distances at once shouldn't be hard - just construct an array of quick reject and quick accept procedures, and then check through them in order before trying the long acceptance routine. Something like this:
@distances = qw(5 10 25 50 100); @distfiles = map { "0-$_.txt" } @distances; for $i (0..$#data) { my @quick_reject_routines = map { make_quick_reject($data[$i]->[1], $data[$i]->[2], $_); } @distances; my @quick_accept_routines = map { make_quick_accept($data[$i]->[1], $data[$i]->[2], $_); } @distances; OTHERCITY: for $j ($i..$#data)) { DISTANCE: for $d (0..$#distances) { if ($quick_reject_routines[$d]->($data[$j][1],$data[$j][2])) { next DISTANCE; } if ($quick_accept_routines[$d]->($data[$j][1],$data[$j][2]) or long_acceptance_routine($data[$j], $data[$i], $distance) { map { write_csv_line($data[$i], $data[$j], $_); write_csv_line($data[$j], $data[$i], $_); } @distfiles[$d..$#distances]; next OTHERCITY; } } } }
Of course, this still contains extraneous calls to long_acceptance_routine, but my hope is that the routines long_acceptance_routine depends on will somehow be caching recent results (see the Memoize modules) and so there won't be the performance penalty.


Comment on Re: Re: Zipcode Proximity script
Select or Download Code
Re: Zipcode Proximity script
by fizbin (Chaplain) on Apr 01, 2003 at 15:32 UTC

    Whoops. This is what I get for coding after my brain has already shut down for nightly maintenance.

    Those quick_accept and quick_reject routines will give false rejections with longitudes near the 180 degree longitude. (not an issue with US locations, I know, but...) Also, there's a stupid bit of time-wasting going on near the top there.

    Let's see if I can get it right this time:

    sub make_quick_reject { # Args: lat (radians), long (radians), dist my($lat,$long,$dist) = @_; my($maxlatdiff) = $dist / EARTH_RADIUS; if (abs($lat) + $maxlatdiff > PI/2) { # special case - near poles return sub { abs($lat - $_[0]) > $maxlatdiff; }; } my($longmult); $longmult = 1 / cos(abs($lat) + $maxlatdiff); my($maxlongdiff) = $maxlatdiff * $longmult; return sub { abs($lat - $_[0]) > $maxlatdiff or ( abs($long - $_[1]) > $maxlongdiff and abs($long - $_[1]) < 2*PI - $maxlongdiff ); }; } sub make_quick_accept { # Args: lat (radians), long (radians), dist my($lat,$long,$dist) = @_; my($maxlatdiff) = $dist / EARTH_RADIUS; $maxlatdiff /= sqrt(2); my($longmult); if (abs($lat) < $maxlatdiff ) { # special case - near equator $longmult = 1; } else { $longmult = 1 / cos(abs($lat) - $maxlatdiff); } my($maxlongdiff) = $maxlatdiff * $longmult; return sub { abs($lat - $_[0]) < $maxlatdiff and ( abs($long - $_[1]) < $maxlongdiff or abs($long - $_[1]) > 2*PI - $maxlongdiff ); }; }

    And just again as a reminder, the sub's produced by these two routines accept longitudes and latitudes in RADIANS, not degrees. (Though reworking them to use degrees instead of radians should not be difficult at all)

    And if anyone uses this code without understanding how it works, they deserve what they get. No warranty of anything useful implied at all; the code may not even pass perl -c.

Re^3: Zipcode Proximity script
by Anonymous Monk on Apr 18, 2010 at 06:33 UTC
    instead of my $maxlatdiff = atan2(sqrt(1 - ($distcos * $distcos)), $distcos); you could do
    use Math::Trig; my $maxlatdiff = acos($distcos);

Log In?
Username:
Password:

What's my password?
Create A New User
Node Status?
node history
Node Type: note [id://247145]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others drinking their drinks and smoking their pipes about the Monastery: (8)
As of 2014-12-19 02:16 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    Is guessing a good strategy for surviving in the IT business?





    Results (70 votes), past polls