http://www.perlmonks.org?node_id=247056


in reply to Zipcode Proximity script

First of all, what everyone else has said about first trimming your data set applies: for a given longitude and latitude, you should be able to come up with boundaries that are definitely in your set and boundaries that are definitely out. That being said, you can speed up your comparison even in the C code by doing this: I assume that in your main loop you do something like:
while (read_from_file(&newzip, &newlat, &newlon)) { mydist = great_circle_distance(newlat,newlon,origlat,origlon); if (mydist < 25) { // do stuff to add it to file } if (mydist < 50) { ... } ... }
Instead, replace it with this:
float dist1cosine = cos(25.0 / EARTH_RADIUS); float dist2cosine = cos(50.0 / EARTH_RADIUS); float dist3cosine = cos(75.0 / EARTH_RADIUS); while (read_from_file(&newzip, &newlat, &newlon)) { mydistcos = great_circle_distance_cosine(newlat,newlon, origlat,origlon); if (mydistcos >= dist1cosine) { // do stuff to add it to file } if (mydistcos >= dist2cosine) { ... } ... }
where you've got:
static inline float great_circle_distance_cosine(float lat1, float long1, float lat2, float long2) { float delta_long, temp1, temp2, delta_lat; delta_lat = lat2 - lat1; delta_long = long2 - long1; temp1 = sin(lat1) * sin(lat2); temp2 = cos(delta_lat); temp2 -= temp1; temp2 *= cos(delta_long); /* result is cos(lat1) cos(lat2) [ cos(lon1 - lon2) ] + sin(lat1) sin(lat2) */ return (temp1 + temp2); }
The point of this exercise is to avoid the atan2 and sqrt calls, which are done to take the arccos of sqrt(temp) in the original code. Over the range you're dealing with, arccos is strictly decreasing, (hence the switch from "<" to ">=" in the comparison) which means you don't need to take arccos every time in order to Actually, I think that you could translate this algorithm into perl and, even without applying any pre-distance filtering, get better results than your current program.

Edit by tye, remove PRE tags surrounding CODE tags

Replies are listed 'Best First'.
Re: Re: Zipcode Proximity script
by fizbin (Chaplain) on Apr 01, 2003 at 04:33 UTC
    Here's a way to do all that prefiltering people keep mentioning: First, some helper procedures:
    use constant EARTH_RADIUS => 3956; use constant PI => 4 * atan2 1, 1; sub make_quick_reject { # Args: lat (radians), long (radians), dist my($lat,$long,$dist) = @_; my($distcos) = cos($dist / EARTH_RADIUS); my($maxlatdiff) = atan2(sqrt(1 - ($distcos * $distcos)), $distcos); if (abs($lat) + $maxlatdiff > PI/2) { # special case - near poles return sub { abs($lat - $_[0]) > $maxlatdiff; }; } my($longmult); $longmult = 1 / cos(abs($lat) + $maxlatdiff); my($maxlongdiff) = $maxlatdiff * $longmult; return sub { abs($lat - $_[0]) > $maxlatdiff or abs($long - $_[1]) > + $maxlongdiff; } } sub make_quick_accept { # Args: lat (radians), long (radians), dist my($lat,$long,$dist) = @_; my($distcos) = cos($dist / EARTH_RADIUS); my($maxlatdiff) = atan2(sqrt(1 - ($distcos * $distcos)), $distcos) / + sqrt(2); my($longmult); if (abs($lat) < $maxlatdiff ) { # special case - near equator $longmult = 1; } else { $longmult = 1 / cos(abs($lat) - $maxlatdiff); } my($maxlongdiff) = $maxlatdiff * $longmult; return sub { abs($lat - $_[0]) < $maxlatdiff and abs($long - $_[1]) < $maxlongdiff; } }
    Then, I assume that you've loaded your data into something like this:
    # zipcode, lat, long @data = (['00210', 43.00589, 71.0132], ['00211', 43.00589, 71.0132], . +..);
    And then done something like this to it:
    map { $_->[1] *= PI/180; $_->[2] *= PI / 180;} @data;
    Then you can do something like this:
    $distance = 25; for $i (0..$#data) { my $quick_reject_routine = make_quick_reject($data[$i]->[1], $data[$i]->[2], $distance); my $quick_accept_routine = make_quick_accept($data[$i]->[1], $data[$i]->[2], $distance); for $record (grep(! $quick_reject_routine->($_->[1],$_->[2]), @data[$i+1..$#data])) { if ($quick_accept_routine->($record->[1],$record->[2]) or long_acceptance_routine($record, $data[$i], $distance) { # write it to some file or other } } }
    Where for "long_calculation_routine", you substitute in essentially what you're doing now. (or better yet, something analogous to the simpler C code suggested in the post I made before logging in) Extending this to the case of checking several distances at once shouldn't be hard - just construct an array of quick reject and quick accept procedures, and then check through them in order before trying the long acceptance routine. Something like this:
    @distances = qw(5 10 25 50 100); @distfiles = map { "0-$_.txt" } @distances; for $i (0..$#data) { my @quick_reject_routines = map { make_quick_reject($data[$i]->[1], $data[$i]->[2], $_); } @distances; my @quick_accept_routines = map { make_quick_accept($data[$i]->[1], $data[$i]->[2], $_); } @distances; OTHERCITY: for $j ($i..$#data)) { DISTANCE: for $d (0..$#distances) { if ($quick_reject_routines[$d]->($data[$j][1],$data[$j][2])) { next DISTANCE; } if ($quick_accept_routines[$d]->($data[$j][1],$data[$j][2]) or long_acceptance_routine($data[$j], $data[$i], $distance) { map { write_csv_line($data[$i], $data[$j], $_); write_csv_line($data[$j], $data[$i], $_); } @distfiles[$d..$#distances]; next OTHERCITY; } } } }
    Of course, this still contains extraneous calls to long_acceptance_routine, but my hope is that the routines long_acceptance_routine depends on will somehow be caching recent results (see the Memoize modules) and so there won't be the performance penalty.

      Whoops. This is what I get for coding after my brain has already shut down for nightly maintenance.

      Those quick_accept and quick_reject routines will give false rejections with longitudes near the 180 degree longitude. (not an issue with US locations, I know, but...) Also, there's a stupid bit of time-wasting going on near the top there.

      Let's see if I can get it right this time:

      sub make_quick_reject { # Args: lat (radians), long (radians), dist my($lat,$long,$dist) = @_; my($maxlatdiff) = $dist / EARTH_RADIUS; if (abs($lat) + $maxlatdiff > PI/2) { # special case - near poles return sub { abs($lat - $_[0]) > $maxlatdiff; }; } my($longmult); $longmult = 1 / cos(abs($lat) + $maxlatdiff); my($maxlongdiff) = $maxlatdiff * $longmult; return sub { abs($lat - $_[0]) > $maxlatdiff or ( abs($long - $_[1]) > $maxlongdiff and abs($long - $_[1]) < 2*PI - $maxlongdiff ); }; } sub make_quick_accept { # Args: lat (radians), long (radians), dist my($lat,$long,$dist) = @_; my($maxlatdiff) = $dist / EARTH_RADIUS; $maxlatdiff /= sqrt(2); my($longmult); if (abs($lat) < $maxlatdiff ) { # special case - near equator $longmult = 1; } else { $longmult = 1 / cos(abs($lat) - $maxlatdiff); } my($maxlongdiff) = $maxlatdiff * $longmult; return sub { abs($lat - $_[0]) < $maxlatdiff and ( abs($long - $_[1]) < $maxlongdiff or abs($long - $_[1]) > 2*PI - $maxlongdiff ); }; }

      And just again as a reminder, the sub's produced by these two routines accept longitudes and latitudes in RADIANS, not degrees. (Though reworking them to use degrees instead of radians should not be difficult at all)

      And if anyone uses this code without understanding how it works, they deserve what they get. No warranty of anything useful implied at all; the code may not even pass perl -c.

      instead of my $maxlatdiff = atan2(sqrt(1 - ($distcos * $distcos)), $distcos); you could do
      use Math::Trig; my $maxlatdiff = acos($distcos);