<?xml version="1.0" encoding="windows-1252"?>
<node id="246967" title="Zipcode Proximity script" created="2003-03-31 11:03:43" updated="2005-08-10 12:23:13">
<type id="115">
perlquestion</type>
<author id="110201">
hacker</author>
<data>
<field name="doctext">
I've been working with [http://gnu-designs.com/code/zipcrunch.c|some C code] that crunches the [http://www.census.gov/geo/ZCTA/zcta.html|2000 US Census data] into CSV files, based on the specified proximity to the origin zipcode. The problem is that the C code is horribly slow, and I can't figure out why. It takes my PIII/1.3GHz machine with 512MB of RAM about 20 minutes to crunch the 987k input data file for zipcodes within a 0-25 mile radius of the given origin zipcode. That seems very slow.

&lt;p&gt;The master 2000 Census data file contains records in this format:&lt;code&gt;
   ZIP_CODE ONGITUD ATITUD
   00210 71.0132 43.00589
   00211 71.0132 43.00589
   00212 71.0132 43.00589
   00213 71.0132 43.00589
   00214 71.0132 43.00589
   00215 71.0132 43.00589
   ...&lt;/code&gt;
&lt;p&gt;There is a separate output file for each range (0-25.txt for zipcodes within 0-25 miles of the origin, 0-50.txt for zipcodes within 0-50 miles of the origin, etc.), and each one contains entries such as:&lt;code&gt;
   00210,00210
   00210,00211
   00211,00210
   00210,00212
   00212,00210
   00210,00213
   ...&lt;/code&gt;
&lt;p&gt;For each zipcode found in the master file (starting with &lt;i&gt;origin&lt;/i&gt; == 00210 in this case), I want to output a file that contains all zipcodes within the specified proximity to that zipcode. So in the example above, all of the zipcodes within 0-25 miles of 00210 would be output to 0-25.txt, a CSV file containing the data shown above.
&lt;p&gt;I have working distance functions that do this correctly (but very slowly). They look like this:
&lt;code&gt;
   #include &lt;math.h&gt;   /* M_PI, sin, cos, pow, sqrt, atan2 */

   #define EARTH_RADIUS 3956   /* miles */

   static inline float deg_to_rad(float deg) {
        return (deg * M_PI / 180.0);
   }
   
   /* Function to calculate Great Circle distance (in miles)
      between two points. All four arguments must be in
      radians; convert degrees with deg_to_rad() first. */
   static inline float great_circle_distance(float lat1,
                                             float long1,
                                             float lat2,
                                             float long2) {
           float delta_long, delta_lat, temp, distance;

           /* Find the deltas */
           delta_lat = lat2 - lat1;
           delta_long = long2 - long1;

           /* Find the GC distance */
           temp = pow(sin(delta_lat / 2.0),
                      2) + cos(lat1) * cos(lat2)
                      * pow(sin(delta_long / 2.0), 2);

           distance = EARTH_RADIUS * 2 * atan2(sqrt(temp),
                      sqrt(1 - temp));

           return (distance);
   }&lt;/code&gt;
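&lt;p&gt;One untested idea for cutting the work, sketched under my own assumptions: the great-circle distance between two points can never be smaller than the north-south distance implied by their latitude difference alone, so a cheap latitude check could reject most pairs before any trigonometry runs. The names latitude_could_match and MILES_PER_DEG_LAT are hypothetical and not part of zipcrunch.c, and I haven't measured whether the trig calls are actually the bottleneck.

```c
#define EARTH_RADIUS 3956               /* miles, same value as above */
#define PI 3.14159265358979323846

/* Miles covered by one degree of latitude on a sphere of EARTH_RADIUS
   (hypothetical helper constant, roughly 69 miles). */
#define MILES_PER_DEG_LAT (EARTH_RADIUS * PI / 180.0)

/* Cheap pre-check, latitudes in DEGREES as stored in the data file.
   Returns 0 when the latitude gap alone already exceeds radius_miles,
   so the pair cannot match; returns 1 when the full great-circle
   calculation is still needed to decide. */
static int latitude_could_match(double lat1_deg, double lat2_deg,
                                double radius_miles) {
        double delta = lat1_deg - lat2_deg;

        /* Absolute value without pulling in math.h */
        if (0.0 > delta)
                delta = -delta;

        return radius_miles >= delta * MILES_PER_DEG_LAT;
}
```

In the main loop, a pair would only be passed to great_circle_distance() when this check returns 1.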

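&lt;p&gt;In Perl, a simple flat-plane (Euclidean) distance would be the following, though it is not equivalent to the great-circle formula above:&lt;code&gt;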
   my $distance = sqrt(($x1-$x2)**2+($y1-$y2)**2);&lt;/code&gt;
&lt;p&gt;My goal is to convert this to Perl so that I can gain the speed and efficiency of Perl, make it portable to Windows systems (where the current C code doesn't quite run yet), and expand my knowledge of Perl in general.
&lt;p&gt;Has anyone done this? Any pointers that might be useful here? </field>
</data>
</node>
