I've been working with some C code that crunches the 2000 US Census data into CSV files based on each zipcode's proximity to a specified origin zipcode. The problem is that the C code is horribly slow, and I can't figure out why. It takes my PIII (1.3 GHz, 512 MB RAM) machine about 20 minutes to crunch the 987k input data file for zipcodes within a 0-25 mile radius of the given origin zipcode. That seems very slow.
The master 2000 Census data file contains records in this format:
I write a separate output file for each distance range (0-25.txt for zipcodes within 0-25 miles of the origin, 0-50.txt for zipcodes within 0-50 miles of the origin, etc.), and each file contains entries such as:
For each zipcode found in the master file (starting with origin == 00210 in this case), I want to output a file that contains all zipcodes within the specified proximity to it. So in the example above, all of the zipcodes within 0-25 miles of 00210 would be written to 0-25.txt, a CSV file containing the data shown above.
I have working radii functions that do this correctly (but very slowly); they look like:
In Perl, this would be:
My goal is to convert this to Perl for three reasons: to gain the speed and efficiency of Perl, to make it portable to Windows systems (where the current C code doesn't quite run yet), and to expand my general knowledge of Perl.
Has anyone done this? Any pointers that might be useful here?