|Pathologically Eclectic Rubbish Lister|
In Speeding up point-in-polygon -- take two, punkish was looking for ways of speeding up point in polygon calculations. Various ordering and searching methods come to mind which improve the performance by preselecting the polygons against which each point is compared. Combined with performing the math in C, these techniques can reasonably be expected to improve the performance by up to an order of magnitude.
However, I learned long ago that the quickest way to do something is to avoid doing it at all. And in the case of computer math, if you can avoid doing any math at all, you'll definitely speed things up. A common technique for avoiding math, especially time expensive math is to trade memory for speed and use a lookup table. Eg. A lookup table of logarithms as those of use who used log tables at school will remember, is much faster than evaluating the polynomial expression required to calculate them.
For the point in polygon problem of determining which zip-code a particular point falls in, the simplest way of doing this is to construct a 2-dimensional lookup table of the complex polygons representing each zip-code area and store the zip-code (or a numeric value that represents the zip-code), at the crossover of each pair of coordinates. Determining the zip-code from a point then becomes a simple task of retrieving the value stored at the point index by the (X,Y) pair of the point in question.
My solution to this problem recognises that an image-file in which each of the polygons has been plotted, each using a unique colour, is simple a large scale map. It makes the lookup a simple process of loading the image and doing a getPixel( x,y ) of the relevant point. The color (index) returned is a direct representation of the zip-code at that point on the map.
The nice thing about this process is that it does not matter how complex your polygons are, the lookup is still O(n) in the number of points to be looked up. Obviously, for a map the size of the US, some scaling is required to achieve the required accuracy.
Note: Given accurate enough image plotting, this even caters for the case of a single street (red line in the purple area), having a different zip-code to the area surrounding it on all sides.
Accuracy & scale
Depending upon the required accuracy, a single image file sufficient to cover the entire USA, may be impractical to load into memory. In this case, it may be necessary to split the map into 4 or 16 ro 64 or some other convenient number of smaller images. You would then pre-order the points so as to minimise the need to switch images. Ie. You load the first image and only look up points that fall within it's boundaries before loading the second image.
My proof-of-concept code consisted of looking up 5.25e6 randomly generated points in a single 5000x5000 image containing 2.5e5 polygons. It did this in around 35 seconds on my machine which makes it 50,000 times faster than the OP's method. Even if 256 separate images have to be loaded, it will still be at least two order of magnitude faster.
For ease of generation, the 'polygons' I used were regularly spaced, equal-sized circles, but as noted earlier, once plotted, neither the number nor complexity of the polygons has any affect upon the performance of the method.
Image generator (polygon plotter)
I've incremented the color used for each polygon by 64 (2563 / 250,000 = 67.1) in an attempt to produce a visually pleasing map, but in reality the map does not need to be visually pleasing so a standard by-1 increment could be used which would allow a 24-bit image to handle up to 16 million polygons with no affect on performance.
Random point generator
Lookup process benchmark
Timings of several runs
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
Lingua non convalesco, consenesco et abolesco. -- Rule 1 has a caveat! -- Who broke the cabal?
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.