comment on

In Speeding up point-in-polygon -- take two, punkish was looking for ways of speeding up point in polygon calculations. Various ordering and searching methods come to mind which improve the performance by preselecting the polygons against which each point is compared. Combined with performing the math in C, these techniques can reasonably be expected to improve the performance by up to an order of magnitude.

However, I learned long ago that the quickest way to do something is to avoid doing it at all. And in the case of computer math, if you can avoid doing any math at all, you'll definitely speed things up. A common technique for avoiding math, especially time expensive math is to trade memory for speed and use a lookup table. Eg. A lookup table of logarithms as those of use who used log tables at school will remember, is much faster than evaluating the polynomial expression required to calculate them.

For the point in polygon problem of determining which zip-code a particular point falls in, the simplest way of doing this is to construct a 2-dimensional lookup table of the complex polygons representing each zip-code area and store the zip-code (or a numeric value that represents the zip-code), at the crossover of each pair of coordinates. Determining the zip-code from a point then becomes a simple task of retrieving the value stored at the point index by the (X,Y) pair of the point in question.

My solution to this problem recognises that an image-file in which each of the polygons has been plotted, each using a unique colour, is simple a large scale map. It makes the lookup a simple process of loading the image and doing a getPixel( x,y ) of the relevant point. The color (index) returned is a direct representation of the zip-code at that point on the map.

The nice thing about this process is that it does not matter how complex your polygons are, the lookup is still O(n) in the number of points to be looked up. Obviously, for a map the size of the US, some scaling is required to achieve the required accuracy.

Note: Given accurate enough image plotting, this even caters for the case of a single street (red line in the purple area), having a different zip-code to the area surrounding it on all sides.

Accuracy & scale

Depending upon the required accuracy, a single image file sufficient to cover the entire USA, may be impractical to load into memory. In this case, it may be necessary to split the map into 4 or 16 ro 64 or some other convenient number of smaller images. You would then pre-order the points so as to minimise the need to switch images. Ie. You load the first image and only look up points that fall within it's boundaries before loading the second image.

My proof-of-concept code consisted of looking up 5.25e6 randomly generated points in a single 5000x5000 image containing 2.5e5 polygons. It did this in around 35 seconds on my machine which makes it 50,000 times faster than the OP's method. Even if 256 separate images have to be loaded, it will still be at least two order of magnitude faster.

For ease of generation, the 'polygons' I used were regularly spaced, equal-sized circles, but as noted earlier, once plotted, neither the number nor complexity of the polygons has any affect upon the performance of the method.

Proof-of-concept code

Image generator (polygon plotter)

I've incremented the color used for each polygon by 64 (256³ / 250,000 = 67.1) in an attempt to produce a visually pleasing map, but in reality the map does not need to be visually pleasing so a standard by-1 increment could be used which would allow a 24-bit image to handle up to 16 million polygons with no affect on performance.

#! perl -slw
use strict;
use GD;

my $map = GD::Image->new( 5000, 5000, 1 )
    or die $!;

my $index = 0;

my $bg = $map->colorAllocate( 0, 0, 0 );
$map->filledRectangle( 0, 0, 4999, 4999, $bg );

for my $y ( map{ $_ *10 +5 } 0 .. 499 ) {
    for my $x ( map{ $_ * 10 + 5 } 0 .. 499 ) {
        my $color = $map->colorAllocate( unpack 'xCCC', pack 'N', $ind
+ex += 64 );
#        print $color;
        $map->filledEllipse( $x, $y, 10, 10, $color );
    }
}

open IMG, '>:raw', '569929.png' or die $!;
print IMG $map->png;
close IMG;
[download]

Random point generator

#! perl -slw
use strict;

open POINTS, '>', '569929.points' or die $!;
printf POINTS "%d:%d\n", int( rand 5000 ), int( rand 5000 ) for 1 .. 5
+.25e6;
close POINTS;
[download]

Lookup process benchmark

#! perl -slw
use strict;
use Benchmark::Timer;
use GD;


my $map = GD::Image->newFromPng( '569929.png', 1 );
 open POINTS, '<', '569929.points' or die $!;

my $T = new Benchmark::Timer;
$T->start( '5.25e6 points in 2.5e5 polys' );

while( <POINTS> ) {
    chomp;
    my( $x, $y ) = split ':';
    my $poly = $map->getPixel($x, $y );
#    printf "Point ( %4.4d, %4.4d ) found in poly: %d\n", $x, $y, $pol
+y / 64;
}

$T->stop( '5.25e6 points in 2.5e5 polys' );
$T->report;

close POINTS;
[download]

Timings of several runs

c:\test>569929-b
1 trial of 5,25e6 points in 2.5e5 polys (33.094s total)


c:\test>569929-b
1 trial of 5.25e6 points in 2.5e5 polys (36.656s total)


c:\test>569929-b
1 trial of 5.25e6 points in 2.5e5 polys (33.250s total)


c:\test>569929-b
1 trial of 5.25e6 points in 2.5e5 polys (35.094s total)

c:\test>569929-b
1 trial of 5.25e6 points in 2.5e5 polys (33.734s total)
[download]

Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.

Lingua non convalesco, consenesco et abolesco. -- Rule 1 has a caveat! -- Who broke the cabal?

"Science is about questioning the status quo. Questioning authority".

In the absence of evidence, opinion is indistinguishable from prejudice.

In reply to Re: Speeding up ... (O(N) determination of point in polygon regardless of complexity) by BrowserUk
in thread Speeding up point-in-polygon -- take two by punkish

Are you posting in the right place? Check out Where do I post X? to know for sure.
Posts may use any of the Perl Monks Approved HTML tags. Currently these include the following:
<code> <a> <b> <big> <blockquote> <br /> <dd> <dl> <dt> <em> <font> <h1> <h2> <h3> <h4> <h5> <h6> <hr /> <i> <li> <nbsp> <ol> <p> <small> <strike> <strong> <sub> <sup> <table> <td> <th> <tr> <tt> <u> <ul>
Snippets of code should be wrapped in <code> tags not <pre> tags. In fact, <pre> tags should generally be avoided. If they must be used, extreme care should be taken to ensure that their contents do not have long lines (<70 chars), in order to prevent horizontal scrolling (and possible janitor intervention).
Want more info? How to link or How to display code and escape characters are good places to start.


good chemistry is complicated, and a little bit messy -LW
	PerlMonks