http://www.perlmonks.org?node_id=569984


in reply to Speeding up point-in-polygon -- take two

If this is a task you're going to do repeatedly, and your polygon list is relatively static, then you should consider redefining the problem. I suspect that the loop and indexing operations my be consuming a considerable amount of your time. So maybe you can change the problem to unroll the loops.

For example, instead of storing the polygons as they're defined, break the polygons into horizontal strips with with P[0]=lower-left, P[1]=upper-left, P[2]=upper-right, and P[3]=lower-right. In the case that the top or bottom edge is a point, you'll have the same thing, where P[1]==P[2] or P[3]==P[0], respectively. Then you'll insert these simpler polygons (quadrilaterals, nominally) into the database, so you can use a simpler test. Thus, we break this polygon:

*--------------------* \ A / * *-----* / / \ \ / / \ \ / *--------* \ / *

into these three:

*--------------------* \ A / *--*-----*-------* / \ \ A(3)/ / A(2) \ \ / *--------* \ / *
Since we have quadrilaterals (and triangles, sort-of), we can simplify your subroutine into something resembling:

sub _pointIsInQuadrilateral { my ($a_point, $a_x, $a_y) = @_; # point coords my ($x, $y) = ($a_point->[0], $a_point->[1]); # poly coords # $n is the number of points in polygon. my @x = @$a_x; # Even indices: x-coordinates. my @y = @$a_y; # Odd indices: y-coordinates. if ($x < ($x[1]-$x[0]) * ($y-$y[0]) / ($y[1]-$y[0] + $x[0]) ) { # $x is left of the leftmost edge, so it's not in polygon return 0; } if ($x < ($x[3]-$x[2]) * ($y-$y[2]) / ($y[3]-$y[2] + $x[2]) ) { # $x is right of the rightmost edge, so it's not in polygon return 0; } # Point is inside quadrilateral! return 1; }

As you can see, by redefining the problem and using quadrilaterals with parallel top and bottom edges, we get the following benefits:

  • We omit one of the parameters ($n)
  • We eliminate the Y test altogether
  • We also unroll the X coordinate tests to a simple pair of left-right tests
  • Your polygon structure in the database can be simpler (see also note below).

    So we're simplifying the code by increasing our polygon count. Even with the increased number of shapes in your table, I'm guessing that select statement will probably be about the same speed. Yes, there'll be more polygons, but they'll have smaller bounding boxes, so there'll be fewer false hits. I also think (without testing!) that _pointIsInQuadrilateral should be considerably faster.

    This is only one way you could redefine the problem. After studying, you might find a different way to redefine the problem to speed things up. If you try this approach, please let me know how it turns out!

    Note: One other advantage is that with a fixed number of points per polygon, you can store the point coordinates in table directly as numbers, rather than having a list of coordinates in text. Omitting the parsing of text to numbers may provide another speedup....

    --roboticus