in reply to
Speeding up point-in-polygon - take two
If this is a task you're going to do repeatedly, and your polygon list is relatively static, then you should consider redefining the problem. I suspect that the loop and indexing operations may be consuming a considerable amount of your time, so maybe you can change the problem to unroll the loops.
For example, instead of storing the polygons as they're defined, break the polygons into horizontal strips, with P[0]=lower-left, P[1]=upper-left, P[2]=upper-right, and P[3]=lower-right. In the case that the top or bottom edge degenerates to a point, you'll have the same structure, with P[1]==P[2] or P[3]==P[0], respectively. Then you insert these simpler polygons (nominally quadrilaterals) into the database, so you can use a simpler test. Thus, we break this polygon:
    [ASCII diagram, garbled in extraction: an irregular polygon labeled A]
into these three:
    [ASCII diagram, garbled in extraction: the same polygon split into three
    horizontal strips, labeled A, A(2), and A(3)]
Since we have quadrilaterals (and triangles, sort of), we can simplify your subroutine into something resembling:
    sub _pointIsInQuadrilateral {
        my ($a_point, $a_x, $a_y) = @_;

        # point coords
        my ($x, $y) = ($a_point->[0], $a_point->[1]);

        # poly coords (always four vertices, so no $n parameter)
        my @x = @$a_x;  # x-coordinates of P[0]..P[3]
        my @y = @$a_y;  # y-coordinates of P[0]..P[3]

        if ($x < ($x[1]-$x[0]) * ($y-$y[0]) / ($y[1]-$y[0]) + $x[0]) {
            # $x is left of the leftmost edge, so it's not in the polygon
            return 0;
        }
        if ($x > ($x[3]-$x[2]) * ($y-$y[2]) / ($y[3]-$y[2]) + $x[2]) {
            # $x is right of the rightmost edge, so it's not in the polygon
            return 0;
        }
        # Point is inside the quadrilateral!
        return 1;
    }
As you can see, by redefining the problem and using quadrilaterals with parallel top and bottom edges, we get the following benefits:
- We omit one of the parameters ($n).
- We eliminate the Y test altogether (the bounding-box SELECT already constrains Y).
- We unroll the X coordinate tests to a simple pair of left/right tests.
- Your polygon structure in the database can be simpler (see also the note below).
So we're simplifying the code by increasing our polygon count. Even with the increased number of shapes in your table, I'm guessing the SELECT statement will run at about the same speed: there will be more polygons, but they'll have smaller bounding boxes, so there will be fewer false hits. I also think (without testing!) that _pointIsInQuadrilateral should be considerably faster.
This is only one way you could redefine the problem. After studying it, you might find a different way to redefine the problem and speed things up. If you try this approach, please let me know how it turns out!
Note: One other advantage is that with a fixed number of points per polygon, you can store the point coordinates in the table directly as numbers, rather than keeping a list of coordinates as text. Omitting the parsing of text into numbers may provide another speedup....
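A toy illustration of that note (not the poster's actual schema; the column layout and values below are made up): with exactly four vertices per shape, a fetched row can be eight numeric columns, so the split-and-numify step disappears entirely.

```perl
use strict;
use warnings;

# Variable-length coordinate lists force text storage, parsed on every fetch:
my $row_text = "1.0 2.0 1.5 3.0 4.5 3.0 5.0 2.0";
my @coords_parsed = split ' ', $row_text;   # parse work done per row

# With exactly 4 vertices, the row can be eight numeric columns instead,
# so the values arrive as numbers with no parsing:
my @row_numeric = (1.0, 2.0, 1.5, 3.0, 4.5, 3.0, 5.0, 2.0);
my ($x0, $y0, $x1, $y1, $x2, $y2, $x3, $y3) = @row_numeric;

print "parsed:  @coords_parsed\n";
print "numeric: $x0 $y0 ... $x3 $y3\n";
```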
roboticus
Re^2: Speeding up point-in-polygon - take two by roboticus (Chancellor) on Aug 28, 2006 at 14:39 UTC
Another thought
If the aforementioned suggestion helps, then it may be possible to improve performance a bit more by adding even more polygons: you might break your polygons into rectangles, left triangles, and right triangles, where a rectangle is 'definitely inside' a polygon, and the triangles are pasted onto the left and right sides of your set of rectangles to build the polygons you desire.
With the rectangles, you can avoid the subroutine call altogether: if your SELECT statement returns one, you have the answer "Yep, it's in polygon A". If it's a triangle, then you only have one edge to check. (For a left triangle, your point is inside the polygon if $x is to the right of the edge; for a right triangle, your point is inside if $x is to the left of the edge.)
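A sketch of that single-edge triangle test (my own illustration; the edge-endpoint representation and the sub name are assumptions, not code from the thread). The slanted edge's x is interpolated at the point's y, and the comparison direction depends on which side of the rectangles the triangle is pasted on:

```perl
use strict;
use warnings;

# One-edge test for the left/right triangle pieces described above.
# ($hx0,$hy0)-($hx1,$hy1) is the slanted edge; $side is 'left' for a
# triangle pasted on the left of the rectangles, 'right' for the right.
sub _pointIsInTrianglePiece {
    my ($x, $y, $hx0, $hy0, $hx1, $hy1, $side) = @_;

    # x-coordinate of the slanted edge at height $y
    my $edge_x = $hx0 + ($hx1 - $hx0) * ($y - $hy0) / ($hy1 - $hy0);

    return $side eq 'left' ? ($x >= $edge_x ? 1 : 0)
                           : ($x <= $edge_x ? 1 : 0);
}

# Left triangle with slanted edge running from (0,0) up to (2,2):
print _pointIsInTrianglePiece(1.5, 1, 0, 0, 2, 2, 'left'), "\n";  # inside
print _pointIsInTrianglePiece(0.5, 1, 0, 0, 2, 2, 'left'), "\n";  # outside
```

The Y range and the vertical side are already handled by the bounding-box SELECT, which is why one edge suffices.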
By chewing on the problem, I'm certain you'll come up with a few other things to try...
roboticus
Re^2: Speeding up point-in-polygon - take two by roboticus (Chancellor) on Aug 28, 2006 at 15:16 UTC
Okay, yet another thought (and then I promise I'll shut up for a while!).
How important are exact results? If somewhat "fuzzy" edges are acceptable, you could use a quadtree and just insert enough nodes to get the precision you want. Then you'd only need a SELECT statement to determine whether your point is internal to the polygon.
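A minimal sketch of the cell-lookup idea (all names, the unit-square world, and the fixed-depth grid standing in for a true quadtree are my simplifications): precompute which cells lie inside the polygon, and a point test becomes a single key lookup, which maps directly onto a SELECT.

```perl
use strict;
use warnings;

my $depth = 2;               # 4x4 grid over the unit square [0,1) x [0,1)
my $cells = 2 ** $depth;

# Pretend these cells were precomputed as lying inside some polygon
# (in the real scheme this set would come from subdividing the polygon).
my %inside;
$inside{$_} = 1 for ('0,0', '0,1', '1,0', '1,1');

sub _cellKey {
    my ($x, $y) = @_;
    return int($x * $cells) . ',' . int($y * $cells);
}

# The whole point test is now one hash (or SELECT) lookup.
sub _pointIsInPolygonFuzzy {
    my ($x, $y) = @_;
    return $inside{ _cellKey($x, $y) } ? 1 : 0;
}

print _pointIsInPolygonFuzzy(0.3, 0.3), "\n";   # cell 1,1 -> inside
print _pointIsInPolygonFuzzy(0.8, 0.8), "\n";   # cell 3,3 -> outside
```

The precision is set by the cell size; edge cells can be classified either way, which is the "fuzzy edges" trade-off.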
roboticus
Who intends to stop thinking about this question for at least an hour!
Re^2: Speeding up point-in-polygon - take two by punkish (Priest) on Aug 28, 2006 at 19:38 UTC
If this is a task you're going to do repeatedly, and your polygon list is relatively static, then you should consider redefining the problem. I suspect that the loop and indexing operations may be consuming a considerable amount of your time. So maybe you can change the problem to unroll the loops.
Yes, the polys are relatively static, although the points are not. Nevertheless, I do indeed unroll both polys and points and move them into the SQLite db. That way I avoid going back to the files repeatedly.
You are correct that the "loop and indexing operations may be consuming a considerable amount of your time". Below is the output from my latest run of 5000 polys using Devel::Profiler. More than 220k points were evaluated and updated, and my point-in-poly test was performed 220k times.
Total Elapsed Time = 251.1784 Seconds
User+System Time = 189.3034 Seconds
Exclusive Times
%Time ExclSec CumulS #Calls sec/call Csec/c Name
72.3 136.9 136.91 220726 0.0006 0.0006 main::_pointIsInPolygon
9.09 17.21 17.216 5001 0.0034 0.0034 DBI::st::fetchall_arrayref
6.25 11.82 11.825 13887 0.0009 0.0009 DBI::st::execute
4.84 9.162 178.99 1 9.1622 178.99 main::updatePoints
1.82 3.442 3.442 3885 0.0009 0.0009 DBI::db::commit
0.16 0.298 0.298 5001 0.0001 0.0001 DBI::st::fetchrow_array
0.07 0.141 0.141 3887 0 0 DBI::st::finish
0.01 0.01 0.01 3 0.0033 0.0033 IO::Handle::new
0 0 0.01 5 0 0.002 Geo::ShapeFile::get_bytes
0 0 0 6 0 0 DBI::_new_handle
0 0 0.01 5 0 0.002 Geo::ShapeFile::get_handle
0 0 0 1 0 0 DBI::connect
0 0 0 4 0 0 DBI::db::prepare
0 0 0 4 0 0 DBI::_new_sth
0 0 0 3 0 0 Geo::ShapeFile::dbf_handle
What I really have to figure out is how to reduce those tests. I have an idea which I now have to figure out how to implement. Basically, it is like so:
Once I get a bunch of potential points within the rect of a poly, I should find the 4 outermost points in each dimension. Once I determine those four definitively, all points within those 4 would also be within that poly, and I wouldn't have to do the checks for them.
Now for the hard part: realizing the above hypothesis.
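The four outermost candidates could be picked out like this (a sketch of the idea as stated; the sub name and sample coordinates are my own, and note the caveat about concave polygons raised elsewhere in the thread):

```perl
use strict;
use warnings;

# Find the outermost candidate point in each direction.
# Each point is an [x, y] pair; returns the min-x, max-x,
# min-y, and max-y points (by reference).
sub _extremePoints {
    my @pts = @_;
    my ($minx, $maxx, $miny, $maxy) = ($pts[0]) x 4;
    for my $p (@pts) {
        $minx = $p if $p->[0] < $minx->[0];
        $maxx = $p if $p->[0] > $maxx->[0];
        $miny = $p if $p->[1] < $miny->[1];
        $maxy = $p if $p->[1] > $maxy->[1];
    }
    return ($minx, $maxx, $miny, $maxy);
}

my @candidates = ([1, 5], [3, 2], [7, 4], [4, 9], [5, 5]);
my ($minx, $maxx, $miny, $maxy) = _extremePoints(@candidates);
print "min-x: @$minx  max-x: @$maxx  min-y: @$miny  max-y: @$maxy\n";
```

Only these four would then go through the full point-in-poly test; the remaining candidates would be classified from their positions relative to those four.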

when small people start casting long shadows, it is time to go to bed

    [ASCII diagram, garbled in extraction: a concave polygon containing
    query points (.), with a point (x) that lies within the extent of the
    outermost points yet outside the polygon]
If all of your polygons were guaranteed to have internal angles > 90 degrees that might work, but you only said that they were "irregular". I had a similar idea yesterday (except mine was finding the maximum internal bounding rectangle for the polygon rather than using the query points themselves), and discarded it for this reason. :)
punkish:
What I really have to figure out is to reduce those tests.
That's why I made my original suggestion. There are two ways that breaking the polygons into simple shapes will help you:
- With known shapes, your _pointIsInPolygon function is simpler, so you can use quick exits and/or the compiler can optimize it, and
- you can increase your density of polygon vs. bounding boxes.
In your original _pointIsInPolygon, you use variable indices into your vertex arrays, as well as a loop with its inherent termination condition. By breaking your polygons into triangles or quadrilaterals, you can use fixed indices and omit the loop. That may significantly increase the subroutine's speed. Also, your algorithm requires that you examine every edge in the polygon. By enforcing an order on the edges, you may be able to return early from the routine by taking advantage of that knowledge.
Regarding the 'density of polygon vs. bounding boxes': a rectangle aligned with your X and Y axes has a 100% chance of containing a point returned by your SELECT statement. An arbitrary triangle has a 50% chance of containing a point in its bounding rectangle. So breaking your polygons into triangles and (aligned) rectangles will have a worst-case coverage of 50%. And a higher coverage density directly correlates with a lower false-positive rate. (That is, with higher coverage you'll need fewer tests, because your SELECT will hit fewer bounding boxes.) With arbitrary polygons, you can have nearly *any* hit rate:
    [ASCII diagrams, garbled in extraction: three polygons whose
    bounding-box coverage is roughly 99%, 90%, and 32%]
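The 50% figure for the triangle pieces can be checked with a shoelace-area calculation (toy numbers of my own): a right triangle whose legs lie along its bounding-box edges always covers exactly half the box.

```perl
use strict;
use warnings;

# Shoelace (surveyor's) formula: area of a polygon given as [x,y] pairs.
sub _polyArea {
    my @p = @_;
    my $sum = 0;
    for my $i (0 .. $#p) {
        my $j = ($i + 1) % @p;
        $sum += $p[$i][0] * $p[$j][1] - $p[$j][0] * $p[$i][1];
    }
    return abs($sum) / 2;
}

# Right triangle with its legs on the bounding-box edges:
my @tri = ([0, 0], [4, 0], [0, 3]);
my $tri_area  = _polyArea(@tri);   # 6
my $bbox_area = 4 * 3;             # 12
printf "coverage: %.2f\n", $tri_area / $bbox_area;
```

A skinnier arbitrary triangle can cover less of its box, which is why 50% is the worst case only for this kind of axis-aligned split.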
While I still think my original suggestion would be interesting, you state:
Yes, the polys are relatively static, although the points are not. Nevertheless, I do indeed unroll both polys and points and move them into the SQLite db. That way I avoid going back to the files repeatedly.
By this, I'm assuming you mean that the points are in the same rough relative position in the polygons, meaning that (1) the points are connected in the same order, and (2) the edges will never cross. For example, the shape on the left could morph into the shape in the center, but not into the one on the right:
    [ASCII diagram, garbled in extraction: a polygon with vertices a
    through g morphing into a shape with the same vertex order and no
    crossing edges (allowed), versus a shape whose edges cross (not
    allowed)]
In that case, breaking the polygons into X-aligned trapezoids (my original suggestion) may be a bit too restrictive. Perhaps using arbitrary quadrilaterals (or triangles) would give you enough coverage density; just breaking the shape into quadrilaterals instead of trapezoids may simplify things enough. You'd increase your coverage, and by breaking the polygons into triangles, you could still simplify your _pointIsInPolygon subroutine to eliminate the loops and variable indices.
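For the arbitrary-triangle case, the fixed-index, loop-free test can be written with three cross-product signs (a standard technique, not code from the thread; the sub name is my own):

```perl
use strict;
use warnings;

# Point-in-triangle via the signs of three cross products: the point is
# inside (or on an edge) iff it is on the same side of all three edges.
# No loop, no variable indices - exactly six multiplications.
sub _pointIsInTriangle {
    my ($px, $py, $ax, $ay, $bx, $by, $cx, $cy) = @_;

    my $d1 = ($px - $bx) * ($ay - $by) - ($ax - $bx) * ($py - $by);
    my $d2 = ($px - $cx) * ($by - $cy) - ($bx - $cx) * ($py - $cy);
    my $d3 = ($px - $ax) * ($cy - $ay) - ($cx - $ax) * ($py - $ay);

    my $has_neg = ($d1 < 0) || ($d2 < 0) || ($d3 < 0);
    my $has_pos = ($d1 > 0) || ($d2 > 0) || ($d3 > 0);

    # Mixed signs mean the point is outside one of the edges.
    return ($has_neg && $has_pos) ? 0 : 1;
}

# Triangle (0,0), (4,0), (0,4):
print _pointIsInTriangle(1, 1, 0, 0, 4, 0, 0, 4), "\n";   # inside
print _pointIsInTriangle(5, 5, 0, 0, 4, 0, 0, 4), "\n";   # outside
```

Unlike the trapezoid test, this works for any vertex winding, so it would fit the arbitrary-triangle decomposition suggested above.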
roboticus