Beefy Boxes and Bandwidth Generously Provided by pair Networks
Your skill will accomplish
what the force of many cannot
 
PerlMonks  

Re: (tye)Re: Algorythym for searching closest neighbor

by gryng (Hermit)
on Apr 04, 2001 at 08:10 UTC ( [id://69546]=note: print w/replies, xml ) Need Help??


in reply to (tye)Re: Algorythym for searching closest neighbor
in thread Algorythym for searching closest neighbor

Hey tye,

You know I -love- to nitpick, discuss and generally rant on alogrithms (which never looks spelt correctly). So don't take anything below as an attack on you :) (which I know you aren't, I'm just saving myself from --'s! lol).

Anyway the nearest neighbor problem can be an interesting one! There are two methods I'd like to mention for their various merits.

The first is to use a Voronoi diagram. Voronoi diagrams just creates a list of connected verti that represent the furthest distance between two points. The simplest way to implement the construction of one of these, is to start with two items and draw a perpendicular line at the midpoint of the line they create. Now add an additional point randomly and follow the same method.

The advantage here, is construction takes about n*log n time (though with random, you don't get nearly as good of performance as some other methods available, which are still n*log n though). Once constructed, the Voronoi diagram gives you lots of fun info. To determine the nearest neighbor, you simply have to find which 'cell' (enclosing vertex lines) you're in. To determine a point of maximal distance, look at the verti themselves.

The other method is fairly simple. It's like your band-width method in approach. You keep the points in sorted order. You then find the closest two points left and right of the target. Find the distance between these two points and the target. Take the smaller and construct a square bounding box around the target of width twice that distance. Now linearly search through all points within that space. Quick, simple, and it works well with most data. (btw keeping points in sorted order makes for the fast lookup).

I don't like the band-width method because it's just as complicated as the above method to get the correct answer (you don't get the correct answer just looking in that one band, you have to look in several most of the time), and not much faster except under weird circumstances. That is, spliting into band-widths helps decrease some of the search time, but it is also a more crude search since it isn't as accurate, and so you waste time removing the cruft returned (or blinding processing extra points).

Welp, I'm ranting, tired and allergic.

Ciao,
Gryn :)

  • Comment on Re: (tye)Re: Algorythym for searching closest neighbor

Replies are listed 'Best First'.
(tye)Re2: Algorythym for searching closest neighbor
by tye (Sage) on Apr 04, 2001 at 09:24 UTC

    Thanks. I mentioned it because I wanted feedback and I really appreciate yours.

    Note that I was forced to invent the banding method single-handedly about 12 years ago at the job I had at that time. It was chosen to work well with a relational database. I was quite happy with it.

    I was going to describe my approach in more detail but found that new details just kept cropping up so I just glossed over it. For example, you have to deal with the case when there are no other points in your band. You can also choose to use a bounding circle rather than square in order to trade a few floating point calculations (and more complex code) in hopes of having to read fewer records. My coordinates were integers so I used a bounding square but I I adjusted it as I went (whenever I found a new closest point). And it takes some careful factoring of the problem to make the code for this elegant.

    Anyway, I didn't understand your brief explanation of Voronoi so I hit http://www.google.com/ and came up with a Java applet that draws Voronoi diagrams. That made the basic idea quite clear rather quickly. I'm still pretty fuzzy on the details of the algorithm, but I didn't have the time to study it. (:

    It looks like Voronoi would present quite a challange for a relational database. The added complexity for the banding method is "add one index" (for Voronoi I'd think you'd probably need three new tables). The update method for the banding method is "insert one record" (for Voronoi you'd have to update all surrounding polygons).

    Thanks for the info on Voronoi diagrams. I'll certainly remember that and will dig into them more in the future.

    Another aspect of my problem that I solved with the banding method (since at that point I had the code for it written and working) was to find what "zone" a point fell in. Divide a 2-D space into polygons (each called a "zone") and then, given a point, find which zone it belongs to. Again, to make things easy to implement in a relational database, I broke each polygon into rectangles and triangles with all but the hypotenuses being either horizontal or veritcal. Then I built an index on int(log(max(width,height))).int(x/w).y and used the banding method iterating over int(log(size)) until I found a shape that contained the point.

    I wasn't as happy with this approach but it work pretty well. Actually, the one table contained multiple pavings for different zone "types" and the algorithm would search for the matching zone of each of several types at once.

    But I don't have a point; just "stream of consciousness" on the problem. (:

            - tye (but my friends call me "Tye")
      Hey!

      Well I'm glad I helped out at least a little. I reread last nights post (I'm feeling -much- better now after 8 hours of sleep) and I agree, my description was mostly incomprehensible. I think I needed a picture or a phD last night to do better :) , sorry.

      I definitely agree, that a voronoi diagram isn't suited for a relational database. At least, you shouldn't put your code in SQL or some such pseudo-language (hmm, that -wasn't- flame bait, I'm having to do a website at work, for the second time, in SQL ...shudder...). But I think it's easy enough to put the voronoi data in the database, and then use something secondary (hmm, perl? :) ) to process it.

      Anyway the other method I described seems to be halfway between yours and the voronoi. But, your method seems -very- complicated. It seems that you -might- have been too worried about trimming tests than doing tests (I don't know for certain since I don't have the actual code in front of me :) ). With the simple method you do a -very- quick test, and you've limited the search space to, hopefully, much less than a hunderd or so points (probably just ten or so). If you want to guarantee even less points, do a up-down as well as a left-right initial search. Anyway, you do some quick tests, and you can limit the tests to a reasonable amount. Then you test, you can't be afraid of testing, or you might spend more time avoiding tests than what you save :) .

      Time for work!

      Ciao,
      Gryn

        Two quick notes: First, the point is to not have to load the entire structure into RAM so the Voroni diagram would have to be coded into the database such that you could use a database index to find the part of the diagram that you need to worry about. I don't see a way to do that.

        Second, the problem isn't that I'd have to compare a few more points, it is that I'd have to read records. The I/O for reading a record usually takes much longer than comparing a few hundred points would. So, yes, I did some extra work to avoid reading more records, but I don't think the resulting implementation was much more complicated than you'd get implementing the algorithm you described if you factor the problem well and deal with all of the edge/corner cases.

                - tye (but my friends call me "Tye")

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: note [id://69546]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others examining the Monastery: (6)
As of 2024-04-19 09:21 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found