Beefy Boxes and Bandwidth Generously Provided by pair Networks
P is for Practical
 
PerlMonks  

Re^6: Fuzzy Searching: Optimizing Algorithm ( A few improvements).

by ysth (Canon)
on Dec 09, 2004 at 11:42 UTC ( [id://413481]=note: print w/replies, xml ) Need Help??


in reply to Re^5: Fuzzy Searching: Optimizing Algorithm ( A few improvements).
in thread Fuzzy Searching: Optimizing Algorithm Selection

Also, while actually ysths solution and my current solution can both be modified to handle variable length string sets I dont think either of us have the time to actually put the changes into place so I would prefer that we stick to fixed length words.
FWIW, my solution will lose efficiency with mixed length words.

I would actually like to see us come up with a module providing all three methods, possibly including variants supporting fixed vs. mixed length (assuming there's at least something that each has as an advantage, of which I'm not sure as regards mine). I'm kind of expecting BrowserUk's to best handle high amounts of fuzz.

Re: packing the return values; I seriously doubt a pack in pure perl is going to be a net win over just returning multiple elements, so I think that would benefit only the XS solution. It's also a lousy interface for a perl module to use. As far as returning an index goes, I suppose that's possible, but as is my algorithm has no need to keep the array around.

  • Comment on Re^6: Fuzzy Searching: Optimizing Algorithm ( A few improvements).

Replies are listed 'Best First'.
Re^7: Fuzzy Searching: Optimizing Algorithm ( A few improvements).
by demerphq (Chancellor) on Dec 09, 2004 at 12:44 UTC

    FWIW, my solution will lose efficiency with mixed length words.

    Well, AFAIUI the efficiency will be determined by the ratio of MIN_KEY_LEN/FUZZ. The smaller it is the less efficient with the degenerate case being a slower version of a bruteforce XOR.

    Re: packing the return values; I seriously doubt a pack in pure perl is going to be a net win over just returning multiple elements, so I think that would benefit only the XS solution.

    Im happy with either way. And yes the benefit to an XS solution was one of the reasons I didnt do it originally. But i dont entirely agree it a lousy interface. For large numbers of hits and strings it means a lot less string copying is involved and has the inheirent property of being lexicographically sortable, and easy to dupecheck and compare.

    As far as returning an index goes, I suppose that's possible, but as is my algorithm has no need to keep the array around.

    Ok then we'll leave it as unpacked triplets of ($ofs,$diff,$string) returned via an arrayref.

    ---
    demerphq

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: note [id://413481]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others rifling through the Monastery: (3)
As of 2024-04-25 07:02 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found