http://www.perlmonks.org?node_id=406906


in reply to Fuzzy Searching: Optimizing Algorithm Selection

AGREP (approximate grep) probably does what you want. The algorithms are outlined on the site, and the source code is available. A variation built around that code may well be as fast as it gets. Here is a PostScript research paper on it.

cheers

tachyon

Re^2: Fuzzy Searching: Optimizing Algorithm Selection
by perlcapt (Pilgrim) on Nov 11, 2004 at 04:06 UTC
    I also recommend agrep. The only caveat is its restrictions on free use in commercial applications. I don't believe there is anything more efficient or better suited. The link that tachyon and I point to has links to other libraries and applications.
    perlcapt
    -ben
Re^2: Fuzzy Searching: Optimizing Algorithm Selection
by BrowserUk (Patriarch) on Nov 11, 2004 at 04:33 UTC

    From a fairly quick perusal of the options, I don't think agrep will help much, except maybe as a pre-filter.

    • It won't report where in a line a match was found.
    • It stops matching against a given line when it finds the first match.
    • If you supply a file of things to match, it doesn't tell you which one matched.

    Maybe I missed some things in amongst the six help 'screens'?
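
For illustration only, here is a minimal sketch of a matcher that avoids the three limitations listed above: it reports where each match ends, keeps scanning after the first hit, and says which pattern matched. It uses the classic dynamic-programming approach to approximate substring search (Sellers' algorithm), not agrep's own code, and the names `fuzzy_find` and `search_lines` are hypothetical. It is written in Python for brevity and runs in O(pattern length × text length) per pattern, so it says nothing about speed.

```python
def fuzzy_find(pattern, text, max_errors):
    """Yield (end, errors) for every offset `end` (exclusive) in `text` where
    some substring ending there is within `max_errors` edits of `pattern`."""
    m = len(pattern)
    # col[i] = fewest edits needed to match pattern[:i] against some
    # substring of text ending at the current position.
    # col[0] stays 0, so a match may start anywhere in the text.
    col = list(range(m + 1))
    for j, ch in enumerate(text):
        prev_diag = col[0]
        for i in range(1, m + 1):
            old = col[i]
            cost = 0 if pattern[i - 1] == ch else 1
            col[i] = min(prev_diag + cost,  # match or substitution
                         old + 1,           # extra character in the text
                         col[i - 1] + 1)    # pattern character missing
            prev_diag = old
        if col[m] <= max_errors:
            yield j + 1, col[m]


def search_lines(patterns, lines, max_errors):
    """Yield (line_number, pattern, end, errors) for every hit of every pattern."""
    for line_no, line in enumerate(lines, 1):
        for pattern in patterns:
            for end, errors in fuzzy_find(pattern, line, max_errors):
                yield line_no, pattern, end, errors
```

Each hit carries the line number, the pattern, and the end offset, so overlapping and repeated matches on one line are all reported. Recovering the start offset as well would need a backward pass or extra bookkeeping, which this sketch omits.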


    Examine what is said, not who speaks.
    "Efficiency is intelligent laziness." -David Dunham
    "Think for yourself!" - Abigail
    "Memory, processor, disk in that order on the hardware side. Algorithm, algorithm, algorithm on the code side." - tachyon
Re^2: Fuzzy Searching: Optimizing Algorithm Selection
by halley (Prior) on Nov 12, 2004 at 19:10 UTC
    I went to the University of Arizona, and as an undergraduate, I would sit in on master's- and postdoc-level classes where folks were discussing stuff like agrep.

    Somewhere in my files I have the follow-up to this paper, which allows for affine weighting for various symbols. For example, you might say that vowels are more interchangeable than consonants, if you're looking for fuzzy matches in the pronunciation problem space. I think the author of that paper went on into bioinformatics in a big way after that.
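
To make the vowel example concrete, here is a small sketch of an edit distance with per-symbol substitution weights. It is not the method from the follow-up paper, which is not reproduced here; the 0.5 vowel weight, the `VOWELS` set, and the function names are all assumptions chosen for illustration.

```python
VOWELS = set("aeiou")


def sub_cost(a, b):
    """Substitution cost: vowels swap cheaply (hypothetical weight)."""
    if a == b:
        return 0.0
    if a in VOWELS and b in VOWELS:
        return 0.5  # assumed weight, not taken from any paper
    return 1.0


def weighted_distance(s, t, indel=1.0):
    """Edit distance between s and t using sub_cost for substitutions
    and a flat `indel` cost for insertions and deletions."""
    prev = [j * indel for j in range(len(t) + 1)]
    for i, a in enumerate(s, 1):
        cur = [i * indel]
        for j, b in enumerate(t, 1):
            cur.append(min(prev[j - 1] + sub_cost(a, b),  # substitute or match
                           prev[j] + indel,               # delete a from s
                           cur[j - 1] + indel))           # insert b into s
        prev = cur
    return prev[-1]
```

With these weights, "pan" is closer to "pen" (a vowel swap) than to "pat" (a consonant swap), which is the kind of ranking the pronunciation example calls for.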

    --
    [ e d @ h a l l e y . c c ]