PerlMonks  

Finding a _Similar_ Substring? (Fuzzy Searching?)

by rjahrman (Scribe)
on May 21, 2004 at 02:27 UTC (#355155=perlquestion)
rjahrman has asked for the wisdom of the Perl Monks concerning the following question:

I am searching to see if $string contains a substring, let's say "P100". However, I also want it to match "P-100" and "P 100", or even "P1 00". The model ("P100" here) is held in a variable; right now I have "if ($string =~ /\Q$model\E/)", so whatever I do needs to be programmatic . . . any ideas?

Thanks for the help.

Replies are listed 'Best First'.
Re: Finding a _Similar_ Substring? (Fuzzy Searching?)
by duff (Vicar) on May 21, 2004 at 02:35 UTC
Re: Finding a _Similar_ Substring? (Fuzzy Searching?)
by BUU (Prior) on May 21, 2004 at 02:52 UTC
    Actually, reading your requirements, it sounds like a better solution might be to define a list of characters that "don't matter" when you're matching (or doing whatever you want to do). An easy way to do this would be something like:
    my @ignore=(' ','-'); #whatever
    for(@ignore){
        s/$_//g;
    }
    #match against $_

      Since in this type of situation I'd normally expect the one pattern to be matched against many strings, I'd usually aim to approach this instead by modifying the regexp:

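      my @ignore=(' ','-'); #whatever
      # class matching any run (possibly empty) of ignorable characters
      my $ignoreclass = sprintf '[%s]*', join '', map quotemeta, @ignore;
      # allow that run between each literal character of the pattern
      $re = join $ignoreclass, map quotemeta, split //, $re;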

      Of course this is only so simple if the initial pattern is a simple string: a full-on regexp is rather more difficult to introduce such modifications to reliably.

      Hugo
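A minimal sketch of the regexp-modification approach described above, assuming the model is a plain literal string. The name fuzzy_regex and its argument order are hypothetical, chosen only for illustration. The pattern is compiled once with qr// so it can be reused against many strings.

```perl
use strict;
use warnings;

# Build a regex from a literal model string that tolerates any run of
# "ignorable" characters between the model's own characters.
# fuzzy_regex is a hypothetical helper name, not part of any module.
sub fuzzy_regex {
    my ($model, @ignore) = @_;

    # With nothing to ignore, fall back to a plain literal match
    # (an empty character class "[]" would not compile).
    return qr/\Q$model\E/ unless @ignore;

    # e.g. (' ', '-') becomes the class [\ \-]*
    my $gap = sprintf '[%s]*', join '', map { quotemeta } @ignore;

    # Escape each character of the model, then join with the optional gap.
    my $pattern = join $gap, map { quotemeta } split //, $model;
    return qr/$pattern/;
}

my $re = fuzzy_regex('P100', ' ', '-');

for my $string ('P100', 'P-100', 'P 100', 'P1 00', 'P200') {
    print "$string: ", ($string =~ $re ? 'match' : 'no match'), "\n";
}
```

Because only the listed characters are allowed in the gaps, a string such as "P_100" is not matched unless "_" is added to the ignore list.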

      If your ignore set is too complicated for character classes, you can OR the entries together into a regex. I doubt it would be necessary here; it is more likely to be useful for sets of words.

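      # escape each entry so it is treated as a literal string
      my $ignoreStrings = join "|", map quotemeta, @ignore;
      # qr// does not accept /g; the /g belongs on the substitution
      my $deleteThese = qr/$ignoreStrings/;
      $string =~ s/$deleteThese//g;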

      By the way, you're using $_ to represent the various elements of @ignore, but also to denote the default object of s///. That's why I tend to avoid defaults .... better to be explicit, self-documenting, and avoid irritating errors.

      --
      TTTATCGGTCGTTATATAGATGTTTGCA

Re: Finding a _Similar_ Substring? (Fuzzy Searching?)
by BrowserUk (Pope) on May 21, 2004 at 03:21 UTC

    Depending upon how loose you want the criteria to be, you might get away with something like this.

    my $term = 'P100';

    ## my $re = qr[@{[ join '\W*', split '', $term ]}];
    # Improved slightly.
    my $re = qr[@{[ join '\W*', map "\Q$_\E", split '', $term ]}]x;

    for(
        'P100', 'P-100', 'P 100', 'P1 00',
        'the P 100 is very similar in style to the P-101 & P102.'.
        'The P-100 is a generation behind the P1000'
    ) {
        print "Matched $1" while m[\b($re)\b]g;
    };;

    Matched P100
    Matched P-100
    Matched P 100
    Matched P1 00
    Matched P 100
    Matched P-100

    You could also add /i if you want case insensitivity.


    Examine what is said, not who speaks.
    "Efficiency is intelligent laziness." -David Dunham
    "Think for yourself!" - Abigail
      What exactly are you doing in the regexes at the top? What's the difference between the first and second one?
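One way to see the difference between the two patterns, as a sketch rather than the poster's own explanation: both split the term into characters and join them with \W* so that any non-word characters may sit between them, and the second additionally escapes each character with \Q...\E (and adds the /x flag). The @{[ ... ]} construct only interpolates the result of the join into the qr[] literal, so the sketch below uses a plain variable instead. The term 'P.1' is a hypothetical model name chosen because it contains a regex metacharacter.

```perl
use strict;
use warnings;

my $term = 'P.1';   # hypothetical model name containing a metacharacter

# First version: characters are interpolated raw, so '.' means "any character".
my $loose    = join '\W*', split //, $term;
my $re_loose = qr/$loose/;

# Second version: each character is escaped first, so '.' is a literal dot.
my $escaped   = join '\W*', map { quotemeta } split //, $term;
my $re_strict = qr/$escaped/;

for my $candidate ('P.1', 'P-.1', 'PX1') {
    printf "%-5s loose=%s strict=%s\n", $candidate,
        ($candidate =~ /\A$re_loose\z/  ? 'yes' : 'no'),
        ($candidate =~ /\A$re_strict\z/ ? 'yes' : 'no');
}
```

For a term made only of letters and digits, such as 'P100', the two versions build the same pattern; the escaping matters only once the term contains characters that are special in a regex.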
Re: Finding a _Similar_ Substring? (Fuzzy Searching?)
by ambrus (Abbot) on May 21, 2004 at 11:47 UTC
    If, as others have suggested, you want most characters to be ignored, you could strip all those characters (with y///d) from both the haystack and the needle string, and then perform a match. Also, you may want to use case-insensitive matching.
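The stripping approach can be sketched as follows, assuming the ignorable characters are just space and hyphen (taken from the examples in the question). The names normalise and contains_model are hypothetical. tr///d is the same operator as y///d, and lc provides the case-insensitivity.

```perl
use strict;
use warnings;

# Lower-case the text and delete the characters we choose to ignore.
# The hyphen comes first in the tr list so it is taken literally.
sub normalise {
    my ($text) = @_;
    (my $clean = lc $text) =~ tr/- //d;
    return $clean;
}

# True if the normalised haystack contains the normalised needle.
sub contains_model {
    my ($haystack, $needle) = @_;
    return index(normalise($haystack), normalise($needle)) >= 0;
}

for my $string ('P100', 'the P-100 manual', 'model p1 00', 'P-101') {
    print "$string: ", (contains_model($string, 'P100') ? 'match' : 'no match'), "\n";
}
```

The trade-off is that stripping also removes word boundaries: "P1000" and "up 100" both contain the normalised needle "p100", so this is looser than a regex anchored with \b.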

Node Type: perlquestion [id://355155]
Approved by Old_Gray_Bear
Front-paged by broquaint