
Re: non-exact regexp matches

by vinforget (Beadle)
on Jun 23, 2004 at 17:27 UTC (#369125)

in reply to non-exact regexp matches

I refined my question a little more. I have a string of letters [ACGTacgtNn] from which I want to find a particular instance of a regexp, let's say:

I can do this just fine, but what if I want to match the above regexp with a tolerance of two single-character mismatches? Below is an example:
$buf =~ m/(A)(C)(C)(A)(A)(C)([ACGTacgtNn]{6})(CTA[ACGTacgtNn]{1})(A)(T)(G)([ACGTacgtNn]{1,2})(G)(A)(T)(G)(T)(T)(?{ print $-[0]," ",scalar@-,"\n"; })(?!)/;
This prints the position of each match in $buf, followed by 19 (the number of submatches). I want to be able to get a match when only 17 to 19 of the submatches succeed, not just when all 19 do. Thanks. Vince
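
One way to read "a tolerance of 2 mismatches" is to treat the pattern as a list of single-character classes, slide it along the buffer, and count how many positions fail. The following is only a minimal sketch under that assumption: `fuzzy_positions`, the shortened pattern, and the sample buffer are all made up for illustration, and a variable-length piece such as `[ACGTacgtNn]{1,2}` would have to be expanded into one fixed-length pattern per possible length.

```perl
use strict;
use warnings;

# Hypothetical helper: returns [position, mismatches] for every window of
# $buf that fits the list of single-character classes with at most $max misses.
sub fuzzy_positions {
    my ($buf, $max, @classes) = @_;
    my @rx = map { qr/\A$_\z/ } @classes;    # one anchored regex per position
    my @hits;
    for my $start (0 .. length($buf) - @rx) {
        my $miss = 0;
        for my $k (0 .. $#rx) {
            next if substr($buf, $start + $k, 1) =~ $rx[$k];
            last if ++$miss > $max;          # too many mismatches, give up early
        }
        push @hits, [$start, $miss] if $miss <= $max;
    }
    return @hits;
}

# Shortened stand-in for the real pattern: A C C A, two wildcards, C T A.
my $any  = '[ACGTacgtNn]';
my @pat  = (qw(A C C A), $any, $any, qw(C T A));
my @hits = fuzzy_positions('GGACGATTCTTGG', 2, @pat);
print "$_->[0] $_->[1]\n" for @hits;         # prints: 2 2
```

This counts substitutions only; it does not handle inserted or deleted characters.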

Re^2: non-exact regexp matches
by japhy (Canon) on Jun 23, 2004 at 19:00 UTC
    I have a mechanism for you. Right now, it requires that you break your regex up into pieces yourself, but once I have Regexp::Parser completed, this mechanism will be available via Regexp::Parser::Fuzzy.

    It tries to be smart, making sure that when it does an "insert", it's not inserting the next thing it was supposed to match anyway (I don't think that breaks anything), and that when it does a "modify", it doesn't match the thing it was supposed to try to match.

    Also, right now, it just prints the matches. If you tell me this program does what you need it to do, then I'll help make it more useful. If the regex is something that you don't have control over (that is, it's user input), then you're going to need a regex parser to help you split it up...

    my $rx = mk_fuzzy(0, 1, 0, qw( p e r l ));
    "pearl" =~ $rx;

    # mk_fuzzy(MODs, INSs, DELs, parts...)
    sub mk_fuzzy {
      our ($m, $i, $d) = splice @_, 0, 3;
      use re 'eval';

      qr{
        (?{ [ $i, $d, $m ] })
        ^
        @{[ map qq{
          (?:
            $_[$_]
            (?:
              |
              (?(?{ \$^R->[0] })
                @{[ $_ < $#_ and "(?! $_[$_+1] )" ]}
                (?s: . )
                (?{ [ \$^R->[0] - 1, \$^R->[1], \$^R->[2] ] })
                |
                (?!)
              )
            )
            |
            (?(?{ \$^R->[1] })
              (?{ [ \$^R->[0], \$^R->[1] - 1, \$^R->[2] ] })
              |
              (?!)
            )
            |
            (?(?{ \$^R->[2] })
              (?! $_[$_] )
              (?s: . )
              (?{ [ \$^R->[0], \$^R->[1], \$^R->[2] - 1 ] })
              |
              (?!)
            )
          )
        }, 0 .. $#_ ]}
        $
        (?{
          printf ">> %s (M=%d/%d, I=%d/%d, D=%d/%d)\n",
            $&, $m-$^R->[2], $m, $i-$^R->[0], $i, $d-$^R->[1], $d
        })
        (?!)
      }x;
    }
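
    For readers who find the embedded-code regex hard to follow, the same budgeting idea (a limited number of modifications, insertions, and deletions) can be sketched as a plain recursive function. This is not the mk_fuzzy mechanism itself: `fuzzy_match` is a hypothetical helper, it assumes every part matches exactly one character, and it leaves out the lookahead checks described above, so it may accept an "insert" that mk_fuzzy would reject.

```perl
use strict;
use warnings;

# Hypothetical helper, not mk_fuzzy: true if $str can be consumed entirely by
# @parts (single-character regex pieces) using at most $m modified, $i inserted,
# and $d deleted characters.
sub fuzzy_match {
    my ($str, $m, $i, $d, @parts) = @_;

    # Out of pattern: any leftover text must be covered by insertions.
    return length($str) <= $i unless @parts;

    my ($part, @rest) = @parts;
    if (length $str) {
        my ($ch, $tail) = (substr($str, 0, 1), substr($str, 1));

        # Exact: the next character matches the next part.
        return 1 if $ch =~ /\A(?:$part)\z/
                 && fuzzy_match($tail, $m, $i, $d, @rest);

        # Modify: a non-matching character stands in for the part.
        return 1 if $m > 0 && $ch !~ /\A(?:$part)\z/
                 && fuzzy_match($tail, $m - 1, $i, $d, @rest);

        # Insert: skip one extra character and keep the same part pending.
        return 1 if $i > 0
                 && fuzzy_match($tail, $m, $i - 1, $d, @parts);
    }

    # Delete: the part is missing from the text altogether.
    return $d > 0 && fuzzy_match($str, $m, $i, $d - 1, @rest);
}

# Same arguments as the mk_fuzzy call: 0 modifications, 1 insertion, 0 deletions.
print fuzzy_match('pearl', 0, 1, 0, qw(p e r l)) ? "match\n" : "no match\n";
```

    With these arguments the extra "a" in "pearl" is absorbed by the single allowed insertion; with all three budgets at zero the function only accepts an exact match.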
    Jeff[japhy]Pinyan: Perl, regex, and perl hacker, who'd like a job (NYC-area)
    s++=END;++y(;-P)}y js++=;shajsj<++y(p-q)}?print:??;
      Good thing you used the /x modifier, or that regex would be hard to read!


      We're not really tightening our belts, it just feels that way because we're getting fatter.
        Sounds like someone wants a free pass to the Monastery Torture Chamber... ;)
        Jeff[japhy]Pinyan: Perl, regex, and perl hacker, who'd like a job (NYC-area)
        s++=END;++y(;-P)}y js++=;shajsj<++y(p-q)}?print:??;
