PerlMonks  

Re: non-exact regexp matches

by vinforget (Beadle)
on Jun 23, 2004 at 17:27 UTC ( [id://369125]=note )


in reply to non-exact regexp matches

I refined my question a little more. I have a string of letters [ACGTacgtNn] from which I want to find a particular instance of a regexp, let's say:
/ACCAAC[ACGTacgtNn]{6}CTA[ACGTacgtNn]{1}ATG[ACGTacgtNn]{1,2}GATGTT/

I can do this just fine, but what if I want to match the above regexp while tolerating up to 2 single-character mismatches? Here is an example:
$buf =~ m/(A)(C)(C)(A)(A)(C)([ACGTacgtNn]{6})(CTA[ACGTacgtNn]{1})(A)(T)(G)([ACGTacgtNn]{1,2})(G)(A)(T)(G)(T)(T)(?{ print $-[0]," ",scalar@-,"\n"; })(?!)/;
This prints the position of each match in $buf, followed by 19 (the size of @-, that is, the whole match plus the 18 capture groups). I want to be able to get a match when only 17 to 19 of those submatches succeed, not just all 19. Thanks. Vince
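One way to get this kind of tolerance without regex code blocks is a plain sliding-window count. What follows is only a minimal sketch and not the approach from this thread: the token list and the names match_at and fuzzy_hits are hypothetical, only substitutions in the fixed letters are counted (no insertions or deletions), and the wildcard stretches are assumed to accept any character, since $buf is said to contain only [ACGTacgtNn].

```perl
use strict;
use warnings;

# Hypothetical pattern description: fixed letters ('lit') and
# wildcard stretches ('any', min length, max length).
my @tokens = (
    [ lit => 'ACCAAC' ], [ any => 6, 6 ],
    [ lit => 'CTA'    ], [ any => 1, 1 ],
    [ lit => 'ATG'    ], [ any => 1, 2 ],
    [ lit => 'GATGTT' ],
);

# Smallest number of mismatched fixed letters when @tokens is laid over
# $seq starting at $pos, or undef if that would need more than $max.
sub match_at {
    my ($seq, $pos, $max, @tokens) = @_;
    return 0 unless @tokens;                    # every token consumed
    my ($type, @arg) = @{ shift @tokens };
    if ($type eq 'lit') {
        my $lit = $arg[0];
        return if $pos + length($lit) > length($seq);
        # grep in scalar context counts the differing positions
        my $mm = grep { substr($seq, $pos + $_, 1) ne substr($lit, $_, 1) }
                 0 .. length($lit) - 1;
        return if $mm > $max;
        my $rest = match_at($seq, $pos + length($lit), $max - $mm, @tokens);
        return defined $rest ? $mm + $rest : undef;
    }
    # 'any': try every allowed gap length and keep the best result
    my ($min_len, $max_len) = @arg;
    my $best;
    for my $n ($min_len .. $max_len) {
        last if $pos + $n > length($seq);
        my $r = match_at($seq, $pos + $n, $max, @tokens);
        $best = $r if defined $r and (!defined $best or $r < $best);
    }
    return $best;
}

# All [start, mismatches] pairs with at most $max mismatches.
sub fuzzy_hits {
    my ($seq, $max, @tokens) = @_;
    my @hits;
    for my $start (0 .. length($seq) - 1) {
        my $mm = match_at($seq, $start, $max, @tokens);
        push @hits, [ $start, $mm ] if defined $mm;
    }
    return @hits;
}

# Example: two fixed letters differ from the pattern (ACGAAC, GATGTA).
my $buf = join '', qw(TT ACGAAC GGGGGG CTA N ATG C GATGTA);
printf "pos=%d mismatches=%d\n", @$_ for fuzzy_hits($buf, 2, @tokens);
# prints: pos=2 mismatches=2
```

Like the original regexp, the sketch compares the fixed letters case-sensitively. It tries every start position, so it is simple rather than fast.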

Replies are listed 'Best First'.
Re^2: non-exact regexp matches
by japhy (Canon) on Jun 23, 2004 at 19:00 UTC
    I have a mechanism for you. Right now, it requires that you break your regex up into pieces yourself, but once I have Regexp::Parser completed, this mechanism will be available via Regexp::Parser::Fuzzy.

    It tries to be smart, making sure that when it does an "insert", it's not inserting the next thing it was supposed to match anyway (I don't think that breaks anything), and that when it does a "modify", it doesn't match the thing it was supposed to try to match.

    Also, right now, it just prints the matches. If you tell me this program does what you need it to do, then I'll help make it more useful. If the regex is something that you don't have control over (that is, it's user input), then you're going to need a regex parser to help you split it up...

    my $rx = mk_fuzzy(0, 1, 0, qw( p e r l ));
    "pearl" =~ $rx;

    # mk_fuzzy(MODs, INSs, DELs, parts...)
    sub mk_fuzzy {
      our ($m, $i, $d) = splice @_, 0, 3;
      use re 'eval';

      qr{
        (?{ [ $i, $d, $m ] })
        ^
        @{[ map qq{
          (?:
            $_[$_]
            (?:
              |
              (?(?{ \$^R->[0] })
                @{[ $_ < $#_ and "(?! $_[$_+1] )" ]}
                (?s: . )
                (?{ [ \$^R->[0] - 1, \$^R->[1], \$^R->[2] ] })
                |
                (?!)
              )
            )
            |
            (?(?{ \$^R->[1] })
              (?{ [ \$^R->[0], \$^R->[1] - 1, \$^R->[2] ] })
              |
              (?!)
            )
            |
            (?(?{ \$^R->[2] })
              (?! $_[$_] )
              (?s: . )
              (?{ [ \$^R->[0], \$^R->[1], \$^R->[2] - 1 ] })
              |
              (?!)
            )
          )
        }, 0 .. $#_ ]}
        $
        (?{
          printf ">> %s (M=%d/%d, I=%d/%d, D=%d/%d)\n",
            $&, $m-$^R->[2], $m, $i-$^R->[0], $i, $d-$^R->[1], $d
        })
        (?!)
      }x;
    }
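For readers who find the embedded-code regex hard to follow, the same budget idea (a limited number of modifications, insertions, and deletions spread over a list of parts) can be sketched as an ordinary recursive function. This is an illustration only, not japhy's implementation: fuzzy_match is a hypothetical name, it returns true or false instead of printing each match, it only tries the first way each part can match, and it leaves out the check that an inserted character is not the next part.

```perl
use strict;
use warnings;

# Hypothetical helper: does $str, from offset $pos to its end, match
# @parts in order using at most $m modified, $i inserted and $d deleted parts?
sub fuzzy_match {
    my ($str, $pos, $m, $i, $d, @parts) = @_;
    return $pos == length($str) unless @parts;   # anchored at the end, like $

    my ($part, @rest) = @parts;

    if (substr($str, $pos) =~ /\A(?:$part)/) {
        my $next = $pos + $+[0];                 # offset just past the part
        # the part matched as written
        return 1 if fuzzy_match($str, $next, $m, $i, $d, @rest);
        # insertion: allow one extra character after the part
        return 1 if $i > 0 && $next < length($str)
                 && fuzzy_match($str, $next + 1, $m, $i - 1, $d, @rest);
    }

    # deletion: skip this part without consuming any input
    return 1 if $d > 0 && fuzzy_match($str, $pos, $m, $i, $d - 1, @rest);

    # modification: consume one character that is not the part
    return 1 if $m > 0 && $pos < length($str)
             && substr($str, $pos, 1) !~ /\A(?:$part)\z/
             && fuzzy_match($str, $pos + 1, $m - 1, $i, $d, @rest);

    return 0;
}

# Same budgets as the mk_fuzzy call above: 0 modifications, 1 insertion, 0 deletions.
print "pearl is within budget\n" if fuzzy_match("pearl", 0, 0, 1, 0, qw( p e r l ));
```

The argument order (modifications, insertions, deletions, then the parts) mirrors mk_fuzzy(MODs, INSs, DELs, parts...), so the two can be compared on the same inputs.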
    _____________________________________________________
    Jeff[japhy]Pinyan: Perl, regex, and perl hacker, who'd like a job (NYC-area)
    s++=END;++y(;-P)}y js++=;shajsj<++y(p-q)}?print:??;
      Good thing you used the /x modifier, or that regex would be hard to read!

      ;-)


      We're not really tightening our belts, it just feels that way because we're getting fatter.
        Sounds like someone wants a free pass to the Monastery Torture Chamber... ;)
        _____________________________________________________
        Jeff[japhy]Pinyan: Perl, regex, and perl hacker, who'd like a job (NYC-area)
        s++=END;++y(;-P)}y js++=;shajsj<++y(p-q)}?print:??;
