http://www.perlmonks.org?node_id=369132


in reply to non-exact regexp matches

what you/we really want is an implementation of the paper cited below: an optimal way to approximately match regular expressions. why this rather than the other suggestions above?

Myers EW, Miller W. Approximate matching of regular expressions. Bull Math Biol. 1989;51(1):5-37.


none of the above can compare "edit distances" *for regular expressions* in the way that Text::Levenshtein etc. allow comparison of edit distances for strings. instead, they effectively hardwire a greater degree of flexibility into the patterns that can be recognized. but to do this properly, you need to 'penalize' insertions/deletions in your regexp in the same way you do for sequences. the above paper outlines a way of doing this. as for an implementation - I don't know of one.
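
for reference, here is a minimal sketch of the string case only: the usual dynamic-programming edit distance, with separate penalties for substitutions and for insertions/deletions. it is not the Myers/Miller algorithm, and the function and parameter names (edit_distance, sub_cost, indel_cost) are made up for illustration.

```python
def edit_distance(a, b, sub_cost=1, indel_cost=1):
    """Weighted edit distance between two strings (string case only).

    sub_cost and indel_cost are illustrative parameter names: the penalty
    for a substitution and for an insertion or deletion.
    """
    # prev[j] = cost of turning the empty prefix of a into b[:j]
    prev = [j * indel_cost for j in range(len(b) + 1)]
    for i, ca in enumerate(a, start=1):
        # curr[0] = cost of deleting the first i characters of a
        curr = [i * indel_cost]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + indel_cost,                        # delete ca
                curr[j - 1] + indel_cost,                    # insert cb
                prev[j - 1] + (0 if ca == cb else sub_cost), # match or substitute
            ))
        prev = curr
    return prev[-1]
```

raising indel_cost relative to sub_cost makes gaps more expensive than mismatches, which is the kind of penalty the approximate-regexp case would also need.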

is there something around the BioPerl guys might know of?
...wufnik

-- in the world of the mules there are no rules --

Re^2: non-exact regexp matches
by vinforget (Beadle) on Jun 23, 2004 at 19:01 UTC
    What I want is a little simpler. I only need to match the fixed characters, not the character classes/intervals, because spacing is deemed to be important in this case. I just want to allow for a certain number of substitutions in the fixed characters. I will still read the paper... I may find something that will help me get to a partial solution. Thanks
    Vince
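
A minimal sketch of the substitution-only case described above, assuming the pattern consists entirely of fixed characters, so that spacing is preserved and only mismatches are counted. The names find_with_substitutions and max_subs are hypothetical, and handling of character classes or intervals is left out.

```python
def find_with_substitutions(pattern, text, max_subs):
    """Return (start, mismatches) for every window of text that matches
    pattern with at most max_subs substituted characters.

    No insertions or deletions are allowed, so positions stay aligned.
    """
    hits = []
    m = len(pattern)
    for start in range(len(text) - m + 1):
        mismatches = 0
        for pc, tc in zip(pattern, text[start:start + m]):
            if pc != tc:
                mismatches += 1
                if mismatches > max_subs:
                    break  # too many substitutions, give up on this window
        else:
            # loop finished without break: window is within the budget
            hits.append((start, mismatches))
    return hits


if __name__ == "__main__":
    # one substitution allowed: both the inexact and the exact window match
    print(find_with_substitutions("ACGT", "ACGAACGT", 1))
```

This is a plain sliding-window comparison, which costs O(len(text) * len(pattern)) in the worst case; that is likely fine for short patterns but is only one of several ways to do it.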