Re^3: Mismatch Positions of Ambiguous String

by hv (Parson)
on Apr 27, 2006 at 10:16 UTC ( #545996=note )

in reply to Re^2: Mismatch Positions of Ambiguous String
in thread Mismatch Positions of Ambiguous String

But doesn't work. Where did I go wrong?

It isn't as simple as that: my original approach relied on treating each ambiguous [ACGT] in the source string as a regexp, but to do that it needs to know a) which is the fixed string and which the regexp, and b) that they are not both regexps.

The simplest extension is as below, but it is slower still, and it will probably be too slow if you're dealing with strings a few thousand base pairs long:

sub mismatches {
    my($source, $target) = @_;
    my @sparts = ($source =~ /(\[.*?\]|.)/g);
    my @tparts = ($target =~ /(\[.*?\]|.)/g);
    scalar grep {
        my($s, $t) = ($sparts[$_], $tparts[$_]);
        $s !~ /\[/ ? ($s !~ /$t/)
        : $t !~ /\[/ ? ($t !~ /$s/)
        : !intersect($s, $t)
    } 0 .. $#sparts;
}

sub intersect {
    my($s, $t) = @_;
    my %seen = map +($_ => 1), $s =~ /[^\[\]]/g;
    scalar grep $seen{$_}, $t =~ /[^\[\]]/g;
}

This says: if source is not ambiguous, treat the corresponding fragment of the target as a regexp; else if the target is not ambiguous, treat the source fragment as a regexp; else check a full intersection of the two.
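To make the three cases concrete, here is a minimal sketch that applies the same per-position test but reports which positions mismatch rather than how many. The helper names `tokens` and `compatible` are made up for this illustration and are not part of the code above; it also assumes the strings contain only A, C, G, T and bracketed classes of those.

```perl
use strict;
use warnings;

# Split a sequence into one token per position: a single base,
# or a bracketed class such as "[AT]".
sub tokens {
    my ($seq) = @_;
    return $seq =~ /(\[.*?\]|.)/g;
}

# Hypothetical helper: true if two tokens can stand for the same base.
sub compatible {
    my ($s, $t) = @_;
    # Case 1: source is a plain base, so use the target token as the regexp.
    return $s =~ /$t/ ? 1 : 0 if $s !~ /\[/;
    # Case 2: target is a plain base, so use the source token as the regexp.
    return $t =~ /$s/ ? 1 : 0 if $t !~ /\[/;
    # Case 3: both are ambiguous, so look for any shared base.
    my %seen = map { $_ => 1 } $s =~ /[ACGT]/g;
    return (grep { $seen{$_} } $t =~ /[ACGT]/g) ? 1 : 0;
}

my @s = tokens('[TCG]GGGG[AT]');
my @t = tokens('AGGGG[CT]');

# Zero-based positions at which the two tokens have no base in common.
my @bad = grep { !compatible($s[$_], $t[$_]) } 0 .. $#s;
print "mismatch positions: @bad\n";    # prints "mismatch positions: 0"
```

Position 0 falls under the second case (`A` is not in `[TCG]`), while the last position falls under the third case and passes because `[AT]` and `[CT]` share `T`.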

If your strings only include ACGT, a more efficient approach would be to transform each string into a bit vector that sets a bit for each base that may be present:

my %bits = ('A' => 1, 'C' => 2, 'G' => 4, 'T' => 8);

my $source1 = bitwise('[TCG]GGGG[AT]');
my $target1 = bitwise('AGGGG[CT]');
print mismatches($source1, $target1), "\n";

# Count positions where the two masks share no base (the AND is "\0").
sub mismatches {
    my($source, $target) = @_;
    ($source & $target) =~ tr/\0//;
}

# Encode each position as one character whose bits mark the possible bases.
sub bitwise {
    my $string = shift;
    join '', map {
        my $char = 0;
        $char |= $bits{$_} for /[ACGT]/g;
        chr($char)
    } $string =~ /(\[.*?\]|.)/g;
}

Once the strings are transformed into this bitwise representation, checking for mismatches is very fast even with long strings.
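As a worked illustration of the encoding, the sketch below prints the per-position masks and then lists the mismatch positions instead of only counting them. `mismatch_positions` is a hypothetical helper added for this example. It assumes both operands of `&` are plain strings and that the `bitwise` feature is not enabled (for instance via `use v5.28`), since that feature makes `&` numeric.

```perl
use strict;
use warnings;

my %bits = (A => 1, C => 2, G => 4, T => 8);

# Encode each position as one character whose ordinal is the OR of the
# bits for every base allowed at that position.
sub bitwise {
    my ($string) = @_;
    return join '', map {
        my $mask = 0;
        $mask |= $bits{$_} for /[ACGT]/g;
        chr $mask;
    } $string =~ /(\[.*?\]|.)/g;
}

# Hypothetical helper: zero-based positions where the masks share no bit.
sub mismatch_positions {
    my ($source, $target) = @_;
    my $and = $source & $target;    # string-wise AND, one character per position
    my @positions;
    push @positions, pos($and) - 1 while $and =~ /\0/g;
    return @positions;
}

my $source = bitwise('[TCG]GGGG[AT]');
my $target = bitwise('AGGGG[CT]');

print join(',', unpack('C*', $source)), "\n";                 # 14,4,4,4,4,9
print join(',', unpack('C*', $target)), "\n";                 # 1,4,4,4,4,10
print join(',', mismatch_positions($source, $target)), "\n";  # 0
```

Here `[TCG]` becomes 2|4|8 = 14 and `A` becomes 1, so their AND is 0 and position 0 is a mismatch; `[AT]` (9) and `[CT]` (10) AND to 8, the bit for `T`, so the last position matches.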

