in reply to Fast Identification Of String Difference
The 'classic' Perlish approach to this type of problem uses bitwise string operators. XOR-ing the two original sequence strings character by character yields a string $diff that holds \x00 wherever the strings agree and a non-zero byte wherever they differ. From $diff you can read off the offset and length of each differing run, and you can also build a mask that extracts the differing sub-string sequences from the original strings.
    use warnings;
    use strict;

    my $s1 = 'ACTGGACGTATGCA';
    my $s2 = 'AGTG-ACGC-CGCA';

    my $diff = $s1 ^ $s2;

    my @dpos;
    push @dpos, [ $-[1], $+[1] - $-[1] ] while $diff =~ m{ ([^\x00]+) }xmsg;
    print qq{diff at offset $_->[0], length $_->[1] \n} for @dpos;

    (my $mask = $diff) =~ tr{\x00}{\xff}c;
    $s1 &= $mask;
    $s2 &= $mask;

    my $differences = qr{ [^\x00]+ }xms;
    @dpos = ();
    while ($s1 =~ m{ ($differences) }xmsg) {
        # this code produces same result
        # my @diff_data = ($1);
        # $s2 =~ m{ ($differences) }xmsg;
        # push @diff_data, $1, $-[1];
        # push @dpos, \@diff_data;
        push @dpos, [ $1, do { $s2 =~ m{ ($differences) }xmsg && $1, $-[1] } ] ;
    }
    print qq{@$_ \n} for @dpos;
Output:
    diff at offset 1, length 1
    diff at offset 4, length 1
    diff at offset 8, length 3
    C G 1
    G - 4
    TAT C-C 8
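The offsets found in the XOR string can also be fed straight to substr, which skips the masking step. What follows is only a sketch of that variation, not part of the code above: the sub name diff_spans is made up for illustration, and it assumes both strings have the same length and hold only single-byte characters.

```perl
use strict;
use warnings;

# Hypothetical helper (the name is illustrative only).
# Returns one [ offset, length, s1_substring, s2_substring ] per differing run.
# Assumption: equal-length strings of single-byte characters.
sub diff_spans {
    my ($s1, $s2) = @_;
    my $diff = $s1 ^ $s2;    # "\x00" wherever the two strings agree
    my @spans;
    while ($diff =~ m{ [^\x00]+ }xmsg) {
        # @- and @+ index 0 describe the whole match
        my ($off, $len) = ($-[0], $+[0] - $-[0]);
        push @spans, [ $off, $len, substr($s1, $off, $len), substr($s2, $off, $len) ];
    }
    return @spans;
}

print qq{@$_\n} for diff_spans('ACTGGACGTATGCA', 'AGTG-ACGC-CGCA');
```

For the two sample strings this prints one line per run: offset, length, then the sub-string from each original.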
See @- and @+ in perlvar, also Bitwise Or and Exclusive Or and Bitwise And in perlop.
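As a quick illustration of those two arrays: after a successful match, $-[1] is the offset where capture group 1 starts and $+[1] is the offset just past its end, so their difference is the length of the capture. A small sketch with a made-up string:

```perl
use strict;
use warnings;

my $str = 'xxABCyy';    # sample data, chosen only for this illustration
if ($str =~ m{ (ABC) }xms) {
    my ($start, $end) = ($-[1], $+[1]);    # start of group 1, one past its end
    print "start=$start end=$end length=", $end - $start, "\n";
}
```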
BrowserUk is very good on this general topic.
Update: Added better code example, doc links. And thanks to ELISHEVA.
Update: Fixed @- link above. What was I thinking?