Re: More efficient way for this pattern match?

No, the split function is not a good way to do this trick. A regular expression is a better bet:

#!/usr/bin/perl
use strict;
use warnings;
use File::Copy; 
 
my $str1='MCCAALAPPMAATVGPESIWLWIGTIGMTLGTLYFVGRGRGVRDRKMQEFYIITIFITTIAAAMYFAMATGFGVT-------------EVMVG----DE---ALTIYWARYADWLFTTPLLLLDLSLLAGANRN----TIATLIG-LDVFMIG---T---GAIAALSST-PGTRIAWWAIST--GALL--ALLYVLVGTLSENARNRAPEVA--SLFGRLRNLVIALWFLYPVVWILGT---EGTFGILP--LYWETAAFMVLDLSAKVGFGVILLQSRSVLERVATPTAAPT';
my $str2='--OOOOOOOOOOOOOOOOMMMMMMMMMMMMMMMMMMMMMIIIIIIIIIIMMMMMMMMMMMMMMMMMMMMMOOOOO-------------OOOOO----OO---OOOOMMMMMMMMMMMMMMMMMMMMMIIIIIII----MMMMMMM-MMMMMMM---M---MMMMMMOOO-OOOOMMMMMMMM--MMMM--MMMMMMMMMMIIIIIIIIIIII--IIIIMMMMMMMMMMMMMMMMMMMMO---OOO-OOOO--OOOMMMMMMMMMMMMMMMMMMMMMIIIIIIIIIIIII----';

while ($str2 =~ /(-+)/g) {
    my ($start, $end) = ($-[0], $+[0]);
    my $matchLen = $end - $start;
    
    next if substr($str1, $start, $matchLen) =~ /^-+$/;
    
    my $chIdx = $end == length($str2) ? $start - 1 : $end;
    
    substr ($str2, $start, $matchLen, substr($str2, $chIdx, 1) x $matchLen);
}

print $str2;

Prints:

OOOOOOOOOOOOOOOOOOMMMMMMMMMMMMMMMMMMMMMIIIIIIIIIIMMMMMMMMMMMMMMMMMMMMMOOOOO-------------OOOOO----OO---OOOOMMMMMMMMMMMMMMMMMMMMMIIIIIII----MMMMMMM-MMMMMMM---M---MMMMMMOOO-OOOOMMMMMMMM--MMMM--MMMMMMMMMMIIIIIIIIIIII--IIIIMMMMMMMMMMMMMMMMMMMMO---OOOOOOOO--OOOMMMMMMMMMMMMMMMMMMMMMIIIIIIIIIIIIIIIII

Perl is the programming world's equivalent of English


Replies are listed 'Best First'.
Re^2: More efficient way for this pattern match? by ikegami (Patriarch) on Apr 14, 2015 at 03:00 UTC
Modifying `$str2` resets `pos($str2)`, so the next match restarts from the beginning of the string and you're doing a lot of unneeded work. Adding `pos($str2) = $end;` at the end of the loop addresses this issue.
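
A minimal sketch of the effect described above, run on made-up short strings rather than the ones from the post. The sub name `fill_gaps`, its `$restore_pos` flag, and the pass counter are hypothetical additions for illustration only; the loop body follows the script at the top of the thread.

```perl
use strict;
use warnings;

# Hypothetical helper: runs the gap-filling loop from the script above and
# counts loop passes. $restore_pos toggles the suggested pos() fix.
sub fill_gaps {
    my ($str1, $str2, $restore_pos) = @_;
    my $passes = 0;

    while ($str2 =~ /(-+)/g) {
        my ($start, $end) = ($-[0], $+[0]);
        my $matchLen = $end - $start;
        ++$passes;

        # Runs that are gaps in both strings are left alone.
        next if substr($str1, $start, $matchLen) =~ /^-+$/;

        my $chIdx = $end == length($str2) ? $start - 1 : $end;
        substr($str2, $start, $matchLen, substr($str2, $chIdx, 1) x $matchLen);

        # Modifying $str2 cleared pos($str2); put it back just after the run.
        pos($str2) = $end if $restore_pos;
    }

    return ($str2, $passes);
}

my ($plain,    $plainPasses)    = fill_gaps('AB--CDEFG', 'OO--M-MII', 0);
my ($restored, $restoredPasses) = fill_gaps('AB--CDEFG', 'OO--M-MII', 1);

print "without pos fix: $plain in $plainPasses passes\n";
print "with pos fix:    $restored in $restoredPasses passes\n";
```

Both calls should return the same string. The difference is in the pass count: without the fix, the run at offset 2 (skipped rather than filled) gets matched a second time after the later substitution resets the search position.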
Re^2: More efficient way for this pattern match? by hdb (Monsignor) on Apr 14, 2015 at 12:08 UTC
Your script seems to make some additional assumptions that I cannot find in the original question. For example, for `my $str1='--M--CCA'; my $str2='-----OOO';` your script prints `OOOOOOOO`, while I would have thought it should be `--O--OOO`. UPDATE: To avoid any confusion, the strings above are NOT part of the original question; they are examples I constructed on the assumption that they could occur. The purpose was to highlight a situation where the proposed script delivers something that violates the original requirements. I am not sure the example is really relevant.
Re^3: More efficient way for this pattern match? by GrandFather (Saint) on Apr 14, 2015 at 21:08 UTC
Good catch. It's a bug. The line: `next if substr($str1, $start, $matchLen) =~ /^-+$/;` should be more like (untested): `next if !$matchLen || substr($str1, $start, $matchLen) !~ /[^-]/;`
Perl is the programming world's equivalent of English
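
For what it's worth, here is one possible reading of the output hdb expected, as a sketch only: positions where `$str1` holds a residue get filled, and positions that are gaps in both strings stay `-`. The sub name `fill_per_position` is made up for this example, and the fill-character rule (the character after the run, or the one before it when the run ends the string) is carried over from the script at the top of the thread, not taken from the original question.

```perl
use strict;
use warnings;

# Hypothetical sketch: fill '-' in the annotation ($ann) one position at a
# time, and only where the sequence ($seq) has a non-gap character.
sub fill_per_position {
    my ($seq, $ann) = @_;

    while ($ann =~ /(-+)/g) {
        my ($start, $end) = ($-[0], $+[0]);

        # Same choice of fill character as the original script.
        my $fill = substr($ann, $end == length($ann) ? $start - 1 : $end, 1);

        for my $i ($start .. $end - 1) {
            substr($ann, $i, 1, $fill) if substr($seq, $i, 1) ne '-';
        }

        # The substitutions reset pos($ann); continue after this run.
        pos($ann) = $end;
    }

    return $ann;
}

print fill_per_position('--M--CCA', '-----OOO'), "\n";
```

With hdb's constructed strings this should give `--O--OOO`. Whether that is what the original poster wanted is an assumption.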
Re^2: More efficient way for this pattern match? by Anonymous Monk on Apr 14, 2015 at 07:14 UTC
Thanks so much!
Re^2: More efficient way for this pattern match? by Anonymous Monk on Apr 14, 2015 at 06:46 UTC
"No, the split function is not a good way to do this trick. A regular expression is a better bet:"
FWIW, the split function, same as the match operator, takes a regular expression ;)
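
To illustrate the point about `split` taking a pattern, here is a small sketch on a made-up string. It shows that a capturing group in the `split` pattern keeps the separators in the returned list, so the dash runs are not lost. The variable names are illustrative only.

```perl
use strict;
use warnings;

my $str = 'OO--M-MII';

# Parentheses in the pattern make split return the matched separators
# (the dash runs) as list elements alongside the fields between them.
my @pieces = split /(-+)/, $str;

print join('|', @pieces), "\n";
```

Joining `@pieces` with an empty string gives back the original string, which is what makes this usable for rewriting individual runs.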


	PerlMonks