http://www.perlmonks.org?node_id=732796


in reply to Lower-casing Substrings and Iterating Two Files together

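If you bitwise-OR (|) an uppercase letter with a space (assuming Latin-1/ASCII files), you get the lowercase letter, because the space character (0x20) is exactly the bit that separates upper case from lower case: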

    print 'ACGT' | '    ';;
    acgt
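To make the bit arithmetic explicit, here is a minimal sketch (the variable names are mine, purely for illustration). It also shows why the mask has to be as long as the sequence: when one operand is shorter, Perl treats it as if it were padded with NUL bytes, and OR-ing with NUL changes nothing.

```perl
use strict;
use warnings;

# 'A' is 0x41 and 'a' is 0x61: they differ only in bit 0x20, the code for a space.
printf "%s = 0x%02x, %s = 0x%02x, space = 0x%02x\n",
    'A', ord 'A', 'a', ord 'a', ord ' ';

# OR-ing every byte with a space sets that bit, which lowercases A-Z.
my $lower = 'ACGT' | ' ' x 4;
print "$lower\n";      # acgt

# A one-character mask only reaches the first byte; the rest is OR-ed with NUL.
my $partial = 'ACGT' | ' ';
print "$partial\n";    # aCGT
```

One thing to watch: under `use v5.28` or later the `bitwise` feature is enabled, `|` always treats its operands as numbers, and the string form is spelled `|.` instead.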

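So, if you translate all the 'N's in your mask to spaces and then bitwise-OR the sequence with the mask, you achieve your goal very efficiently. (This relies on the non-N characters of the mask being identical to the sequence at those positions, as they are in the example that follows, so that OR-ing them changes nothing.)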

    $s = 'GGTACACAGAAGCCAAAGCAGGCTCCAGGCTCTGAGCTGTCAGCACAGAGACCGAT';;
    $m = 'GGTNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNT';;

    ( $mm = $m ) =~ tr[N][\x20];;
    print $mm;;
    GGT                                                    T

    print $s | $mm;;
    GGTacacagaagccaaagcaggctccaggctctgagctgtcagcacagagaccgaT
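If you cannot be sure the non-N characters of the mask always match the sequence, a slightly more defensive variant is to turn everything that is not an N into a NUL byte first. This is only a sketch, and `lc_masked` is a name I made up for it:

```perl
use strict;
use warnings;

# Hypothetical helper: lowercase $seq wherever $mask holds an 'N'.
sub lc_masked {
    my( $seq, $mask ) = @_;
    ( my $bits = $mask ) =~ tr/N/\0/c;   ## everything except N => NUL (OR with NUL is a no-op)
    $bits =~ tr/N/ /;                    ## N => space (OR with space lowercases A-Z)
    return $seq | $bits;
}

print lc_masked( 'GGTACACAGAT', 'GGTNNNNNNNT' ), "\n";   # GGTacacagaT
print lc_masked( 'GGTACACAGAT', 'xxxNNNNNNNx' ), "\n";   # GGTacacagaT
```

It costs one extra pass over the mask, so if your masks really are copies of the sequence with Ns substituted, the single tr is all you need.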

Which makes your entire program (setting aside the possibility, which you did not mention, that your files are in FASTA format):

    #! perl -slw
    use strict;

    open SEQ,  '<', 'data1.dat' or die $!;
    open MASK, '<', 'data2.dat' or die $!;

    while( my $seq = <SEQ> ) {        ## Read a sequence
        my $mask = <MASK>;            ## And the corresponding mask
        last unless defined $mask;    ## Stop if the mask file runs out first
        chomp( $seq, $mask );         ## -l adds a newline on print, so strip the input ones
        $mask =~ tr[N][ ];            ## Ns => spaces
        print $seq | $mask;           ## bitwise-OR them and print the result
    }

    close SEQ;
    close MASK;
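If the files do turn out to be FASTA, something along these lines might do. It is a sketch under assumptions I cannot check from your post: both files have the same line layout, header lines start with '>' and sit at the same positions in both files, and the headers should be copied from the sequence file untouched. `mask_fasta` is a made-up name, and the in-memory filehandles merely stand in for your real files.

```perl
use strict;
use warnings;

# Hypothetical helper: read two parallel filehandles, write the masked result.
sub mask_fasta {
    my( $seq_fh, $mask_fh, $out_fh ) = @_;
    while( defined( my $seq = <$seq_fh> ) ) {
        my $mask = <$mask_fh>;
        last unless defined $mask;        ## mask file ended early
        chomp( $seq, $mask );
        if( $seq =~ /^>/ ) {              ## FASTA header (assumed): copy unchanged
            print {$out_fh} "$seq\n";
            next;
        }
        $mask =~ tr[N][ ];                ## Ns => spaces
        my $masked = $seq | $mask;        ## bitwise-OR lowercases the N positions
        print {$out_fh} "$masked\n";
    }
}

# Usage, with in-memory data standing in for data1.dat and data2.dat
my $seq_data  = ">seq1\nACGTAC\nGGTT\n";
my $mask_data = ">seq1\nACNNNC\nGGTT\n";
open my $seq_fh,  '<', \$seq_data  or die $!;
open my $mask_fh, '<', \$mask_data or die $!;
mask_fasta( $seq_fh, $mask_fh, \*STDOUT );
```

With that sample data the sequence line ACGTAC comes out as ACgtaC, and the header and the unmasked line pass through as they are.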

Redirect the output to a third file and you're done.

