http://www.perlmonks.org?node_id=1147742


in reply to Bioinformatics: Regex loop, no output

Maybe like this?

    use strict;
    use warnings;
    use Data::Dumper;

    my @proteins = qw(
        DAAAAATTLTTTAMTTTTTTCKMMFRPPPPPGGGGGGGGGGGG
        ALTAMCMNVWEITYHKGSDVNRRASFAQPPPQPPPPLLAIKPASDASD
    );
    print Dumper \@proteins;

    my @new_peptides;
    for my $protein (@proteins) {
        if ($protein =~ m/[K(?!P)|R(?!P)]/g) {
            $protein =~ s/K(?!P)|R(?!P)/=/g;
            push @new_peptides, split ('=',$protein);
        }
    }
    print Dumper \@new_peptides;

    for (@new_peptides) {
        print "The peptide is $_\n";
    }

I use Data::Dumper all the time because I've been fooled by my data too many times. ;-)

Re^2: Bioinformatics: Regex loop, no output
by AnomalousMonk (Archbishop) on Nov 16, 2015 at 22:32 UTC

    The output from your code shows some problems:

        The peptide is DAAAAATTLTTTAMTTTTTTC
        The peptide is MMFRPPPPPGGGGGGGGGGGG
        The peptide is ALTAMCMNVWEITYH
        The peptide is GSDVN
        The peptide is 
        The peptide is ASFAQPPPQPPPPLLAIKPASDASD

    The K or R terminating split codon (if that's the proper term) is being incorrectly removed from the output peptides. (At least, I think this is incorrect. TamaDP doesn't show desired output, but seems satisfied with output examples given in various replies in this thread that include these codons.) So I assume GSDVN should really be GSDVNR and the "null" sequence following it should really be the single-codon sequence R. This is all down to the incorrect definition of the s/// match pattern; take a look at some other replies in this thread for what I feel are more correct s/// patterns.
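
For illustration, here is a minimal sketch of one way to keep the terminating K or R. It is not taken from the other replies; it assumes the goal is to cut after every K or R that is not followed by P, and it swaps the s/// plus split('=') step for a single split on a zero-width pattern. The variable name @peptides is chosen here for the example.

```perl
use strict;
use warnings;

# Same sample sequences as in the post above.
my @proteins = qw(
    DAAAAATTLTTTAMTTTTTTCKMMFRPPPPPGGGGGGGGGGGG
    ALTAMCMNVWEITYHKGSDVNRRASFAQPPPQPPPPLLAIKPASDASD
);

my @peptides;
for my $protein (@proteins) {
    # Zero-width split point: just after a K or R, unless a P comes next.
    # Nothing is consumed, so the K or R stays at the end of its peptide.
    push @peptides, split /(?<=[KR])(?!P)/, $protein;
}

print "The peptide is $_\n" for @peptides;
```

Under that assumption, the second sequence yields ALTAMCMNVWEITYHK, GSDVNR, R and ASFAQPPPQPPPPLLAIKPASDASD, so no empty peptide appears and no K or R is lost.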

    In an unrelated note, the regex in the condition expression of the
        if ($protein =~ m/[K(?!P)|R(?!P)]/g) { ... }
    block isn't doing what I think you think it's doing. The [K(?!P)|R(?!P)] character class is exactly equivalent to the [KPR()?!|] class; metacharacters (alternations, groupings, etc.) have no meaning in a character class, so ()?!| are just literal characters (and repeated characters have no effect whatsoever). Also, the /g modifier in the m//g match is useless in the boolean context of a conditional, although it does no harm (except to burn a few more innocent computrons). Again, all this doesn't affect the basic problem with the code, which stems from the incorrect s/// match.
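
A small sketch that makes the character-class point visible. The variable names and the test strings are made up for the example; the three patterns are the bracketed class as written in the condition, the same class with each character listed once, and the unbracketed alternation used in the s///.

```perl
use strict;
use warnings;

my $as_written  = qr/[K(?!P)|R(?!P)]/;  # class used in the original if-condition
my $spelled_out = qr/[KPR()?!|]/;       # same set of characters, listed once each
my $alternation = qr/K(?!P)|R(?!P)/;    # K or R that is not followed by P

for my $s ('P', '(', '|', 'KP', 'RP', 'KA') {
    printf "%-3s as_written=%d spelled_out=%d alternation=%d\n",
        $s,
        ($s =~ $as_written)  ? 1 : 0,
        ($s =~ $spelled_out) ? 1 : 0,
        ($s =~ $alternation) ? 1 : 0;
}
```

The first two columns agree for every string, since both classes match any single one of those characters. The alternation matches only 'KA' from this list, which is the behaviour the condition was presumably meant to have.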

    I use Data::Dumper all the time because I've been fooled by my data too many times.

    Yea and amen brother, yea and amen.


    Give a man a fish:  <%-{-{-{-<

      Thank you! I wondered whether I had understood what was wanted; later posts show that I hadn't. I shouldn't have posted that, and I'll stop myself next time.