PerlMonks  

Re: Bioinformatics: Regex loop, no output

by tonto (Friar)
on Nov 15, 2015 at 18:39 UTC (id://1147742)


in reply to Bioinformatics: Regex loop, no output

Maybe like this?

use strict;
use warnings;
use Data::Dumper;

my @proteins=qw(
    DAAAAATTLTTTAMTTTTTTCKMMFRPPPPPGGGGGGGGGGGG
    ALTAMCMNVWEITYHKGSDVNRRASFAQPPPQPPPPLLAIKPASDASD
);
print Dumper \@proteins;

my @new_peptides;
for my $protein (@proteins) {
    if ($protein =~ m/[K(?!P)|R(?!P)]/g) {
        $protein =~ s/K(?!P)|R(?!P)/=/g;
        push @new_peptides, split ('=',$protein);
    }
}
print Dumper \@new_peptides;

for (@new_peptides) {
    print "The peptide is $_\n";
}

I use Data::Dumper all the time because I've been fooled by my data too many times. ;-)

Re^2: Bioinformatics: Regex loop, no output
by AnomalousMonk (Archbishop) on Nov 16, 2015 at 22:32 UTC

    The output from your code shows some problems:

    The peptide is DAAAAATTLTTTAMTTTTTTC
    The peptide is MMFRPPPPPGGGGGGGGGGGG
    The peptide is ALTAMCMNVWEITYH
    The peptide is GSDVN
    The peptide is
    The peptide is ASFAQPPPQPPPPLLAIKPASDASD
    The K or R terminating split codon (if that's the proper term) is being incorrectly removed from the output peptides. (At least, I think this is incorrect. TamaDP doesn't show the desired output, but seems satisfied with the output examples given in various replies in this thread, which include these codons.) So I assume GSDVN should really be GSDVNR, and the "null" sequence following it should really be the single-codon sequence R. This all comes down to the incorrect definition of the s/// match pattern; take a look at some other replies in this thread for what I feel are more correct s/// patterns.
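    A minimal sketch of what I take the intended behavior to be, assuming the goal is to cut after every K or R that is not followed by P and to keep that K or R at the end of its peptide. The helper names tryptic_peptides and tryptic_peptides_subst are hypothetical, not taken from the thread. The first uses a zero-width split, so no residue is consumed; the second keeps the s/// approach but puts the = marker after the captured residue instead of replacing the residue with it.

```perl
use strict;
use warnings;

# Hypothetical helper: cut after every K or R that is NOT followed by P,
# keeping the K/R at the end of each peptide.
sub tryptic_peptides {
    my ($protein) = @_;
    # (?<=[KR]) is the position just after a K or R; (?!P) requires that the
    # next residue is not P. The pattern is zero-width, so nothing is removed.
    return split /(?<=[KR])(?!P)/, $protein;
}

# Hypothetical s/// variant: insert the '=' marker AFTER the captured K/R
# rather than replacing the K/R with it. $protein itself is left unchanged.
sub tryptic_peptides_subst {
    my ($protein) = @_;
    (my $marked = $protein) =~ s/([KR])(?!P)/$1=/g;
    return split /=/, $marked;
}

my @proteins = qw(
    DAAAAATTLTTTAMTTTTTTCKMMFRPPPPPGGGGGGGGGGGG
    ALTAMCMNVWEITYHKGSDVNRRASFAQPPPQPPPPLLAIKPASDASD
);

for my $protein (@proteins) {
    print "The peptide is $_\n" for tryptic_peptides($protein);
}
```

    Run on the two sample proteins, this prints GSDVNR followed by a separate R peptide, and the KP and RP sites are left uncut.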

    On an unrelated note, the regex in the condition expression of the
        if ($protein =~ m/[K(?!P)|R(?!P)]/g) { ... }
    block isn't doing what I think you think it's doing. The [K(?!P)|R(?!P)] character class is exactly equivalent to the [KPR()?!|] class: metacharacters (alternations, groupings, etc.) have no meaning in a character class, so ()?!| are just literal characters (and repeated characters have no effect whatsoever). Also, the /g modifier in the m//g match is useless in the boolean context of a conditional, although it does no harm (except to burn a few more innocent computrons). Again, none of this affects the basic problem with the code, which stems from the incorrect s/// match.
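    A small sketch to make the character-class point concrete. The test strings are made up for illustration, and [KR](?!P) is only my guess at what the condition was meant to test. The as-written class matches any string containing a K, P, R, (, ), ?, ! or |, while [KR](?!P) only matches a K or R that is not followed by P. The matches are written without /g here, since a plain boolean test doesn't need it.

```perl
use strict;
use warnings;

# The class from the original condition: inside [...], ( ) ? ! | are literals.
my $as_written = qr/[K(?!P)|R(?!P)]/;
# The same set of characters, written without the repeats.
my $same_class = qr/[KPR()?!|]/;
# Presumed intent (an assumption): a K or R that is not followed by P.
my $intended = qr/[KR](?!P)/;

for my $s ('AKPA', 'A(B', 'PPPP', 'AKGA') {
    printf "%-5s as_written=%d same_class=%d intended=%d\n",
        $s,
        ($s =~ $as_written ? 1 : 0),
        ($s =~ $same_class ? 1 : 0),
        ($s =~ $intended   ? 1 : 0);
}
```

    Every sample string matches the as-written class (even A(B, via the literal parenthesis, and PPPP, via the literal P), but only AKGA matches the presumed intended pattern.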

    I use Data::Dumper all the time because I've been fooled by my data too many times.

    Yea and amen, brother, yea and amen.


    Give a man a fish:  <%-{-{-{-<

      Thank you! I wondered whether I had understood what was wanted; later posts show that I hadn't. I shouldn't have posted that, and I'll stop myself next time.
