http://www.perlmonks.org?node_id=1213294

Mike98mm has asked for the wisdom of the Perl Monks concerning the following question:

I am new to Perl.

I am trying to write a simple program to find an ORF (open reading frame).

I want to identify the ORF by its start codon (ATG) and its stop codon (TAA|TAG|TGA) and print it out.

The nucleotide sequence:

TTCAGGTGTTTGCAACTGCGTTTTATTGCAAGAAAGAGTGGAGGGGTTTCCATGGGGCCCACCTCACAACCCACTC TTCACCCCCAAAATCACGCAGGGATCGGACTCAGGAAAGGGAAGCATCTGTGTGTTGCATACGAGCCCTTCCTGTACTTACTTCTTTCACAGCAGGGAAGG AAGAGGGAAGAGGCAGCTGTGGAGAGGATCAGGTTGCGGGAGGTGGGTATCTCGCTGCTCTGACCTTACGTACAGTCCTCCACAGAAGCATCAAAGTGGACT GGCACATATCGGCTCCCTTCACAGGCCACAATCATCTGTCTCTCCTTCGGGCTGGTCCGGTATCCAC

2018-05-02 Athanasius added code and paragraph tags

Replies are listed 'Best First'.
Re: REGEX help
by tybalt89 (Monsignor) on Apr 21, 2018 at 01:00 UTC
    #!/usr/bin/perl

    # http://perlmonks.org/?node_id=1213294

    use strict;
    use warnings;

    $_ = 'TTCAGGTGTTTGCAACTGCGTTTTATTGCAAGAAAGAGTGGAGGGGTTTCCATGGGGCCCACCTCACAACCCACTC TTCACCCCCAAAATCACGCAGGGATCGGACTCAGGAAAGGGAAGCATCTGTGTGTTGCATACGAGCCCTTCCTGTACTTACTTCTTTCACAGCAGGGAAGG AAGAGGGAAGAGGCAGCTGTGGAGAGGATCAGGTTGCGGGAGGTGGGTATCTCGCTGCTCTGACCTTACGTACAGTCCTCCACAGAAGCATCAAAGTGGACT GGCACATATCGGCTCCCTTCACAGGCCACAATCATCTGTCTCTCCTTCGGGCTGGTCCGGTATCCAC';

    print "$_\n\n" for /ATG.*?(?:TAA|TAG|TGA)/g;
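A note on the one-liner above: `.*?` stops at the first TAA, TAG or TGA it meets in any reading frame, and it also matches across the spaces in the pasted sequence. If the ORF should instead be read in whole codons starting at ATG, a minimal sketch could look like the following. The names `find_orfs` and `$thread_seq` are made up for illustration; the sequence is the one from the question with the spaces dropped.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return every ORF found on the forward strand: ATG, then zero or more
# whole codons, up to and including the first in-frame stop codon.
sub find_orfs {
    my ($dna) = @_;
    $dna =~ s/\s+//g;    # remove spaces and newlines left over from pasting
    $dna = uc $dna;      # accept lower-case input as well
    my @orfs = $dna =~ /(ATG(?:[ACGT]{3})*?(?:TAA|TAG|TGA))/g;
    return @orfs;
}

# The sequence from the question, one chunk per original line.
my $thread_seq = join '', qw(
    TTCAGGTGTTTGCAACTGCGTTTTATTGCAAGAAAGAGTGGAGGGGTTTCCATGGGGCCCACCTCACAACCCACTC
    TTCACCCCCAAAATCACGCAGGGATCGGACTCAGGAAAGGGAAGCATCTGTGTGTTGCATACGAGCCCTTCCTGTACTTACTTCTTTCACAGCAGGGAAGG
    AAGAGGGAAGAGGCAGCTGTGGAGAGGATCAGGTTGCGGGAGGTGGGTATCTCGCTGCTCTGACCTTACGTACAGTCCTCCACAGAAGCATCAAAGTGGACT
    GGCACATATCGGCTCCCTTCACAGGCCACAATCATCTGTCTCTCCTTCGGGCTGGTCCGGTATCCAC
);

print "$_\n\n" for find_orfs($thread_seq);
```

Matches do not overlap, so an ATG that sits inside an ORF already reported is skipped, and only the forward strand is scanned. On a short input such as `ATGCTAACCTGA` the difference shows up directly: the `.*?` version stops at the out-of-frame `TAA` and returns `ATGCTAA`, while the codon-wise version returns all twelve bases.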
Re: REGEX help
by dorko (Prior) on Apr 21, 2018 at 01:31 UTC
    Hello. Welcome.

    I think you want to look at BioPerl.

    This code is completely untested. I didn't want to install BioPerl on my system. But it might work if you can get BioPerl installed. (I know you're new to Perl, so installing BioPerl might be a stretch, but this might help: http://bioperl.org/INSTALL.html.)

    #!/bin/perl
    use strict;
    use warnings;
    use Bio::Seq;
    use Data::Dumper::Simple;
    use feature "say";

    # Convert the sequence to lower case. Upper Case might be ok,
    # but the docs for Bio::Seq used lower case, so let's go with that.
    my $letters = lc("TTCAGGTGTTTGCAACTGCGTTTTATTGCAAGAAAGAGTGGAGGGGTTTCCATGGGGCCCACCTCACAACCCACTC TTCACCCCCAAAATCACGCAGGGATCGGACTCAGGAAAGGGAAGCATCTGTGTGTTGCATACGAGCCCTTCCTGTACTTACTTCTTTCACAGCAGGGAAGG AAGAGGGAAGAGGCAGCTGTGGAGAGGATCAGGTTGCGGGAGGTGGGTATCTCGCTGCTCTGACCTTACGTACAGTCCTCCACAGAAGCATCAAAGTGGACT GGCACATATCGGCTCCCTTCACAGGCCACAATCATCTGTCTCTCCTTCGGGCTGGTCCGGTATCCAC");

    #Create a sequence object.
    my $seq_object = Bio::Seq->new(-seq => $letters, -alphabet => 'dna' );

    #Look for the ORF. I specified the start, but I didn't see how to
    #specify the stop. Are the stop codons universal? I'm way out of
    #my league here.
    $prot_object = $seq_object->translate( -orf => 1, -start => "atg" );

    say Dumper $prot_object;

    Cheers,

    Brent

    -- Yeah, I'm a Delt.

      Someone has to be the volunteer. So I forced a cpanm install and only added the my you forgot. Unfortunately the script crashed the hard way:

      ------------- EXCEPTION -------------
      MSG: Failed validation of sequence '[unidentified sequence]'. Invalid characters were:
      STACK Bio::PrimarySeq::validate_seq /Users/karl/perl5/perlbrew/perls/perl-5.24.1threads/lib/site_perl/5.24.1/Bio/PrimarySeq.pm:338
      STACK Bio::PrimarySeq::_set_seq_by_ref /Users/karl/perl5/perlbrew/perls/perl-5.24.1threads/lib/site_perl/5.24.1/Bio/PrimarySeq.pm:287
      STACK Bio::PrimarySeq::seq /Users/karl/perl5/perlbrew/perls/perl-5.24.1threads/lib/site_perl/5.24.1/Bio/PrimarySeq.pm:272
      STACK Bio::PrimarySeq::new /Users/karl/perl5/perlbrew/perls/perl-5.24.1threads/lib/site_perl/5.24.1/Bio/PrimarySeq.pm:229
      STACK Bio::Seq::new /Users/karl/perl5/perlbrew/perls/perl-5.24.1threads/lib/site_perl/5.24.1/Bio/Seq.pm:496
      STACK toplevel ./bio.pl:16
      -------------------------------------

      Just out of curiosity. Anyway, too bad. This stuff is a PITA...

      Best regards, Karl

      «The Crux of the Biscuit is the Apostrophe»

      perl -MCrypt::CBC -E 'say Crypt::CBC->new(-key=>'kgb',-cipher=>"Blowfish")->decrypt_hex($ENV{KARL});'Help

        Take the spaces out of $letters and you should get

        $prot_object = bless( {
            'primary_seq' => bless( {
                'length' => 63,
                '_root_verbose' => 0,
                '_nowarnonempty' => undef,
                'seq' => 'MGPTSQPTLHPQNHAGIGLRKGKHLCVAYEPFLYLLLSQQGRKREEAAVERIRLREVGISLL*',
                'alphabet' => 'protein'
            }, 'Bio::PrimarySeq' ),
            '_root_verbose' => 0
        }, 'Bio::Seq' );

        whatever that means !

        poj
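As for what that dump means: the `seq` field holds the ORF translated into one-letter amino-acid codes, one letter per codon, with `*` marking the stop codon. The sketch below is not BioPerl's implementation; it is a pure-Perl illustration using the standard genetic code, and the names `translate_first_orf` and `%codon_to_aa` are made up for the example. It also strips whitespace first, which is the step the crashing script was missing.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Standard genetic code, with codons enumerated in T, C, A, G order
# for the first, second and third base.
my @bases = qw(T C A G);
my @amino = split //,
    'FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG';
my %codon_to_aa;
for my $first (@bases) {
    for my $second (@bases) {
        for my $third (@bases) {
            $codon_to_aa{"$first$second$third"} = shift @amino;
        }
    }
}

# Translate from the first ATG up to and including the first in-frame
# stop codon. Returns an empty string when the input has no ATG.
sub translate_first_orf {
    my ($dna) = @_;
    $dna =~ s/\s+//g;    # spaces are not valid sequence characters
    $dna = uc $dna;
    my $start = index($dna, 'ATG');
    return '' if $start < 0;

    my $protein = '';
    for (my $pos = $start; $pos + 3 <= length($dna); $pos += 3) {
        # Codons containing anything other than A, C, G, T become 'X'.
        my $aa = $codon_to_aa{ substr($dna, $pos, 3) } // 'X';
        $protein .= $aa;
        last if $aa eq '*';
    }
    return $protein;
}

# The first ten codons of the ORF in the question, followed by a stop.
print translate_first_orf('ttcc atggggccc acctcacaa cccactcttcac tga'), "\n";
```

This prints `MGPTSQPTLH*`, which agrees with the first ten letters of the `seq` value in the dump above.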
Re: REGEX help
by Cristoforo (Curate) on Apr 21, 2018 at 01:06 UTC
Re: REGEX help
by LeBreton (Initiate) on Apr 21, 2018 at 15:19 UTC
    Hi Mike, you can look at the Higher-Order Perl book or the article "How Perl Saved the Human Genome Project" (or something like that). You'll find everything you want there, I think.

      Free download of Dominus's "Higher-Order Perl" here.


      Give a man a fish:  <%-{-{-{-<