REGEX help

Mike98mm has asked for the wisdom of the Perl Monks concerning the following question:

I am new to Perl.

I am trying to write a simple program to find ORFs (open reading frames).

I want to identify each ORF by its start codon (ATG) and a stop codon (TAA|TAG|TGA), and print out the ORF.

The nucleotide sequence:

TTCAGGTGTTTGCAACTGCGTTTTATTGCAAGAAAGAGTGGAGGGGTTTCCATGGGGCCCACCTCACAACCCACTC
TTCACCCCCAAAATCACGCAGGGATCGGACTCAGGAAAGGGAAGCATCTGTGTGTTGCATACGAGCCCTTCCTGTACTTACTTCTTTCACAGCAGGGAAGG
AAGAGGGAAGAGGCAGCTGTGGAGAGGATCAGGTTGCGGGAGGTGGGTATCTCGCTGCTCTGACCTTACGTACAGTCCTCCACAGAAGCATCAAAGTGGACT
GGCACATATCGGCTCCCTTCACAGGCCACAATCATCTGTCTCTCCTTCGGGCTGGTCCGGTATCCAC

2018-05-02 Athanasius added code and paragraph tags


Replies are listed 'Best First'.
Re: REGEX help by tybalt89 (Monsignor) on Apr 21, 2018 at 01:00 UTC
```perl
#!/usr/bin/perl

# http://perlmonks.org/?node_id=1213294

use strict;
use warnings;

$_ = 'TTCAGGTGTTTGCAACTGCGTTTTATTGCAAGAAAGAGTGGAGGGGTTTCCATGGGGCCCACCTCACAACCCACTC TTCACCCCCAAAATCACGCAGGGATCGGACTCAGGAAAGGGAAGCATCTGTGTGTTGCATACGAGCCCTTCCTGTACTTACTTCTTTCACAGCAGGGAAGG AAGAGGGAAGAGGCAGCTGTGGAGAGGATCAGGTTGCGGGAGGTGGGTATCTCGCTGCTCTGACCTTACGTACAGTCCTCCACAGAAGCATCAAAGTGGACT GGCACATATCGGCTCCCTTCACAGGCCACAATCATCTGTCTCTCCTTCGGGCTGGTCCGGTATCCAC';

# Non-greedy match from each ATG to the nearest following stop codon.
# The match is not frame-aware: the stop need not be a whole number of
# codons away from the ATG.
print "$_\n\n" for /ATG.*?(?:TAA|TAG|TGA)/g;
```
Re: REGEX help by dorko (Prior) on Apr 21, 2018 at 01:31 UTC
Hello. Welcome.

I think you want to look at BioPerl.

This code is completely untested. I didn't want to install BioPerl on my system. But it might work if you can get BioPerl installed. (I know you're new to Perl, so installing BioPerl might be a stretch, but this might help: http://bioperl.org/INSTALL.html.)

```perl
#!/bin/perl

use strict;
use warnings;

use Bio::Seq;
use Data::Dumper::Simple;
use feature "say";

# Convert the sequence to lower case. Upper Case might be ok,
# but the docs for Bio::Seq used lower case, so let's go with that.
my $letters = lc("TTCAGGTGTTTGCAACTGCGTTTTATTGCAAGAAAGAGTGGAGGGGTTTCCATGGGGCCCACCTCACAACCCACTC TTCACCCCCAAAATCACGCAGGGATCGGACTCAGGAAAGGGAAGCATCTGTGTGTTGCATACGAGCCCTTCCTGTACTTACTTCTTTCACAGCAGGGAAGG AAGAGGGAAGAGGCAGCTGTGGAGAGGATCAGGTTGCGGGAGGTGGGTATCTCGCTGCTCTGACCTTACGTACAGTCCTCCACAGAAGCATCAAAGTGGACT GGCACATATCGGCTCCCTTCACAGGCCACAATCATCTGTCTCTCCTTCGGGCTGGTCCGGTATCCAC");

#Create a sequence object.
my $seq_object = Bio::Seq->new(-seq => $letters, -alphabet => 'dna' );

#Look for the ORF. I specified the start, but I didn't see how to
#specify the stop. Are the stop codons universal? I'm way out of
#my league here.
$prot_object = $seq_object->translate( -orf => 1, -start => "atg" );

say Dumper $prot_object;
```

Cheers,

Brent

-- Yeah, I'm a Delt.
Re^2: REGEX help by karlgoethebier (Abbot) on Apr 21, 2018 at 16:21 UTC
One must be the volunteer. So I forced a `cpanm` install and only added a `my` you forgot. Unfortunately the script crashed the hard way:

```
------------- EXCEPTION -------------
MSG: Failed validation of sequence '[unidentified sequence]'. Invalid characters were:
STACK Bio::PrimarySeq::validate_seq /Users/karl/perl5/perlbrew/perls/perl-5.24.1threads/lib/site_perl/5.24.1/Bio/PrimarySeq.pm:338
STACK Bio::PrimarySeq::_set_seq_by_ref /Users/karl/perl5/perlbrew/perls/perl-5.24.1threads/lib/site_perl/5.24.1/Bio/PrimarySeq.pm:287
STACK Bio::PrimarySeq::seq /Users/karl/perl5/perlbrew/perls/perl-5.24.1threads/lib/site_perl/5.24.1/Bio/PrimarySeq.pm:272
STACK Bio::PrimarySeq::new /Users/karl/perl5/perlbrew/perls/perl-5.24.1threads/lib/site_perl/5.24.1/Bio/PrimarySeq.pm:229
STACK Bio::Seq::new /Users/karl/perl5/perlbrew/perls/perl-5.24.1threads/lib/site_perl/5.24.1/Bio/Seq.pm:496
STACK toplevel ./bio.pl:16
-------------------------------------
```

Just for curiosity. Anyway - too bad. This stuff is PITA...

Best regards, Karl

«The Crux of the Biscuit is the Apostrophe»

`perl -MCrypt::CBC -E 'say Crypt::CBC->new(-key=>'kgb',-cipher=>"Blowfish")->decrypt_hex($ENV{KARL});'`
Re^3: REGEX help by poj (Abbot) on Apr 21, 2018 at 17:17 UTC
Take the spaces out of `$letters` and you should get

```
$prot_object = bless( {
    'primary_seq' => bless( {
        'length' => 63,
        '_root_verbose' => 0,
        '_nowarnonempty' => undef,
        'seq' => 'MGPTSQPTLHPQNHAGIGLRKGKHLCVAYEPFLYLLLSQQGRKREEAAVERIRLREVGISLL*',
        'alphabet' => 'protein'
    }, 'Bio::PrimarySeq' ),
    '_root_verbose' => 0
}, 'Bio::Seq' );
```

whatever that means!

poj
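The whitespace fix suggested above can be tried without BioPerl installed. This is only a minimal sketch, assuming stray whitespace is the only problem with the sequence string. `clean_sequence` is a hypothetical helper name (it is not part of Bio::Seq or anything posted in this thread), and the input is a shortened stand-in for the real sequence.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): lower-case the sequence,
# strip all whitespace, and refuse anything that is not a, c, g, t or n.
sub clean_sequence {
    my ($raw) = @_;
    ( my $clean = lc $raw ) =~ s/\s+//g;
    die "Unexpected characters in sequence\n" if $clean =~ /[^acgtn]/;
    return $clean;
}

# Shortened stand-in for the sequence in the question.
my $letters = clean_sequence("TTCAGG TGTT\nGCAA");
print "$letters\n";    # prints ttcaggtgttgcaa
```

The cleaned string could then be passed as `-seq` to `Bio::Seq->new`, as in the script above.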
Re^4: REGEX help by dorko (Prior) on Apr 21, 2018 at 19:31 UTC
Re^5: REGEX help by Your Mother (Archbishop) on Apr 21, 2018 at 20:04 UTC
Some notes below your chosen depth have not been shown here
Re^5: REGEX help by karlgoethebier (Abbot) on Apr 22, 2018 at 08:16 UTC
Re^4: REGEX help by karlgoethebier (Abbot) on Apr 21, 2018 at 19:22 UTC
Re: REGEX help by Cristoforo (Curate) on Apr 21, 2018 at 01:06 UTC
Hello Mike98mm,

You can search for your answer in this forum. Some possible things I found were one by myself, Re: Using Recursion to Find DNA Sequences; another, finding open reading frames; and possibly the best solution for the problem (if the bases between the start and stop codons must come in triplets), Re: finding open reading frames. There were many hits on this search here.
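If the bases between start and stop do have to come in triplets, that requirement can be written into the regex itself. The following is a minimal sketch and is not taken from any of the nodes mentioned above. `find_orfs` is a hypothetical helper name, the input is assumed to contain only A, C, G, T and whitespace, and only the forward strand is searched.

```perl
use strict;
use warnings;

# Hypothetical helper: return every ORF that starts with ATG and ends at
# the first stop codon that is a whole number of codons downstream.
sub find_orfs {
    my ($dna) = @_;
    $dna =~ s/\s+//g;    # drop spaces and newlines
    $dna = uc $dna;

    my @orfs;
    # The lookahead keeps each match zero-width, so every ATG is tried,
    # including ones that lie inside an ORF already reported.
    while ( $dna =~ /(?=(ATG(?:[ACGT]{3})*?(?:TAA|TAG|TGA)))/g ) {
        push @orfs, $1;
    }
    return @orfs;
}

# The out-of-frame TAG at offset 4 is skipped; the in-frame TAA ends the ORF.
print "$_\n" for find_orfs('ATGATAGCCTAA');    # prints ATGATAGCCTAA
```

On the same input, a pattern such as `/ATG.*?(?:TAA|TAG|TGA)/` stops at the out-of-frame TAG and returns `ATGATAG`, which is not a whole number of codons.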
Re: REGEX help by LeBreton (Initiate) on Apr 21, 2018 at 15:19 UTC
Hi Mike,

You can look at the Higher-Order Perl book or the article "How Perl Saved the DNA Project" (or something like that). I think you'll find everything you want there.
Re^2: REGEX help by AnomalousMonk (Archbishop) on Apr 21, 2018 at 15:37 UTC
Free download of Dominus's "Higher-Order Perl" here.

Give a man a fish: `<%-{-{-{-<`
