Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:
Hello fellow monks!
I have a project in which I need to read some DNA sequences and check whether each one starts with 'ATG' (the start codon) and ends with 'TAA', 'TAG' or 'TGA' (the stop codons). If it does not, I am supposed to find the desired substring within the original sequence.
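A minimal sketch of the check described above, written as a hypothetical helper `is_complete_orf` (the name is illustrative, not from the post). It assumes each sequence is already a single uppercase string of A/C/G/T with no whitespace. The requirement that the length be a multiple of three is an extra assumption about what a "complete" sequence means, not something the question asks for.

```perl
use strict;
use warnings;

# Hypothetical helper: true (1) if $seq starts with the start codon,
# ends with a stop codon, and (assumption) holds a whole number of codons.
sub is_complete_orf {
    my ($seq) = @_;
    return 0 unless $seq =~ /^ATG/;              # start codon at the front
    return 0 unless $seq =~ /(?:TAA|TAG|TGA)$/;  # stop codon at the end
    return length($seq) % 3 == 0 ? 1 : 0;        # assumed: complete codons only
}

for my $s (qw(ATGAAATAA ATGAAATA GTGAAATAA)) {
    printf "%-10s %s\n", $s, is_complete_orf($s) ? 'ok' : 'needs trimming';
}
```

Sequences that fail this check are the ones that need the substring search.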
Imagine the following string:
GTGGCAAATGCAGAACGTTTTCTGCGTGTTGCCGATATTCTGGAAAGCAATGCCAGGCAAGGGCAGGTAG
CGACCGTACTTTCCGCCCCCGCGAAAATTACCAACCATCTGGTGGCGATGATTGAAAAAACTATCGGCG
GTCAGGATGCTTTGCCGAATATCAGCGATGCCGAACGTATTTTTTCTGACCTGCTCGCAGGACTTGCCA
GCGCGCAGCCGGGATTCCCGCTTGCACGGTTGAAAATGGTTGTCGAACAAGAATTCGCTCAGATCAAAC
ATGTTCTGCATGGTATCAGCCTGCTGGGTCAGTGCCCGGATAGCATCAACGCCGCGCTGATTTGCCGTG
GCGAAAAAATGTCGATCGCGATTATGGCGGGACTCCTGGAGGCGCGTGGACATCGCGTCACGGTGATTG
ATCCGGTAGAAAAATTGCTGGCGGTGGGCCATTACCTTGAATCTACCGTCGATATCGCGGAATCGACTC
GCCGTATCGCCGCCAGCCAGATCCCAGCCGATCACATGATCCTGATGGCGGGCTTTACCGCCGGTAATG
AAAAGGGTGAACTGGTGGTGCTGGGCCGTAATGGTTCCGACTATTCCGCCGCCGTGCTGGCCGCCTGTT
TACGCGCTGACTGCTGTGAAATCTGGACTGACGTCGATGGCGTGTATACCTGTGACCCGCGTCAGGTGC
CGGACGCCAGGCTGCTGAAATCGATGTCCTACCAGGAAGCGATGGAACTCTCTTACTTCGGCGCCAAAG
TCCTTCACCCTCGCACCATTACGCCCATCGCCCAGTTCCAGATCCCCTGTCTGATTAAAAATACCGGTA
ATCCGCAGGCGCCAGGAACGCTGATCGGCGCGTCCAGCGACGATGATAACCTACCAGTTAAAGGGATCT
CTAACCTTAACAACATGGCGATGTTTAGCGTCTCCGGCCCGGGAATGAAAGGGATGATTGGGATGGCGG
CGCGTGTTTTCGCCGCCATGTCTCGCGCCGGGATCTCGGTGGTGCTCATTACCCAGTCCTCCTCTGAGT
ACAGCATCAGTTTCTGTGTGCCGCAGAGTGACTGCGCGCGTGCCCGCCGTGCGATGCAGGATGAGTTCT
ATCTGGAGCTGAAAGAGGGGCTGCTGGAGCCGCTGGCGGTTACGGAGCGGTTGGCGATTATCTCTGTTG
TCGGCGACGGTATGCGCACGCTACGCGGCATTTCAGCGAAATTCTTCGCCGCGCTGGCGCGGGCCAATA
TCAATATCGTGGCGATCGCTCAGGGATCTTCTGAGCGTTCCATTTCTGTGGTGGTGAATAACGACGATG
CCACCACCGGCGTGCGGGTAACGCACCAGATGCTGTTCAATACCGATCAGGTGATTGAAGTGTTTGTCA
TTGGCGTCGGCGGCGTCGGCGGCGCGCTACTGGAACAGCTTAAACGTCAGCAAACCTGGTTGAAGAACA
AGCACATCGATCTACGCGTGTGCGGCGTGGCGAACTCAAAGGCGTTGCTAACCAATGTGCATGGCCTGA
ATCTGGACAACTGGCAGGCGGAACTGGCGCAAGCGAACGCGCCGTTCAATCTGGGACGCTTAATTCGCC
TGGTGAAAGAATATCATCTACTCAATCCGGTGATTGTTGATTGCACCTCCAGTCAGGCGGTGGCCGACC
AGTATGCTGACTTCCTGCGTGAAGGATTCCATGTGGTGACGCCAAACAAGAAAGCGAACACCTCGTCGA
TGGACTACTACCATCAGCTACGTTTCGCCGCCGCGCAATCACGGCGCAAATTCTTGTATGACACCAACG
TCGGCGCCGGTTTGCCGGTAATCGAAAACCTGCAAAACCTGCTGAATGCGGGTGATGAACTGCAAAAAT
TTTCCGGCATTCTTTCCGGGTCGCTCTCTTTTATTTTCGGTAAACTGGAAGAGGGGATGAGTCTCTCAC
AGGCGACCGCCCTGGCGCGCGAGATGGGCTATACCGAACCCGATCCGCGCGACGATCTTTCCGGTATGG
ATGTGGCGCGGAAACTGTTGATCCTCGCCCGCGAGACGGGCCGCGAGCTGGAGCTTTCCGATATTGTGA
TTGAACCGGTGTTGCCGGACGAGTTTGACGCCTCCGGCGATGTGACCACCTTTATGGCGCATCTGCCGC
AGCTTGACGACGCGTTTGCCGCCCGTGTGGCGAAAGCTCGTGATGAAGGTAAGGTATTGCGCTATGTGG
GCAATATCGAAGAGGATGGCGTGTGCCGCGTGAAGATTGCCGAAGTTGATGGTAACGATCCGCTCTTCA
AAGTGAAAAACGGTTAAGAAAACGCGCTGGCGTTCTACAGCCACTATTATCAGCCCTTGCCGTTGGTGC
TGCGCGGCTACGGCGCAGGCAATGATGTGACGGCGGCGGGCGTGTTTGCCGATCTGTTACGGACCCTCT
CATGGAAGTTAGGAGTT
where 'ATG' and 'TAA' occur somewhere inside it, but not at the start and end positions where they should be.
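To see where the start and stop codons actually sit in a sequence like this, one option is to record every match offset. A minimal sketch, assuming a plain A/C/G/T string; `codon_positions` is a hypothetical helper, and the short `$seq` is made-up test data, not the sequence above.

```perl
use strict;
use warnings;

# Hypothetical helper: every 0-based offset where $codon_re matches,
# including overlapping hits.
sub codon_positions {
    my ($seq, $codon_re) = @_;
    my @pos;
    # A zero-width lookahead lets matches overlap; $-[0] is the match offset.
    while ($seq =~ /(?=$codon_re)/g) {
        push @pos, $-[0];
    }
    return @pos;
}

my $seq    = 'GTGATGAAATAGCATGTAA';    # made-up example
my @starts = codon_positions($seq, qr/ATG/);
my @stops  = codon_positions($seq, qr/TAA|TAG|TGA/);
print "ATG at:  @starts\n";    # prints "ATG at:  3 13"
print "stop at: @stops\n";     # prints "stop at: 1 4 9 16"
```

In this toy string the TAG at offset 9 lies a whole number of codons (6 bases) after the ATG at offset 3, whereas the TAA at offset 16 does not (13 bases). So a stop codon only ends a reading frame if it is in frame with the chosen ATG.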
To find the correct start position, I suppose I could do this:
my $seq = 'GTGGCAAATGCAGAACGTTTTCTGCGTGTTGCCGATATTCTGGAAAGCAATGCCAGGCAAGGGCAGGTAGCGACCGTACTTTCCGCCCCCGCGAAAATTACCAACCATCTGGTGGCGATGATTGAAAAAACTATCGGCGGTCAGGATGCTTTGCCGAATATCAGCGATGCCGAACGTATTTTTTCTGACCTGCTCGCAGGACTTGCCAGCGCGCAGCCGGGATTCCCGCTTGCACGGTTGAAAATGGTTGTCGAACAAGAATTCGCTCAGATCAAACATGTTCTGCATGGTATCAGCCTGCTGGGTCAGTGCCCGGATAGCATCAACGCCGCGCTGATTTGCCGTGGCGAAAAAATGTCGATCGCGATTATGGCGGGACTCCTGGAGGCGCGTGGACATCGCGTCACGGTGATTGATCCGGTAGAAAAATTGCTGGCGGTGGGCCATTACCTTGAATCTACCGTCGATATCGCGGAATCGACTCGCCGTATCGCCGCCAGCCAGATCCCAGCCGATCACATGATCCTGATGGCGGGCTTTACCGCCGGTAATGAAAAGGGTGAACTGGTGGTGCTGGGCCGTAATGGTTCCGACTATTCCGCCGCCGTGCTGGCCGCCTGTTTACGCGCTGACTGCTGTGAAATCTGGACTGACGTCGATGGCGTGTATACCTGTGACCCGCGTCAGGTGCCGGACGCCAGGCTGCTGAAATCGATGTCCTACCAGGAAGCGATGGAACTCTCTTACTTCGGCGCCAAAGTCCTTCACCCTCGCACCATTACGCCCATCGCCCAGTTCCAGATCCCCTGTCTGATTAAAAATACCGGTAATCCGCAGGCGCCAGGAACGCTGATCGGCGCGTCCAGCGACGATGATAACCTACCAGTTAAAGGGATCTCTAACCTTAACAACATGGCGATGTTTAGCGTCTCCGGCCCGGGAATGAAAGGGATGATTGGGATGGCGGCGCGTGTTTTCGCCGCCATGTCTCGCGCCGGGATCTCGGTGGTGCTCATTACCCAGTCCTCCTCTGAGTACAGCATCAGTTTCTGTGTGCCGCAGAGTGACTGCGCGCGTGCCCGCCGTGCGATGCAGGATGAGTTCTATCTGGAGCTGAAAGAGGGGCTGCTGGAGCCGCTGGCGGTTACGGAGCGGTTGGCGATTATCTCTGTTGTCGGCGACGGTATGCGCACGCTACGCGGCATTTCAGCGAAATTCTTCGCCGCGCTGGCGCGGGCCAATATCAATATCGTGGCGATCGCTCAGGGATCTTCTGAGCGTTCCATTTCTGTGGTGGTGAATAACGACGATGCCACCACCGGCGTGCGGGTAACGCACCAGATGCTGTTCAATACCGATCAGGTGATTGAAGTGTTTGTCATTGGCGTCGGCGGCGTCGGCGGCGCGCTACTGGAACAGCTTAAACGTCAGCAAACCTGGTTGAAGAACAAGCACATCGATCTACGCGTGTGCGGCGTGGCGAACTCAAAGGCGTTGCTAACCAATGTGCATGGCCTGAATCTGGACAACTGGCAGGCGGAACTGGCGCAAGCGAACGCGCCGTTCAATCTGGGACGCTTAATTCGCCTGGTGAAAGAATATCATCTACTCAATCCGGTGATTGTTGATTGCACCTCCAGTCAGGCGGTGGCCGACCAGTATGCTGACTTCCTGCGTGAAGGATTCCATGTGGTGACGCCAAACAAGAAAGCGAACACCTCGTCGATGGACTACTACCATCAGCTACGTTTCGCCGCCGCGCAATCACGGCGCAAATTCTTGTATGACACCAACGTCGGCGCCGGTTTGCCGGTAATCGAAAACCTGCAAAACCTGCTGAATGCGGGTGATGAACTGCAAAAATTTTCCGGCATTCTTTCCGGGTCGCTCTCTTTTATTTTCGGTAAACTGGAAGAGGGGATGAGTCTCTCACAGGCGACCGCCCTGGCGCGCGAGATGGGCTATACCGAACCCGATCCGCGCGACGATCTTTCCGGTATGGATGTGGCGCGGAAACTGTTGATCCTCGCCCGCGAGACGGGCCGCGAGCTGGAGCTTTCCGATATTGTGATTGAACCGGTGTTGCCGGACGAGTTTGACGCCTCCGGCGATGTGACCACCTTTATGGCGCATCTGCCGCAGCTTGACGACGCGTTTGCCGCCCGTGTGGCGAAAGCTCGTGATGAAGGTAAGGTATTGCGCTATGTGGGCAATATCGAAGAGGATGGCGTGTGCCGCGTGAAGATTGCCGAAGTTGATGGTAACGATCCGCTCTTCAAAGTGAAAAACGGTTAAGAAAACGCGCTGGCGTTCTACAGCCACTATTATCAGCCCTTGCCGTTGGTGCTGCGCGGCTACGGCGCAGGCAATGATGTGACGGCGGCGGGCGTGTTTGCCGATCTGTTACGGACCCTCTCATGGAAGTTAGGAGTT';
# Note: the leading greedy .* backtracks from the end of $seq,
# so $1 begins at the LAST 'ATG' in the sequence.
my $substring_with_correct_start;
if ($seq =~ /.*(ATG.*)/) { $substring_with_correct_start = $1; }
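Which 'ATG' the pattern `/.*(ATG.*)/` picks is worth checking: the leading `.*` first consumes the whole string and then gives characters back, so the capture starts at the last 'ATG'. A small sketch comparing it with `/(ATG.*)/` and with Perl's built-in `index`, using a made-up short sequence rather than the real one:

```perl
use strict;
use warnings;

my $seq = 'CCATGAAAATGCCCTAA';    # made-up example with 'ATG' at offsets 2 and 8

# Leading greedy .*: the capture starts at the LAST 'ATG'.
my ($from_last) = $seq =~ /.*(ATG.*)/;

# No leading .*: the engine scans left to right, so the capture starts
# at the FIRST 'ATG'.
my ($from_first) = $seq =~ /(ATG.*)/;

# index() returns the offset of the first 'ATG' (or -1 if there is none).
my $first_pos = index($seq, 'ATG');

print "last ATG:  $from_last\n";                       # ATGCCCTAA
print "first ATG: $from_first (offset $first_pos)\n";  # ATGAAAATGCCCTAA (offset 2)
```

Which of the two is "correct" depends on the project's definition; the question does not say.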
What could I do to find the correct end of the sequence, i.e., to make it end in 'TAA', 'TAG' or 'TGA'? This part is troubling me, since there could be more than one of these codons that the substring might end on.
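One common approach, sketched here under assumptions the post does not state: count codons from the chosen 'ATG' and stop at the first stop codon that is in the same reading frame. The non-greedy `(?:[ACGT]{3})*?` tries as few whole codons as possible, so the match ends at the first in-frame stop. `first_orf` and `all_orfs` are hypothetical helper names, the test string is made up, and preferring the earliest 'ATG' (rather than, say, the longest candidate) is a choice, not a requirement from the question.

```perl
use strict;
use warnings;

# Hypothetical helper: from the earliest ATG that has an in-frame stop,
# up to and including the first in-frame stop codon; undef if none.
sub first_orf {
    my ($seq) = @_;
    return $seq =~ /(ATG(?:[ACGT]{3})*?(?:TAA|TAG|TGA))/ ? $1 : undef;
}

# Hypothetical helper: one candidate per ATG that reaches an in-frame stop.
# The lookahead is zero-width, so overlapping candidates are all collected.
sub all_orfs {
    my ($seq) = @_;
    my @orfs;
    while ($seq =~ /(?=(ATG(?:[ACGT]{3})*?(?:TAA|TAG|TGA)))/g) {
        push @orfs, $1;
    }
    return @orfs;
}

my $seq = 'GTGATGAAATAGCATGTAA';      # made-up example
print "first: ", first_orf($seq), "\n";   # first: ATGAAATAG
print "all:   $_\n" for all_orfs($seq);   # ATGAAATAG, then ATGTAA
```

Restricting codons to `[ACGT]` means a frame containing any other letter (for example an ambiguous `N`) will not match; `.{3}` would accept such characters instead.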