PerlMonks  

Re: Print A Sequence with Start codon and different Stop Codon

by choroba (Cardinal)
on Oct 27, 2015 at 23:18 UTC ( [id://1146194] )


in reply to Print A Sequence with Start codon and different Stop Codon

It's not clear what output you expect. To search for overlapping sequences, you can change the second group from non-grouping to a look-behind:
$sequence =~ /(ATG.*?(?<=TAA|TAG|TGA))/g

to get

ATGGTTTCTCCCATCTCTCCATCGGCATAA
ATGA

It still extracts the shortest possible sequence for each starting point (so we lost the second output).
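As a rough illustration of the difference, the sketch below runs both forms of the pattern against the sample sequence posted in a reply further down the thread. The non-grouping variant is an assumption about what the original pattern looked like, and the array names are made up for this example.

```perl
use strict;
use warnings;
use feature 'say';

# Sample sequence, copied from a reply later in this thread.
my $sequence = 'AATGGTTTCTCCCATCTCTCCATCGGCATAAAAATACAGAATGATCTAACGAA';

# Assumed original form: the stop codon is consumed by a non-capturing
# group, so it has to start after the end of the ATG.
my @consuming = $sequence =~ /(ATG.*?(?:TAA|TAG|TGA))/g;

# Look-behind form: the match may end as soon as the last three characters
# read so far are a stop codon, even if that codon shares letters with ATG.
my @lookbehind = $sequence =~ /(ATG.*?(?<=TAA|TAG|TGA))/g;

say "consuming:   @consuming";
say "look-behind: @lookbehind";
```

With this input only the second match differs: the consuming form yields ATGATCTAA, while the look-behind form yields ATGA, because the TGA overlaps the start codon.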

Update: It's possible to get all the sequences without relying on experimental regex features or on the return value of print, as was done here.

use feature 'say';    # needed for say; $sequence holds the DNA string

# Collect the offset of every start codon.
my @from;
my $pos = -1;
push @from, $pos while -1 != ($pos = index $sequence, 'ATG', $pos + 1);

# Collect the offset just past every stop codon.
my @to;
for my $end (qw( TAA TAG TGA )) {
    $pos = -1;
    push @to, $pos + 3 while -1 != ($pos = index $sequence, $end, $pos + 1);
}

# Print every start/stop pair in which the stop ends after the start.
for my $f (@from) {
    for my $t (@to) {
        say substr $sequence, $f, $t - $f if $t > $f;
    }
}
__END__
Output:
ATGGTTTCTCCCATCTCTCCATCGGCATAA
ATGGTTTCTCCCATCTCTCCATCGGCATAAAAATACAGAATGATCTAA
ATGGTTTCTCCCATCTCTCCATCGGCATAAAAATACAGAATGA
ATGATCTAA
ATGA
لսႽ† ᥲᥒ⚪⟊Ⴙᘓᖇ Ꮅᘓᖇ⎱ Ⴙᥲ𝇋ƙᘓᖇ

Replies are listed 'Best First'.
Re^2: Print A Sequence with Start codon and different Stop Codon
by Anonymous Monk on Oct 27, 2015 at 23:52 UTC

    Yes, but it was so much fun :)

    #!/usr/bin/perl -l

    # http://perlmonks.org/?node_id=1146191

    use strict;
    use warnings;

    my $sequence = 'AATGGTTTCTCCCATCTCTCCATCGGCATAAAAATACAGAATGATCTAACGAA';

    while( $sequence =~ /ATG/g )
      {
      my $rest = $';
      print 'ATG' . $` . $1 while $rest =~ /(TAG|TAA|TGA)/g;
      }


Re^2: Print A Sequence with Start codon and different Stop Codon
by PerlKc (Novice) on Oct 28, 2015 at 01:46 UTC

    I tried that, but my output should be a set of sequences with the start codon ATG and one of the stop codons TAG, TAA, or TGA. For example:

    ATG...............TAA
    ATG...........TAG
    ATG.........................TGA
    ATG.................................TAA

    The dots represent the sequence between the start and stop codons. I am looking for regex features to get this output. Thanks
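One possible regex-driven way to produce that kind of output, given only as a sketch: an outer match walks over every ATG, and an inner match walks over every stop codon in the text from that ATG onward. The helper name `start_stop_pairs` is hypothetical (it does not come from the thread), and the sketch assumes that reading frame does not matter, as in the examples above.

```perl
use strict;
use warnings;
use feature 'say';

# Hypothetical helper (not from the thread): returns every substring that
# begins with ATG and ends with TAA, TAG, or TGA, ignoring reading frame.
sub start_stop_pairs {
    my ($sequence) = @_;
    my @found;
    while ($sequence =~ /ATG/g) {
        # $-[0] is the offset where this ATG matched; keep the text from there on.
        my $tail = substr($sequence, $-[0]);
        # Every stop codon found in the tail closes one candidate sequence.
        while ($tail =~ /TAA|TAG|TGA/g) {
            push @found, substr($tail, 0, pos($tail));
        }
    }
    return @found;
}

# Sample sequence, copied from the reply above.
say for start_stop_pairs('AATGGTTTCTCCCATCTCTCCATCGGCATAAAAATACAGAATGATCTAACGAA');
```

Because the inner search starts at the ATG itself rather than after it, a stop codon that overlaps the start codon (as in ATGA) is also reported. For the sample sequence this gives five sequences, grouped by start codon.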
