PerlMonks  

Re: Regular expressions

by kennethk (Abbot)
on Oct 26, 2015 at 19:42 UTC (id://1146029)


in reply to Regular expressions

First, I cannot reproduce the problem you describe. You say "that the $1 is unspecified", but when I run your posted code, I get:
GTTTCTCCCATCTCTCCATCGGCATAAAAATACAGAATGATC
which, while it is not correct, is not unspecified. Am I misunderstanding your statement, or are you seeing something different when you run your code? Make sure that your examples actually reproduce the issues you are encountering.
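To make the greedy versus lazy difference concrete, here is a minimal sketch. It assumes, since the original question's code is not reproduced on this page, that the input is the sequence Athanasius uses further down this thread and that the OP's pattern put a greedy .* between ATG and the stop codons. The sub names are hypothetical.

```perl
use strict;
use warnings;

# ASSUMPTION: input taken from Athanasius's reply in this thread;
# the OP's actual sequence and pattern are not shown on this page.
my $seq = 'AATGGTTTCTCCCATCTCTCCATCGGCATAAAAATACAGAATGATCTAACGAA';

# Greedy .* : consumes the rest of the string, then backtracks to the
# LAST stop codon, so $1 spans everything between the first ATG and it.
sub greedy_middle {
    my ($s) = @_;
    return $s =~ /ATG(.*)(?:TAG|TAA|TGA)/ ? $1 : undef;
}

# Lazy .*? : stops at the FIRST stop codon after each ATG; /g in a loop
# collects every such match, left to right, without overlap.
sub lazy_middles {
    my ($s) = @_;
    my @found;
    push @found, $1 while $s =~ /ATG(.*?)(?:TAG|TAA|TGA)/g;
    return @found;
}

print greedy_middle($seq), "\n";
print join(' ', lazy_middles($seq)), "\n";
```

Under those assumptions the greedy sub returns the GTTTC...GATC string quoted above, and the lazy sub returns the two pieces reported below for stevieb's solution.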

If I run stevieb's solution, I get

GTTTCTCCCATCTCTCCATCGGCA ATC
which would seem to meet your spec. The bigger question is what should happen in nested cases. What is your expected output for
my $seq = 'ATGATGTGATGA';
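One way to see what "nested" means in practice: ordinary /g matching resumes after each hit, so an ATG inside an earlier hit is never tried as a new start, while wrapping the capture in a zero-width lookahead lets every ATG start its own hit. A sketch, with illustrative sub names:

```perl
use strict;
use warnings;

my $seq = 'ATGATGTGATGA';

# Ordinary /g: the search resumes AFTER each match, so a start codon
# that lies inside an earlier hit is never used as a new start.
sub non_overlapping {
    my ($s) = @_;
    my @hits;
    push @hits, $1 while $s =~ /(ATG.*?(?:TAG|TAA|TGA))/g;
    return @hits;
}

# Capture inside a zero-width lookahead: each match has length 0, so /g
# advances one character at a time and every ATG is tried as a start.
sub overlapping {
    my ($s) = @_;
    my @hits;
    push @hits, $1 while $s =~ /(?=(ATG.*?(?:TAG|TAA|TGA)))/g;
    return @hits;
}

print 'non-overlapping: ', join(' ', non_overlapping($seq)), "\n";
print 'overlapping:     ', join(' ', overlapping($seq)), "\n";
```

For this input the first sub returns only ATGATGTGA, while the second also returns the nested ATGTGA. Which of the two the OP wants is exactly the open question.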
Also, I note a reference to codons, which implies that your tests should be considering a stride of 3 rather than an arbitrary position. Does this matter for your case?
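The stride matters as soon as a stop codon appears out of frame. The sequence below is made up for illustration (it is not from the OP): an out-of-frame TGA comes before the in-frame TAG, and the two patterns disagree. The stride-of-3 pattern follows the codon-aware regex posted later in this thread; the sub names are hypothetical.

```perl
use strict;
use warnings;

# HYPOTHETICAL sequence: an out-of-frame TGA (characters 5-7) comes
# before the in-frame stop codon TAG (characters 10-12).
my $seq = 'ATGCTGAAATAG';

# Any position: .*? may stop at a stop codon in any reading frame.
sub any_position {
    my ($s) = @_;
    my @hits;
    push @hits, $1 while $s =~ /(ATG.*?(?:TAG|TAA|TGA))/g;
    return @hits;
}

# Stride of 3: only whole codons may sit between ATG and the stop codon.
sub codon_stride {
    my ($s) = @_;
    my @hits;
    push @hits, $1 while $s =~ /(ATG(?:[ACGT]{3})*?(?:TAG|TAA|TGA))/g;
    return @hits;
}

print 'any position: ', join(' ', any_position($seq)), "\n";
print 'codon stride: ', join(' ', codon_stride($seq)), "\n";
```

The first sub stops at the out-of-frame TGA and returns ATGCTGA (seven bases, not a whole number of codons); the second skips it and returns ATGCTGAAATAG.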

#11929 First ask yourself `How would I do this without a computer?' Then have the computer do it the same way.

Re^2: Regular expressions
by Laurent_R (Canon) on Oct 26, 2015 at 20:06 UTC
    GTTTCTCCCATCTCTCCATCGGCATAAAAATACAGAATGATC
    which, while it is not correct, is not unspecified.
    Well, I understand your point, but is it really incorrect? After all, this sequence is preceded by ATG and followed by TAA, so in a certain sense it is correct. But it is clearly not the shortest sequence in the input string that matches these criteria.

    That is to say, while

        DB<2> print "$1\n" while ($seq =~ m/(ATG(?:.*?)(TAG|TAA|TGA))/g);
        ATGGTTTCTCCCATCTCTCCATCGGCATAA
        ATGATCTAA

    probably gives a better answer, it is not completely clear whether

        (ATG)GTTTCTCCCATCTCTCCATCGGCATAAAAATACAGAATGATC(TAA)

    is a valid sequence or not.
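One possible answer, assuming "valid" means a single open reading frame (the thread does not settle this): the candidate must consist of whole codons, start with ATG, end with a stop codon, and contain no in-frame stop codon before the last one. A sketch of that check, with a hypothetical sub name:

```perl
use strict;
use warnings;

# ASSUMPTION: "valid" = a single open reading frame, i.e. ATG start,
# stop codon at the end, whole codons only, and no stop codon in frame
# before the final one. The thread itself does not fix a definition.
my %STOP = map { $_ => 1 } qw(TAG TAA TGA);

sub is_single_orf {
    my ($cand) = @_;
    return 0 if length($cand) % 3;          # must be whole codons
    my @codons = $cand =~ /(...)/g;         # split into triplets
    return 0 if @codons < 2;                # need at least ATG + stop
    return 0 unless $codons[0] eq 'ATG';
    return 0 unless $STOP{ $codons[-1] };
    # A stop codon strictly inside the frame would end translation early.
    return 0 if grep { $STOP{$_} } @codons[1 .. $#codons - 1];
    return 1;
}

for my $cand (
    'ATGGTTTCTCCCATCTCTCCATCGGCATAAAAATACAGAATGATCTAA',   # greedy match
    'ATGGTTTCTCCCATCTCTCCATCGGCATAA',                     # first lazy match
    'ATGATCTAA',                                          # second lazy match
) {
    printf "%-50s %s\n", $cand, is_single_orf($cand) ? 'single ORF' : 'not a single ORF';
}
```

By that criterion the long greedy match fails, because its tenth codon is an in-frame TAA, while both lazy matches pass.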
Re^2: Regular expressions
by Athanasius (Archbishop) on Oct 27, 2015 at 06:27 UTC
    Also, I note a reference to codons, which implies that your tests should be considering a stride of 3 rather than an arbitrary position.

    This is an excellent point. For the benefit of the OP, here is one way to ensure that only sequences made of whole codons are captured:

        #! perl
        use strict;
        use warnings;

        my $seq = 'AATGGTTTCTCCCATCTCTCCATCGGCATAAAAATACAGAATGATCTAACGAA';

        # Adapted from the regex by stevieb
        my $re = qr{
            (                         # capture each sequence:
                ATG                   #  - which begins with the codon ATG
                (?: [ACGT]{3} )*?     #  - followed by the smallest number of codons
                (?: TAG | TAA | TGA ) #  - and ending with the codon TAG, TAA, or TGA
            )
        }x;

        print "$1\n" while $seq =~ /$re/g;

    (This assumes that only minimal sequences are wanted, an assumption which should be clarified, as Laurent_R has pointed out above.)

    Hope that helps,

    Athanasius <°(((><contra mundum Iustus alius egestas vitae, eros Piratica,

      I would have organized the code slightly differently, factoring each of the pattern elements into a separate qr// regex object and combining them (inside a capture group) in the final m// match:

      c:\@Work\Perl\monks>perl -wMstrict -le
      "my $codon = qr{ [ACGT]{3} }xms;
       my $start = qr{ ATG }xms;
       my $end   = qr{ TAG | TAA | TGA }xms;
       ;;
       my $seq = 'AATGGTTTCTCCCATCTCTCCATCGGCATAAAAATACAGAATGATCTAACGAA';
       ;;
       print qq{'$1'} while $seq =~ m{ ($start $codon*? $end) }xmsg;
      "
      'ATGGTTTCTCCCATCTCTCCATCGGCATAA'
      'ATGATCTAA'
      Separate qr// definitions ease maintenance and, if the variable names are chosen wisely, are self-commenting. Where possible, I use capture groups only in the final m// match, because trying to count nested capture groups can be confusing.
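To illustrate the capture-counting point, here is a sketch with hypothetical variants of $start and $end that carry their own capture groups (not something the reply recommends). Adding one group to $start silently moves the stop codon from $2 to $3.

```perl
use strict;
use warnings;

my $codon = qr{ [ACGT]{3} }xms;
my $start = qr{ ATG }xms;
my $end   = qr{ TAG | TAA | TGA }xms;

# HYPOTHETICAL variants that carry their own capture groups.
my $start_cap = qr{ (ATG) }xms;
my $end_cap   = qr{ (TAG | TAA | TGA) }xms;

my $seq = 'ATGATCTAA';

# A list-context match without /g returns ($1, $2, ...).
my @plain   = $seq =~ m{ ($start     $codon*? $end    ) }xms;  # only the outer group
my @one_cap = $seq =~ m{ ($start     $codon*? $end_cap) }xms;  # stop codon is $2
my @two_cap = $seq =~ m{ ($start_cap $codon*? $end_cap) }xms;  # stop codon is now $3

print "plain:   @plain\n";
print "one_cap: @one_cap\n";
print "two_cap: @two_cap\n";
```

Keeping the components capture-free, as the reply suggests, means the final m// is the only place where group numbers are decided.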


      Give a man a fish:  <%-{-{-{-<
