PerlMonks  

Why doesn't quantifier work with character classes?

by rnaeye (Pilgrim)
on Aug 13, 2012 at 22:10 UTC (#987226=perlquestion)
rnaeye has asked for the wisdom of the Perl Monks concerning the following question:

Hi Monks,
I am trying to select lines that contain a DNA string of a certain length. The following code prints any DNA string that is 8 nucleotides or longer. However, I want to print only DNA strings that are exactly 8 nucleotides long, such as "ATGATGAC". I thought {8} will match exactly 8 characters, but looks like I am wrong! I also tried ATGC{8,8}; that did not work either.
In addition, in a separate program, I want to select DNA strings that are between 8 and 21 nucleotides long. Can you please give me any suggestions?
Thank you.
PS. I was able to solve this problem with the "length" function and no regex, but I would like to learn the regex solution.

#!/usr/bin/perl
use warnings;
use strict;

while (<DATA>){
    my $string = $_;
    if ($string =~ /[ATGC]{8}/) {
        print "$string";
    }
}
__DATA__
@3009W:27:32
GCTCT
+
%.8:9
@3009W:27:40
TTGGG
+
0(*2+
@3009W:31:26
AGCCT
+
5<=46
@3009W:31:35
TCAGAAAACTG
+
0.5*.--%-0-
@3009W:32:34
GGGCCTAACCTGGGAGCCCCT
+
A@.:158+,--*-%-**--%-
@3009W:34:32
CCATCATCTGGGG
+
:-:>>;;55755&
@3009W:36:21
GACTT
+
(8.7(
@3009W:40:24
ATGATCC
+
44.0,.%
@3009W:42:22
GCTTCCAGGGTCAGTTTGGGAAAC
+
:@>4;4888)1//**-%+5+25,.
@3009W:47:23
GAGCATCGA
+
%*1.0...-
@3009W:49:23
GAGTTCCATCGAAATGTACAAGCTTTACGTTTAAAAC
+
/3....0304036-22.,--(*.09*00,11),00(.
@3009W:14:90
AGCAA
+
82528
@3009W:17:84
GAAACACAC
+
05?4=:<:0
@3009W:17:95
TTTTTCTTT
+
;<<<-07<1
@3009W:19:89
CCTCTACC
+
?:>>:;83
@3009W:19:90
AAGAA
+
:4<;2
@3009W:20:74
GGTTCC
+
2&-.2.
@3009W:22:94
CATTTGGAA
+
AAAB9>8>:
@3009W:23:79
CTTACAA
+
@@9@@@@
@3009W:23:93
TCTTTTTC
+
@@@AAA/A
@3009W:24:80
GTGAGC
+
<AAA@@
@3009W:25:79
AATAT
+
?8=.0
@3009W:26:89
AGGCA
+
BB>BC
@3009W:26:99
ATCCATAT
+
/88(3979
@3009W:27:83
AGGCA
+
AA>@@

Re: Why doesn't quantifier work with character classes?
by thezip (Vicar) on Aug 13, 2012 at 22:18 UTC

    Try:

    ... if ($string =~ /^[ATGC]{8}$/) { ...

    Note the caret and dollar-sign. This should match sequences that are exactly eight characters long.
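
A minimal sketch of the anchored test, which also covers the 8 to 21 case from the question by using a {8,21} range quantifier. The sub names exactly_8 and from_8_to_21 are made up for illustration and are not from the thread. The sample strings come from the posted data. Because $ also matches just before a trailing newline, lines read straight from a filehandle work without chomp.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: true if the line is exactly 8 of A, T, G, C.
sub exactly_8 {
    my ($line) = @_;
    return $line =~ /^[ATGC]{8}$/ ? 1 : 0;
}

# Hypothetical helper: true if the line is 8 to 21 of A, T, G, C.
sub from_8_to_21 {
    my ($line) = @_;
    return $line =~ /^[ATGC]{8,21}$/ ? 1 : 0;
}

# Sequences of length 7, 8, 9, 21 and 24 taken from the sample data,
# each with a trailing newline as it would arrive from <DATA>.
for my $line ("ATGATCC\n", "CCTCTACC\n", "GAGCATCGA\n",
              "GGGCCTAACCTGGGAGCCCCT\n", "GCTTCCAGGGTCAGTTTGGGAAAC\n") {
    my $seq = $line;
    chomp $seq;    # only for tidy printing; the regexes do not need it
    printf "%-25s exactly 8: %-3s  8 to 21: %s\n", $seq,
        (exactly_8($line)    ? 'yes' : 'no'),
        (from_8_to_21($line) ? 'yes' : 'no');
}
```

Only CCTCTACC passes the first test here, while the 8, 9 and 21 nucleotide strings pass the second.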


    What can be asserted without proof can be dismissed without proof. - Christopher Hitchens, 1949-2011
Re: Why doesn't quantifier work with character classes?
by BrowserUk (Pope) on Aug 13, 2012 at 22:21 UTC
    I thought {8} will match exactly 8 characters, but looks like I am wrong!

    You aren't wrong; it does match exactly 8 characters ... but if those 8 characters are at the start of a line containing more than 8 characters, it still matches exactly the first 8.

    You need to anchor your regex, i.e. /^[ATGC]{8}$/. Now it will only match lines that consist of exactly 8 of those characters.
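
To see the point above in action, here is a small sketch that captures what the pattern actually matched on an 11-nucleotide line from the posted data. The helper matched_text is a made-up name for illustration only.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return the text the pattern matched, or undef.
sub matched_text {
    my ($line, $re) = @_;
    return $line =~ /($re)/ ? $1 : undef;
}

my $line = 'TCAGAAAACTG';    # 11 nucleotides, from the sample data

my $unanchored = matched_text($line, qr/[ATGC]{8}/);
my $anchored   = matched_text($line, qr/^[ATGC]{8}$/);

# The unanchored pattern succeeds by taking only the first 8 characters.
print "unanchored: ", (defined $unanchored ? $unanchored : '(no match)'), "\n";
# The anchored pattern fails because the whole line is longer than 8.
print "anchored:   ", (defined $anchored ? $anchored : '(no match)'), "\n";
# prints:
# unanchored: TCAGAAAA
# anchored:   (no match)
```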


    With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.

    The start of some sanity?

      Please correct me if I am wrong (or if I misunderstood your statement), but there is nothing in OP's regex that restricts a match to the beginning of a line. Hence, the eight consecutive characters may appear anywhere on a line. Right?



        Correct!
        The suggestions from "BrowserUk" and "thezip" solved the problem. Thanks.

        Hence, the eight consecutive characters may appear anywhere on a line. Right?

        Yes, but the lines that match the character class in question contain only those characters, and a regex always matches as early as possible, so the match will be at the beginning of the line in these cases.
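
A quick way to check where the unanchored pattern matches is to look at $-[0], the offset of the start of the last successful match. This is only a sketch: match_offset is a made-up helper name, and the string with an N in it is an invented example rather than a line from the posted data.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: offset at which /[ATGC]{8}/ first matches,
# or -1 if it does not match at all.
sub match_offset {
    my ($line) = @_;
    return $line =~ /[ATGC]{8}/ ? $-[0] : -1;
}

# A pure sequence line from the data: the match starts at offset 0.
print match_offset('GAGCATCGA'), "\n";        # prints 0
# Invented line: the N breaks the first run, so the match starts later.
print match_offset('ATGNATGATGACC'), "\n";    # prints 4
# A header line from the data never has 8 such characters in a row.
print match_offset('@3009W:47:23'), "\n";     # prints -1
```

So the pattern can match in the middle of a line in general; it just happens to match at offset 0 on the sequence lines in this data set.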


