PerlMonks  

Re^4: Filter and writing error log file

by newtoperlprog (Sexton)
on Jul 23, 2014 at 13:44 UTC ( [id://1094789]=note )


in reply to Re^3: Filter and writing error log file
in thread Filter and writing error log file

Thanks for the suggestions. One question: why do we have to use '^' in the match, rather than just

[ATGC]

Re^5: Filter and writing error log file
by choroba (Cardinal) on Jul 23, 2014 at 13:52 UTC
    See perlre. The caret negates the class, so the regular expression matches non-ACTG characters, but I used !~ to negate that. It's like the difference between

    "The sequence doesn't contain invalid characters"

    and

    "The sequence contains valid characters"

    These two are not equivalent, as the second lacks the word "only".
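
The difference between the two statements can be tried out directly. Below is a minimal sketch; the helper names `has_no_invalid_chars` and `has_some_valid_char` are made up for this illustration and do not come from the thread.

```perl
use strict;
use warnings;

# Hypothetical helper names, chosen for this illustration only.

# "The sequence doesn't contain invalid characters":
# true when no character outside A, C, G, T is found.
sub has_no_invalid_chars {
    my ($seq) = @_;
    return $seq !~ /[^ACGT]/ ? 1 : 0;
}

# "The sequence contains valid characters":
# true as soon as one of A, C, G, T appears anywhere.
sub has_some_valid_char {
    my ($seq) = @_;
    return $seq =~ /[ACGT]/ ? 1 : 0;
}

# The second sequence has an N in it, so only the first test rejects it.
for my $seq (qw(GATTACA GATNACA)) {
    printf "%s\tno invalid: %d\tsome valid: %d\n",
        $seq, has_no_invalid_chars($seq), has_some_valid_char($seq);
}
```

A positive test that does include the "only" would have to cover the whole string, for example `/\A[ACGT]*\z/`.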
    لսႽ† ᥲᥒ⚪⟊Ⴙᘓᖇ Ꮅᘓᖇ⎱ Ⴙᥲ𝇋ƙᘓᖇ

      Thanks, choroba, for the suggestions.

      I am checking a rule against a sequence string (19 letters long) in a while loop, and I came up with this filter.

      So basically, I want to match A at position 3, T at position 10, [ACT] at position 13, [AT] at position 19, and at least 3 A's or 3 T's in positions 15-19; $gcper should be between 30 and 52.

      if (($seq =~/\w{2}A\w{6}T\w{2}[ACT]\w[GCA{4}T{4}]{4}[AT]/) && ($gcper >= 30 && $gcper <= 52)) { print "$seq\t$seqpos\t$gcper\n"; }

      I checked the results and it seems to work. I need help with whether this code is written OK or whether I can improve it.

      Another thing I want to check is that there is no GC stretch more than 9 letters long, but I don't know how to add that check to the code above.

      GCAGGTGGATCTATTTCAT  3201-3220  42.11
      TAAGAGGTGTTATTTGGAA  3268-3287  31.58
      ATACGATGCTTCAAGAGAA  3346-3365  36.84
      CAAGCTCATCATACTGGCT  1201-1220  47.37
      GGTACTGACTTTGCTTGCT  2923-2942  47.37
      CGTAGTGTTAAGTTATAGT  3003-3022  31.58
      GTATGGGTAGGGTAAATCA  3248-3267  42.11
      CCTGCTGTGATACGATGCT  3337-3356  52.63
      CCTGCGCGCGCGCGATGCT  3300-3318  50.63
        Your regexp seems incorrect. [GCA{4}T{4}] is a character class that matches any one of the characters listed, i.e. it's the same as [}4{GCAT]. Also, to make sure you match at a particular position, you should anchor your regex by starting it with ^ to match the beginning of the string.
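
Both points can be sketched in a few lines. The first loop shows that braces and digits inside the brackets are ordinary class members. The helper `passes_position_rule` is a hypothetical name, and it assumes one possible reading of the rule: at least three A's, or at least three T's, among positions 15-19. If the intended rule is three A/T letters combined, a single `tr/AT//` count would be used instead. It anchors with `\A` and `\z` rather than `^`, which pins both ends of the string.

```perl
use strict;
use warnings;

# Inside [...] the characters {, 4 and } are literal members of the class,
# so each of these single-character strings matches.
for my $char (qw( G { 4 } )) {
    printf "%s is a member of the class\n", $char
        if $char =~ /\A[GCA{4}T{4}]\z/;
}

# Hypothetical helper: checks the positional rules on a 19-letter sequence.
sub passes_position_rule {
    my ($seq) = @_;

    # \A and \z pin the pattern to the whole string, so every letter is
    # tested at a fixed position: A at 3, T at 10, [ACT] at 13, [AT] at 19.
    return 0 unless $seq =~ /\A\w{2}A\w{6}T\w{2}[ACT]\w{5}[AT]\z/;

    # Assumed reading of the rule: at least three A's, or at least three
    # T's, among positions 15-19 (offset 14, length 5).
    my $tail    = substr $seq, 14, 5;
    my $a_count = ($tail =~ tr/A//);   # tr with an empty replacement only counts
    my $t_count = ($tail =~ tr/T//);
    return ($a_count >= 3 || $t_count >= 3) ? 1 : 0;
}

for my $seq (qw(GCAGGTGGATCTATTTCAT CCTGCGCGCGCGCGATGCT)) {
    printf "%s\t%s\n", $seq, passes_position_rule($seq) ? 'pass' : 'fail';
}
```

Counting with `tr` on the substring keeps the "how many" part of the rule out of the regex, where it is awkward to express.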
        لսႽ† ᥲᥒ⚪⟊Ⴙᘓᖇ Ꮅᘓᖇ⎱ Ⴙᥲ𝇋ƙᘓᖇ

        Dear all, I tried to incorporate the condition (no GC stretch more than 9 letters long) into the code below:

        if(($seq =~/\w{2}A\w{6}T\w{2}[ACT]\w[GCA{4}T{4}]{4}[AT]/) && ( $seq !~ /[GC]{9}/) && ( $gcper >= 30 && $gcper <= 52)) { Do something....;}

        I was hoping to get some help regarding the regular expression and better writing of this code.

        Thank you
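
One detail worth checking in the condition above: `$seq !~ /[GC]{9}/` rejects a run of exactly 9 G/C letters as well, whereas "no stretch more than 9 letters long" would seem to allow it. Under that reading the pattern needs `{10}`. A minimal sketch with the checks split into small subs follows; the helper names `has_long_gc_stretch` and `gc_percent_ok` are hypothetical, and the limits are taken from the posts above.

```perl
use strict;
use warnings;

# Hypothetical helper: true when the sequence contains a run of G/C
# letters longer than $max (default 9, per the stated rule).
sub has_long_gc_stretch {
    my ($seq, $max) = @_;
    $max = 9 unless defined $max;
    my $min_bad = $max + 1;            # a run of this length is already too long
    return $seq =~ /[GC]{$min_bad}/ ? 1 : 0;
}

# Hypothetical helper: true when the GC percentage is within 30..52 inclusive.
sub gc_percent_ok {
    my ($gcper) = @_;
    return ($gcper >= 30 && $gcper <= 52) ? 1 : 0;
}

# Two records from the sample data: sequence and GC percentage.
for my $record (['GCAGGTGGATCTATTTCAT', 42.11], ['CCTGCGCGCGCGCGATGCT', 50.63]) {
    my ($seq, $gcper) = @$record;
    my $keep = !has_long_gc_stretch($seq) && gc_percent_ok($gcper);
    printf "%s\t%s\n", $seq, $keep ? 'keep' : 'reject';
}
```

Naming each condition also keeps the `if` readable as more rules are added.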
