
Re^5: Filter and writing error log file

by choroba (Cardinal)
on Jul 23, 2014 at 13:52 UTC ( [id://1094791]=note )


in reply to Re^4: Filter and writing error log file
in thread Filter and writing error log file

See perlre. The caret negates the class, so the regular expression matches non-ACTG characters, but I used !~ to negate that. It's like the difference between

"The sequence doesn't contain invalid characters"

and

"The sequence contains valid characters"

These two are not equivalent, as the second lacks the word "only".
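To make the difference concrete, here is a minimal sketch. The sub names are made up for illustration, and the class [^ACGT] is assumed to be the one under discussion:

```perl
use strict;
use warnings;

# "Doesn't contain invalid characters": the negated class [^ACGT] looks
# for any character that is NOT A, C, G or T, and !~ negates the match.
sub has_no_invalid_chars {
    my ($seq) = @_;
    return $seq !~ /[^ACGT]/ ? 1 : 0;
}

# "Contains valid characters": true as soon as one valid character
# appears anywhere, whatever else the string holds.
sub has_some_valid_char {
    my ($seq) = @_;
    return $seq =~ /[ACGT]/ ? 1 : 0;
}

for my $seq (qw(ACGT ACXT XXXX)) {
    printf "%-4s  no invalid: %d  some valid: %d\n",
        $seq, has_no_invalid_chars($seq), has_some_valid_char($seq);
}
```

ACXT is the interesting case: it contains valid characters, so the second test accepts it, but the first test rejects it because of the X.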
لսႽ† ᥲᥒ⚪⟊Ⴙᘓᖇ Ꮅᘓᖇ⎱ Ⴙᥲ𝇋ƙᘓᖇ

Re^6: Filter and writing error log file
by newtoperlprog (Sexton) on Jul 23, 2014 at 19:55 UTC

    Thanks Choroba for the suggestions.

    I am checking a rule against a sequence string (19 letters long) in a while loop and came up with this filter.

    So basically, I want to match A at position 3, T at position 10, [ACT] at position 13, [AT] at position 19, and at least 3 A's or 3 T's in positions 15-19, and $gcper should be between 30 and 52.

    if (($seq =~/\w{2}A\w{6}T\w{2}[ACT]\w[GCA{4}T{4}]{4}[AT]/) && ($gcper >= 30 && $gcper <= 52)) { print "$seq\t$seqpos\t$gcper\n"; }

    I checked the result and it seemed to work. I need help with whether this code is OK or whether I can improve it.

    Another thing which I want to check is: no GC stretch more than 9 letters long, but I don't know how I can insert that check in the above code.

    GCAGGTGGATCTATTTCAT  3201-3220  42.11
    TAAGAGGTGTTATTTGGAA  3268-3287  31.58
    ATACGATGCTTCAAGAGAA  3346-3365  36.84
    CAAGCTCATCATACTGGCT  1201-1220  47.37
    GGTACTGACTTTGCTTGCT  2923-2942  47.37
    CGTAGTGTTAAGTTATAGT  3003-3022  31.58
    GTATGGGTAGGGTAAATCA  3248-3267  42.11
    CCTGCTGTGATACGATGCT  3337-3356  52.63
    CCTGCGCGCGCGCGATGCT  3300-3318  50.63
      Your regexp seems incorrect. [GCA{4}T{4}] is a character class that matches any one of the characters listed, i.e. it's the same as [}4{GCAT]. Also, to make sure you match at a particular position, you should anchor your regex by starting it with ^ to match the beginning of the string.
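Both points can be checked with a small sketch (the sub names are hypothetical and only serve the demonstration):

```perl
use strict;
use warnings;

# Inside [...] the braces and the digit are ordinary characters, so this
# class matches exactly one character out of: G C A T { } 4
my $class = qr/\A[GCA{4}T{4}]\z/;

sub in_class {
    my ($char) = @_;
    return $char =~ $class ? 1 : 0;
}

# Single characters from the class match; the four-letter run does not.
print "$_ => ", in_class($_), "\n" for qw(G { 4 } AAAA);

# Without ^ the pattern may match anywhere in the string, so "A at
# position 3" is only enforced by the anchored version.
sub a_at_3_unanchored { return $_[0] =~ /\w{2}A/  ? 1 : 0 }
sub a_at_3_anchored   { return $_[0] =~ /^\w{2}A/ ? 1 : 0 }

print "GGGGA unanchored: ", a_at_3_unanchored("GGGGA"), "\n";
print "GGGGA anchored:   ", a_at_3_anchored("GGGGA"), "\n";
```

For GGGGA the unanchored pattern still succeeds, because it finds "GGA" starting at the third letter, while the anchored one fails since position 3 holds a G.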
      لսႽ† ᥲᥒ⚪⟊Ⴙᘓᖇ Ꮅᘓᖇ⎱ Ⴙᥲ𝇋ƙᘓᖇ

        Thanks Choroba for your reply.

        So if I understood correctly, I should modify it to something like this:

        if( ($seq =~/^\w{2}A\w{6}T\w{2}[ACT]\w[^A{4}|^T{4}][GC]{4}[AT]/) && ( $seq !~ /[GC]{9}/) && ( $gcper >= 30 && $gcper <= 52) ) { Do something....;}

        Thanks again

      Dear All, I tried to incorporate the condition (no GC stretch more than 9 letters long) into the code below:

      if(($seq =~/\w{2}A\w{6}T\w{2}[ACT]\w[GCA{4}T{4}]{4}[AT]/) && ( $seq !~ /[GC]{9}/) && ( $gcper >= 30 && $gcper <= 52)) { Do something....;}

      I was hoping to get some help regarding the regular expression and better writing of this code.

      Thank you
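One way the whole rule set might be written is sketched below. It is not tested against the real data, and it rests on several assumptions: the sequences are exactly 19 uppercase letters from ACGT, "at least 3 A's or 3 T's" is read literally (three A's, or three T's, in positions 15-19), and "more than 9" means a run of 10 or more G/C letters. The names passes_filter and has_long_gc_run are made up for this example.

```perl
use strict;
use warnings;

# Assumption: "no GC stretch more than 9 letters long" means that a run
# of 10 or more consecutive G/C letters is rejected.
sub has_long_gc_run {
    my ($seq) = @_;
    return $seq =~ /[GC]{10,}/ ? 1 : 0;
}

sub passes_filter {
    my ($seq, $gcper) = @_;

    # Fixed positions, anchored at both ends of the 19-letter sequence:
    # A at 3, T at 10, [ACT] at 13, [AT] at 19.
    return 0
        unless $seq =~ /\A[ACGT]{2}A[ACGT]{6}T[ACGT]{2}[ACT][ACGT]{5}[AT]\z/;

    # Positions 15-19 are offset 14, length 5. tr/// with an empty
    # replacement list only counts, it does not modify $tail.
    my $tail    = substr $seq, 14, 5;
    my $a_count = ($tail =~ tr/A//);
    my $t_count = ($tail =~ tr/T//);
    return 0 unless $a_count >= 3 || $t_count >= 3;

    return 0 if has_long_gc_run($seq);

    return ($gcper >= 30 && $gcper <= 52) ? 1 : 0;
}

print passes_filter("GCAGGTGGATCTATTTCAT", 42.11), "\n";
```

Counting with tr/// keeps the "how many" rule out of the regex, which is where the character class went wrong. If A's and T's are meant to be counted together, a single tr/AT// on $tail would do instead. Note also that once A at 3, T at 10 and [AT] at 19 are enforced, the longest possible G/C run in a 19-letter sequence is 8 (positions 11-18), so the run check only matters on its own or on longer sequences.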
