http://www.perlmonks.org?node_id=1094847


in reply to Re^5: Filter and writing error log file
in thread Filter and writing error log file

Thanks Choroba for the suggestions.

I am checking a rule against each sequence string (19 letters long) in a while loop, and I came up with this filter.

So basically, I want to match A at position 3, T at position 10, [ACT] at position 13, [AT] at position 19, and at least 3 A's or 3 T's in positions 15-19, and $gcper should be between 30 and 52.

if (($seq =~/\w{2}A\w{6}T\w{2}[ACT]\w[GCA{4}T{4}]{4}[AT]/) && ($gcper >= 30 && $gcper <= 52)) { print "$seq\t$seqpos\t$gcper\n"; }

I checked the result and it seemed to work. I need help with whether this code is OK as written or whether I can improve it.

Another thing I want to check is that there is no GC stretch more than 9 letters long, but I don't know how to add that check to the code above.

GCAGGTGGATCTATTTCAT  3201-3220  42.11
TAAGAGGTGTTATTTGGAA  3268-3287  31.58
ATACGATGCTTCAAGAGAA  3346-3365  36.84
CAAGCTCATCATACTGGCT  1201-1220  47.37
GGTACTGACTTTGCTTGCT  2923-2942  47.37
CGTAGTGTTAAGTTATAGT  3003-3022  31.58
GTATGGGTAGGGTAAATCA  3248-3267  42.11
CCTGCTGTGATACGATGCT  3337-3356  52.63
CCTGCGCGCGCGCGATGCT  3300-3318  50.63

Replies are listed 'Best First'.
Re^7: Filter and writing error log file
by choroba (Cardinal) on Jul 24, 2014 at 13:16 UTC
    Your regexp seems incorrect. [GCA{4}T{4}] is a character class that matches any single character of the ones listed, i.e. it's the same as [}4{GCAT]. Also, to make sure you match at a particular position, you should anchor your regex by starting it with ^ to match the beginning of the string.
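    The point about the character class can be checked with a small test. This is only an illustrative sketch: the variable name $class and the list of test characters are made up for the demonstration and are not from the original post.

```perl
use strict;
use warnings;

# The class from the original filter, anchored so it must match one whole character.
# Inside [...], the characters '{', '4' and '}' are literals, not a quantifier.
my $class = qr/^[GCA{4}T{4}]$/;

for my $char ('G', 'A', '{', '4', '}', 'N') {
    my $verdict = $char =~ $class ? 'matches' : 'does not match';
    print "'$char' $verdict\n";
}
```

    Because every listed character is just an alternative for one position, the class cannot express "four A's"; a quantifier only works outside the brackets.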
    لսႽ† ᥲᥒ⚪⟊Ⴙᘓᖇ Ꮅᘓᖇ⎱ Ⴙᥲ𝇋ƙᘓᖇ

      Thanks Choroba for your reply.

      So if I understood correctly, I should modify it to something like this:

      if( ($seq =~/^\w{2}A\w{6}T\w{2}[ACT]\w[^A{4}|^T{4}][GC]{4}[AT]/) && ( $seq !~ /[GC]{9}/) && ( $gcper >= 30 && $gcper <= 52) ) { Do something....;}

      Thanks again

        No. [^A{4}|^T{4}] is equivalent to [^^AT}|{4], i.e. it matches any single character except ^, A, T, the vertical bar, the curly braces, and the digit 4.
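        Since a character class cannot count letters, one possible way to express the rule is to let the regex check only the fixed positions and to count the A's and T's in positions 15-19 separately with tr. The sketch below is not from the thread: the sub name passes_filter is hypothetical, and it assumes that "at least 3 A's or 3 T's" means three or more of the same letter anywhere in that five-letter window, not necessarily consecutive.

```perl
use strict;
use warnings;

# Hypothetical helper: true when $seq and $gcper satisfy the rule described above.
sub passes_filter {
    my ($seq, $gcper) = @_;

    # Fixed positions only: A at 3, T at 10, [ACT] at 13, [AT] at 19.
    # ^ and $ anchor the pattern so the positions are counted from the start
    # of a sequence that is exactly 19 letters long.
    return 0 unless $seq =~ /^\w{2}A\w{6}T\w{2}[ACT]\w{5}[AT]$/;

    # Positions 15-19 are offsets 14-18 (substr is zero-based).
    my $window  = substr $seq, 14, 5;
    my $count_a = ($window =~ tr/A//);   # tr with an empty replacement only counts
    my $count_t = ($window =~ tr/T//);
    return 0 unless $count_a >= 3 || $count_t >= 3;

    # GC percentage must lie between 30 and 52, inclusive.
    return ($gcper >= 30 && $gcper <= 52) ? 1 : 0;
}

print passes_filter('GCAGGTGGATCTATTTCAT', 42.11) ? "keep\n" : "skip\n";
```

        If the intended rule is three consecutive A's or T's, the counting step could instead test the window with something like /A{3}|T{3}/.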
        لսႽ† ᥲᥒ⚪⟊Ⴙᘓᖇ Ꮅᘓᖇ⎱ Ⴙᥲ𝇋ƙᘓᖇ
Re^7: Filter and writing error log file
by newtoperlprog (Sexton) on Jul 24, 2014 at 12:46 UTC

    Dear All, I tried to add the condition "no GC stretch more than 9 letters long" to the code below:

    if(($seq =~/\w{2}A\w{6}T\w{2}[ACT]\w[GCA{4}T{4}]{4}[AT]/) && ( $seq !~ /[GC]{9}/) && ( $gcper >= 30 && $gcper <= 52)) { Do something....;}

    I was hoping to get some help regarding the regular expression and better writing of this code.

    Thank you
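    One detail of the GC check may be worth testing: /[GC]{9}/ matches as soon as there are nine G/C letters in a row, so negating it also rejects stretches of exactly 9. If "no GC stretch more than 9 letters long" means that a run of 9 is still acceptable, the pattern would need to look for 10 or more. A minimal sketch under that assumption, with a hypothetical sub name:

```perl
use strict;
use warnings;

# Hypothetical helper: true when $seq contains a run of 10 or more G/C letters,
# i.e. a GC stretch that is more than 9 letters long.
sub has_long_gc_stretch {
    my ($seq) = @_;
    return $seq =~ /[GC]{10,}/ ? 1 : 0;
}

for my $seq ('CCTGCGCGCGCGCGATGCT', 'GCAGGTGGATCTATTTCAT') {
    print "$seq\t", (has_long_gc_stretch($seq) ? 'rejected' : 'ok'), "\n";
}
```

    The result can then be combined with the other conditions, for example as `&& !has_long_gc_stretch($seq)` in the if statement.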