http://www.perlmonks.org?node_id=1043386

bingalee has asked for the wisdom of the Perl Monks concerning the following question:

I have a file and I need to find the reverse complement of every 2nd line only, basically of every even line. How do I go about it?

Here's what the file looks like:

@HWI-ST1023:184:C1V8LACXX:7:1101:1142:2247 2:N:0:TGACCA
GTAGGGGCTGCGCGAACGCAAACCCCCGCTGCCACAAATGATCGTCGGACTGTAGAACTCTGAACGTGTAGATCTCGGTGGCCGCCGTATCATTAAAAAAA
+
?1=DBB@DCFFFFIGIIII6DGHHIII6@=AEEDDEEC;@C>@?(;;B;@B?9BCDAA3>(:@@CB+8(9>@:@CCBB289(259@B9B8?A:@C@>CC@B
@HWI-ST1023:184:C1V8LACXX:7:1101:1450:2022 2:N:0:TGACCA
ACGTGCCCTCGGCCAGAAGGCTTGGGGCGCAACTTGCGTTCAAAGACTCGATGGTTCACGGGATTCTGCAATTCACACCAAGTATCGCATTTCGCTACGTT
+
?@@DDDFFADFFHIJIIFG>FHIJJJJJGIIBH=DHGHHDDFFF;AEAC?=>CD-:@CDBDBDBDD>CDDD:ACDCDDDDD?(4>CBBD?@DDDDDDDD8?
@HWI-ST1023:184:C1V8LACXX:7:1101:1457:2047 2:N:0:TGACCA
GCGTCGCCAGCACAGAGGCCATGCGATCCGTCGAGTTATCATGAATCATCAGAGCAACGGGCAGAGCCCGCGTCGACCTTTTATCTAATAAATGCGTCCCT
+
@CCDFFFFGHHHHJIIIJJIJJJJIIJJJJFHIBFBFHIGJJIGI@GHGGEHHHHHHFFDDABDDDDDDDDDDDBDBBBDCCCCCDDDDCDDEECB8<@DD
@HWI-ST1023:184:C1V8LACXX:7:1101:1476:2196 2:N:0:TGACCA
GATTGGGGCTGCATTCCCAAACAACCCGACTCGTAGACAGCGCCTCGTGGTGCGACAGGGTCCGGGCACGACGGGGCTCTCACCCTCTCTGGCGCCCCTTT
+
C@CFFFFFHHHHHJJJJJJJJJJJJJJJIJJJJGHIIGEGGIJJIIICHH?@DDCDDBDD9CDBBDDDDDBDDDDDDDDDCDDDD?BCDCCCDD@BDDDDD

Thanks in advance guys

EDIT

My specific problem is performing the reverse complement on every other line only.

Actually, I need the reverse complement of every second line and JUST THE REVERSE of the fourth line.

Replies are listed 'Best First'.
Re: find reverse complement of every 2nd line
by choroba (Cardinal) on Jul 09, 2013 at 23:49 UTC
    Looking at the sample, you want the reverse complement of the second line, plus four, plus four... You can use $. which keeps the line number and the modulus operator %:
    use feature 'say';
    while (<>) {
        chomp;
        if ($. % 4 == 2) {    # lines 2, 6, 10, ...
            $_ = reverse;     # reverse $_ as a string
            y/ACTG/TGAC/;     # complement each base
        }
        say;
    }
    لսႽ† ᥲᥒ⚪⟊Ⴙᘓᖇ Ꮅᘓᖇ⎱ Ⴙᥲ𝇋ƙᘓᖇ
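The EDIT in the question also asks for the fourth line (the quality string) to be reversed without complementing. A minimal sketch that extends the `$. % 4` idea to cover both cases might look like the following. It assumes the input is well-formed four-line records like the sample; the `revcomp` and `transform_line` helper names and the tiny in-memory record are made up for illustration.

```perl
use strict;
use warnings;
use feature 'say';

# Reverse complement of a plain ACGT/acgt sequence (hypothetical helper).
sub revcomp {
    my ($seq) = @_;
    my $rc = reverse $seq;            # scalar assignment, so a string reverse
    $rc =~ tr/ACGTacgt/TGCAtgca/;     # complement each base
    return $rc;
}

# Choose the transformation from the 1-based line number.
sub transform_line {
    my ($line, $lineno) = @_;
    my $pos = $lineno % 4;
    return revcomp($line)       if $pos == 2;   # sequence line
    return scalar reverse $line if $pos == 0;   # quality line: reverse only
    return $line;                               # header and '+' lines
}

# Made-up record, read through an in-memory filehandle so that $. is set.
my $sample = "\@read1\nAAGC\n+\nIJKL\n";
open my $fh, '<', \$sample or die "Cannot open in-memory file: $!";
while (my $line = <$fh>) {
    chomp $line;
    say transform_line($line, $.);
}
close $fh;
```

For a real file, the same loop body should work with a filehandle opened on the file instead of on `\$sample`.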
Re: find reverse complement of every 2nd line
by davido (Cardinal) on Jul 09, 2013 at 22:20 UTC

    To someone who isn't an expert in Bioinformatics, can you please tell me how you would compute the reverse complement in the first place? Explain that, and show what code you have so far, and I or someone else here will help you find a solution.


    Dave

      Hey Dave. For the reverse complement the code will be something like this

      my $raw=<IN>;
      $rev= reverse $raw;
      $rev=~tr/ATGCatgc/TACGtacg/;

      my problem actually is doing this only to every other line and not the entire file

      This is what I came up with, but it doesn't work:

      while(@inf=<IN>) { if(i%2==1) { $inf[i]=reverse $inf[i]; } }

      Here I was just trying to reverse the lines, to see whether the code acts on the lines I want it to, but no luck. I'm still trying to think of another way. Any suggestions?

        Variables in Perl need sigils, so it must be $i, not just i

        (using strict and warnings would have caught this),

        and you need to increment it with ++$i within the loop.

        As an alternative you could just test $. % 2 since $. automatically holds the line-number.

        Cheers Rolf

        ( addicted to the Perl Programming Language)
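Putting those corrections together, here is a minimal sketch of the array-based approach from the question, with sigils, an explicit index, and `strict`/`warnings` enabled. Note that `@inf = <IN>` reads the whole file at once, so it belongs before the loop rather than in a `while` condition. Like the original attempt, this only reverses every second line; the sample lines and the `reverse_even_lines` name are made up for illustration.

```perl
use strict;
use warnings;

# Reverse every second line (indices 1, 3, 5, ...) of a list of lines.
sub reverse_even_lines {
    my @lines = @_;
    for my $i (0 .. $#lines) {
        next unless $i % 2 == 1;          # odd index means even line number
        chomp(my $text = $lines[$i]);     # keep the newline out of the reversal
        $lines[$i] = scalar(reverse $text) . "\n";
    }
    return @lines;
}

# Stand-in for "my @inf = <IN>;" on a real filehandle.
my @inf = ("header1\n", "ACGT\n", "header2\n", "TTGA\n");
print reverse_even_lines(@inf);
```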

        my problem actually is doing this only to every other line and not the entire file

        The special variable $. (documented in perldoc perlvar) keeps count of the line numbers, so you can do something like:
        while(<IN>) {
            next if $. % 2;    # skip odd lines
            # do the processing
        }
        Cheers,
        Rob
Re: find reverse complement of every 2nd line
by rjt (Curate) on Jul 09, 2013 at 23:39 UTC

    This was discussed on SoPW recently. Here's one solution (mine! what a coincidence): Re: Reverse Complement.

    And a link to the entire thread: Reverse Complement.

    To process every other line, just check the input line number with $. when you read your file in a while loop, something like this:

    while (<DATA>) {
        chomp;
        for ($_) {
            say scalar reverse    when $. == 4;
            say uc scalar reverse when $. % 2 == 0;
            default { say };
        }
    }

    (Note the above does not calculate the reverse complement; check the provided links for that. This is just meant to show you a straightforward way to process the lines you desire.)

    With a little adaptation, you should have no problem coming up with a solution to your problem.

Re: find reverse complement of every 2nd line
by BrowserUk (Patriarch) on Jul 09, 2013 at 22:42 UTC

    For the pure ACGTacgt lines it is simple:

    $in = 'GTAGGGGCTGCGCGAACGCAAACCCCCGCTGCCACAAATGATCGTCGGACTGTAGAACTCTGAACGTGTAGATCTCGGTGGCCGCCGTATCATTAAAAAAA';;
    ( $out = reverse $in ) =~ tr[acgtACGT][tgcaTGCA];;
    print $out;;
    TTTTTTTAATGATACGGCGGCCACCGAGATCTACACGTTCAGAGTTCTACAGTCCGACGATCATTTGTGGCAGCGGGGGTTTGCGTTCGCGCAGCCCCTAC

    Extending that to deal with the other letters and characters is just a case of knowing the complement rules -- which you presumably do -- and then extending the tr/// conversion strings appropriately.
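As a sketch of such an extension, assuming the "other letters" are the standard IUPAC nucleotide ambiguity codes, the tr/// lists can be widened as below. Under those rules R/Y, K/M, B/V and D/H swap, while S, W and N are their own complements and need no entry. The `revcomp_iupac` name is a made-up helper, and whether these codes are what the data actually needs is an assumption.

```perl
use strict;
use warnings;
use feature 'say';

# Reverse complement that also handles IUPAC ambiguity codes.
# S, W and N complement to themselves, so they are left out of the map.
sub revcomp_iupac {
    my ($seq) = @_;
    my $rc = reverse $seq;    # scalar assignment, so a string reverse
    $rc =~ tr/ACGTRYKMBDHVacgtrykmbdhv/TGCAYRMKVHDBtgcayrmkvhdb/;
    return $rc;
}

say revcomp_iupac('AAGC');     # GCTT
say revcomp_iupac('ACGTN');    # NACGT
say revcomp_iupac('RYKM');     # KMRY
```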


    With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.
Re: find reverse complement of every 2nd line
by jeffa (Bishop) on Jul 09, 2013 at 22:44 UTC

    I love Perlmonks but Google is good for searching too, for example a search for "perl bioinformatics cpan reverse complement" yielded this page, which appears to "calculate the reverse complement of a strand of DNA" ... whatever that is. :)

    jeffa

    L-LL-L--L-LL-L--L-LL-L--
    -R--R-RR-R--R-RR-R--R-RR
    B--B--B--B--B--B--B--B--
    H---H---H---H---H---H---
    (the triplet paradiddle with high-hat)