Create the reverse complement DNA sequence without pattern matching and reverse built-in function?

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Replies are listed 'Best First'.
Re: Create the reverse complement DNA sequence without pattern matching and reverse built-in function? by davido (Cardinal) on Feb 19, 2014 at 15:05 UTC
Sure there is. You could even reimplement an entire regular expression engine using substr, though I wouldn't recommend it. However, you're not using a regular expression here, you're using transliteration: `tr///`. `my $seq="ACGGGAGGACGGGAAAATTACTACGGCATTAGC"; my %xref = ( A => 'T', C => 'G', G => 'C', T => 'A', a => 't', c => 'g', g => 'c', t => 'a', ); for(my $ix = 0; $ix < length $seq; ++$ix ) { substr( $seq, $ix, 1, $xref{ substr( $seq, $ix, 1 ) } ); } my $reverse = reverse($seq); print $reverse, "\n";` [download] Proof you can write "almost C" in Perl. I would expect this to be tremendously slower than using `tr///` on large data sets. Dave	[reply] [d/l] [select]
Re^2: Create the reverse complement DNA sequence without pattern matching and reverse built-in function? by atcroft (Abbot) on Feb 19, 2014 at 15:13 UTC
Another similar approach, for illustration purposes: `perl -Mstrict -Mwarnings -le 'my $seq=q{ACGGGAGGACGGGAAAATTACTACGGCATTAGC}; my $complement; my %rc = ( A => q{T}, T => q{A}, C => q{G}, G => q{C}, ); my @letters = split //, $seq; while ( @letters ) { my $l = uc pop @letters; $complement .= $rc{$l}; } print $seq; print $complement;'` [download] Hope that helps.	[reply] [d/l]
Re: Create the reverse complement DNA sequence without pattern matching and reverse built-in function? by BrowserUk (Patriarch) on Feb 19, 2014 at 16:12 UTC
Is there a way of doing this ..., using only substr function? As has been shown, you can. But why would you? tr ISN'T a regular expression. It is a compile time constructed table lookup; designed for exactly this purpose; and it will do the job anywhere from ~~100x to 400~~ 78x & 270 times faster than the other methods offered here: #! perl -slw use strict; use Benchmark qw[ cmpthese ]; our $seq = "ACGGGAGGACGGGAAAATTACTACGGCATTAGCacgggaggacgggaaaattactacg +gcattagc"; our %xref = ( A => 'T', C => 'G', G => 'C', T => 'A', a => 't', c => 'g', g => 'c', t => 'a', ); our %rc = ( A => q{T}, T => q{A}, C => q{G}, G => q{C}, ); cmpthese -1, { tr => q[ ( my $revcmp = reverse $seq ) =~tr[ACGTacgt][tgcaTGCA]; ], davido => q[ my $seq = $seq; for(my $ix = 0; $ix < length $seq; ++$ix ) { substr( $seq, $ix, 1, $xref{ substr( $seq, $ix, 1 ) } ); } my $reverse = reverse($seq); ], atcroft => q[ my $complement; my @letters = split //, $seq; while ( @letters ) { my $l = uc pop @letters; $complement .= $rc{$l}; } ], hdb => q[ my $rev=''; my $n = length $seq; while( $n-- ){ $rev .= $_ for map chr( $_ & 2 ? $_^4: $_^21 ), ord substr + $seq, $n, 1 } ], } __END__ C:\test>junk60 Rate hdb atcroft davido tr hdb 9827/s -- -35% -71% -100% atcroft 15077/s 53% -- -55% -99% davido 33822/s 244% 124% -- -99% tr 2679052/s 27163% 17669% 7821% -- [download] (I know you guys know this; but does the OP??? And if they do, why are they asking for this?) With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday' Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error. "Science is about questioning the status quo. Questioning authority". In the absence of evidence, opinion is indistinguishable from prejudice.	[reply] [d/l]
Re^2: Create the reverse complement DNA sequence without pattern matching and reverse built-in function? by davido (Cardinal) on Feb 19, 2014 at 17:12 UTC
I tried to alert the OP to this issue in my post: I would expect this to be tremendously slower than using `tr///` on large data sets. And your benchmark confirms it; there's a tremendous difference, in favor of using `tr///` to do what it was designed to do. I think your tr/// method missed reversing the string afterwards, but that won't change the fact that the tr/// approach is the way to go. Dave	[reply] [d/l] [select]
Re^3: Create the reverse complement DNA sequence without pattern matching and reverse built-in function? by BrowserUk (Patriarch) on Feb 19, 2014 at 19:07 UTC
I tried to alert the OP to this issue in my post: Sorry. I missed that. your tr/// method missed reversing the string afterwards, but that won't change the fact that the tr/// approach is the way to go. Right on both counts. And it make's more of a difference than I was expecting, but still 80x faster than the next best: `tr => q[ ( my $revcmp = reverse $seq ) =~ tr[ACGTacgt][tgcaTGCA]; ], C:\test>junk60 Rate hdb atcroft davido tr hdb 9827/s -- -36% -71% -100% atcroft 15292/s 56% -- -55% -99% davido 33822/s 244% 121% -- -99% tr 2640400/s 26770% 17167% 7707% --` [download] With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday' Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error. "Science is about questioning the status quo. Questioning authority". In the absence of evidence, opinion is indistinguishable from prejudice.	[reply] [d/l]
Re: Create the reverse complement DNA sequence without pattern matching and reverse built-in function? by kcott (Archbishop) on Feb 19, 2014 at 16:35 UTC
TMTOWTDI: `#!/usr/bin/env perl -l use strict; use warnings; my %comp = qw{A T C G G C T A a t c g g c t a}; my $seq = 'ACGTacgt'; print 'Initial: ', $seq; print 'Complement: ', @comp{split '' => $seq}; print 'Reverse Complement: ', @comp{reverse split '' => $seq};` [download] Output: `Initial: ACGTacgt Complement: TGCAtgca Reverse Complement: acgtACGT` [download] Update: For the OP: I hope, from other posts in this thread, that it's clear there's no benefit in not using the appropriate tools provided with the language, i.e. tr and reverse. However, if you do have some reason for writing your code this way, I notice that I did, in fact, use reverse. So, in keeping with the hash slice solution I used, you can get the same output by changing the last two lines to: `my @n = split '' => $seq; print 'Complement: ', @comp{@n}; print 'Reverse Complement: ', @comp{map { $n[$#n - $_] } 0 .. $#n};` [download] -- Ken	[reply] [d/l] [select]
Re: Create the reverse complement DNA sequence without pattern matching and reverse built-in function? by hdb (Monsignor) on Feb 19, 2014 at 15:51 UTC
Not recommended for production: `use strict; use warnings; my $seq='ACGGGAGGACGGGAAAATTACTACGGCATTAGC'; my $rev=''; my $n = length $seq; while( $n-- ){ $rev .= $_ for map chr($_&2?$_^4:$ +_^21), ord substr $seq, $n, 1 } print "$seq\n$rev\n";` [download] Something reminds me of Black Jack... Update: ...or: `use strict; use warnings; my $seq='ACGGGAGGACGGGAAAATTACTACGGCATTAGC'; my $rev=''; $rev = "$_$rev" for map chr($_&2?$_^4:$_^21), map ord, split//,$seq; print "$seq\n$rev\n";` [download]	[reply] [d/l] [select]
Re: Create the reverse complement (OT: meaning / use of "reverse complement") by ww (Archbishop) on Feb 19, 2014 at 16:55 UTC
Could some kind soul explain how "reverse complement" works? Is is not the same as "reverse? -- ie, is NOT one word or the other redundant? Update: Has been explained! See below and the replies from BrowserUK and kcott And if that's a bit fuzzy around the edges (yeah, it is, IMO), code to illustrate why I'm puzzled with OP's use of the phrase "reverse complement" and its adoption by others: #!/usr/bin/perl use 5.016; use warnings; # 1075463 my $seq="ACGGGAGGACGGGAAAATTACTACGGCATTAGC"; say "Initial \$seq is: $seq"; say "Reverse sequence is: " . reverse($seq); my @seq = split(/\b/, $seq); my $comp; while (@seq) { $comp .= pop(@seq); } print "Complement is: $comp\n"; my $reverse = reverse($comp); print "Reverse complement is:$reverse\n"; =head OUTPUT: Initial $seq is: ACGGGAGGACGGGAAAATTACTACGGCATTAGC Reverse sequence is: CGATTACGGCATCATTAAAAGGGCAGGAGGGCA Complement is: ACGGGAGGACGGGAAAATTACTACGGCATTAGC # per OP's use, + I think Reverse complement is:CGATTACGGCATCATTAAAAGGGCAGGAGGGCA =cut [download] On-line dictionaries tend to mention the mathematics use only in geometric terms. Is my understanding that the OP's phrase would be better written and less ambiguous using just one of its two terms? Update: /me headslaps! Duh! Ignored the "double helix" fundamental! Thanks, obeisances and + +s to both BrowserUK and kcott. Come, let us reason together: Spirit of the Monastery	[reply] [d/l]
Re^2: Create the reverse complement (OT: meaning / use of "reverse complement") by kcott (Archbishop) on Feb 19, 2014 at 17:48 UTC
I'll attempt to describe this without using any biological references (therefore, highly oversimplified). DNA is made up of long strands (sequences) of just four building blocks (given the initials A, C, G and T). Here's an example sequence: `A \| C \| G \| T \| G` [download] The sequence here is `ACGTG`; the reverse sequence would be that read from the other end, i.e. `GTGCA`. DNA strands form pairs. Each of the building blocks pairs with only one of the other building blocks: A with T and C with G (and vice versa). Knowing the sequence of one strand, the complementary sequence is easily determined: `A==T \| \| C==G \| \| G==C \| \| T==A \| \| G==C` [download] So, the complement of `ACGTG` is `TGCAC`; and the reverse complement is just that read from the other end, i.e. `CACGT`. As I said, that's highly oversimplified but hopefully is enough to explain the terms. -- Ken	[reply] [d/l] [select]
Re^2: Create the reverse complement (OT: meaning / use of "reverse complement") by BrowserUk (Patriarch) on Feb 19, 2014 at 17:02 UTC
Google shows that is a well established term in genomics. With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday' Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error. "Science is about questioning the status quo. Questioning authority". In the absence of evidence, opinion is indistinguishable from prejudice.	[reply]


Pathologically Eclectic Rubbish Lister
	PerlMonks