http://www.perlmonks.org?node_id=991581

supriyoch_2008 has asked for the wisdom of the Perl Monks concerning the following question:

Hi Perlmonks,

I am a beginner in perl programming. I am interested in drawing the secondary structure of mRNA from base sequence using a perl script i.e. drawrna.pl given in the book "Genomic Perl: From Bioinformatics Basics to Working Code" by Rex A. Dwyer page 300-306 (2003, Cambridge University Press). I have given the code below and I have just changed a little the section of the main program for input of two files i.e. t1.txt and t2.txt. The perl script runs in cmd with Active Perl 5.10.1.1007-MSWin32-x86-291969.msi and Windows7 Professional in my laptop. But after the input of two files, the cmd screen shows some results which are not at all understandable to me as I never had a course in computer. Perl is the first computer language I am learning. May I request perlmonks to suggest me what I should do to view the secondary structure in cmd and to get the structure in the text output file? For the structure shall I have to use other software? If so, which software(s)?

Here goes the drawrna.pl

#!/usr/bin/perl ################################################## ## Draw Folded Structure of mRNA: ### Using "Genomic Perl: From Bioinformatics Basics to Working Code" ### copyright (c) 2002 Cambridge University Press ################################################################ use strict; ####################### ## GLOBAL VARIABLES ####################### my @structure; ## list of dots and parens describing input structure. my @bases; ## list of bases from input string. ### Largest and smallest x- and y-coordinates used in the drawing; ### ultimately used to scale drawing onto one page. my ($minx, $maxx, $miny, $maxy); ######################################### sub max { ## RETURNS: the larger of its two arguments. ######################################### my ($x,$y) = @_; if ($x>$y) { return $x; } else { return $y}; } ######################################### sub begin_PostScriptPicture { ## issues PostScript commands needed at beginning of .ps file. ## RETURNS: nothing. ######################################### print "/rnaline { pt setlinewidth moveto lineto stroke} def\n"; print "/rnabase { moveto -0.12 -0.12 rmoveto show} def\n"; print "/rnapicture {\n\n"; } ######################################### sub end_PostScriptPicture { ## issues PostScript commands needed at end of .ps file. ## RETURNS: nothing. ######################################### print "\n} def\n\n"; my $scale = 0.95 * (8.5*72) / max($maxx-$minx, $maxy-$miny); my $xorigin = ((8.5*72) * -$minx / ($maxx-$minx)) || "0"; my $yorigin = ((8.5*72) * -$miny / ($maxy-$miny)) || "0"; print "/pt {$scale div} def\n\n"; print "/Helvetica findfont ", 8, " pt scalefont setfont\n"; print "$xorigin $yorigin translate\n"; print "$scale dup scale\n"; print "rnapicture showpage\n"; } ######################################### sub drawLine{ ## issues PostScript commands needed to draw a line. ## RETURNS: nothing. ######################################### my ($x1, $y1, ## coordinates of one endpoint $x2, $y2, ## coordinates of other endpoint $thick ## thickness of line ) = @_; ($x1, $y1, $x2, $y2) = (0.8 * $x1 + 0.2 * $x2, 0.8 * $y1 + 0.2 * $y2, 0.2 * $x1 + 0.8 * $x2, 0.2 * $y1 + 0.8 * $y2); print "$x1 $y1 $x2 $y2 $thick rnaline\n"; ## record this info to know overall size of picture. ($x1,$x2) = ($x2,$x1) if ($x1>$x2); $maxx = $x2 if $x2>$maxx; $minx = $x1 if $x1<$minx; ($y1,$y2) = ($y2,$y1) if ($y1>$y2); $maxy = $y2 if $y2>$maxy; $miny = $y1 if $y1<$miny; } ######################################### sub drawBase{ ## issues PostScript commands needed to print letter for a base. ## RETURNS: nothing. ######################################### my ($x, $y, ## coordinates of base $b ## letter (character) to print; can be 5 or 3 ) = @_; $b .= "'" if ("53" =~ $b); print "($b) $x $y rnabase\n" } ######################################### sub drawRna { ## assists drawRnaStructure to create PostScript drawing of RNA. ## Invokes itself recusively for each ring of the structure. ## RETURNS: nothing. ######################################### my ($l, $r, ## range of @structure and @bases to be drawn. $lx, $ly, ## coordinates of first position of ring. $rx, $ry ## coordinates of last position of ring. ) = @_; my $level=0; my $count=2; for (my $i=$l+1; $i<$r; $i++) { $level-- if ($structure[$i] eq ")"); $count++ if $level==0; $level++ if ($structure[$i] eq "("); } my $theta = 2 * 3.14159 / $count; my $rad = 1 / (2*sin($theta/2)); ## radius my $h = $rad * cos($theta/2); my ($cx, $cy) = ((($lx+$rx)/2.0)+$h*($ly-$ry), ## center of cir +cle (($ly+$ry)/2.0)+$h*($rx-$lx)); my $alpha = atan2($ly-$cy,$lx-$cx); my ($ii,$xx,$yy) = ($l,$lx,$ly); for (my $i=$l+1; $i<=$r; $i++) { $level-- if ($structure[$i] eq ")"); if ($level==0) { $alpha -= $theta; my ($x,$y) = ($cx+$rad*cos($alpha), $cy+$rad*sin($alpha)); drawLine($xx,$yy,$x,$y,0); drawRna($ii,$i,$xx,$yy, $x, $y) if ($structure[$i] eq ")"); drawBase($xx,$yy, $bases[$ii]); ($xx,$yy)=($x,$y); $ii = $i; } $level++ if ($structure[$i] eq "("); } drawLine($xx,$yy,$rx,$ry,0); drawBase($xx,$yy, $bases[$r-1]); drawBase($rx,$ry, $bases[$r]); my %bonds = (GU=>1,UG=>1,AU=>2,UA=>2,CG=>3,GC=>3); drawLine($lx,$ly,$rx,$ry,$bonds{$bases[$l].$bases[$r]}+0) unless ($lx==0 || $ly==0); ## 3-5 pair } ######################################### sub drawRnaStructure { ## creates PostScript drawing of RNA structure. ## Most of the work is delegated to the recursive subroutine drawRna. ## RETURNS: nothing. ######################################### my ($basestring,$structurestring) = @_; @bases = split(//, 5 . $basestring . 3); @structure = split(//, "($structurestring)"); begin_PostScriptPicture(); drawRna(0, $#structure, 0,0, 1,0); end_PostScriptPicture(); } ######################################### ## MAIN PROGRAM ######################################### ## Read string of bases, string of parentheses. Find number of H bond +s. print "\n\n Please enter the mRNA sequence as A,T,G & C(.txt): "; # Li +ne 14 my $base =<STDIN>; chomp($base); # open the file, or exit unless ( open(BASE, $base) ) { print "Cannot open file \"$base\"\n\n"; exit; } my $basestring =<BASE>; close BASE; # To remove white(blank) space: $basestring=~ s/\s//g; $basestring=~ s/T/U/gi; # Line 26 print"\n The mRNA containing U in place of T:\n $basestring\n\n\n"; # Enter parentstring: print"\n Please enter the RNA folding as a .txt file\n from the output of 1st step [as (..(())..etc.]: "; my $paren=<STDIN>; chomp($paren); # open the file, or exit unless ( open(PAREN, $paren) ) { print "Cannot open file \"$paren\"\n\n"; exit; } my $parenstring=<PAREN>; close PAREN; drawRnaStructure($basestring,$parenstring); my $output="ResDrawRNA .txt"; unless (open(RESULT,">$output")){ print"Cannot open file\"$output\".\n\n"; exit; } print RESULT"\n Here goes the mRNA structure: \n\n drawRnaStructure($basestring,$parenstring);\n\n"; close(RESULT);

The input file t1.txt:

AGAAGCCCTTCCCCCGGTATTT

The input file t2.txt:

(.(((...)(((...)).))))

The results in the cmd screen appear as follows without the diagram:

Microsoft Windows [Version 6.1.7600] Copyright (c) 2009 Microsoft Corporation. All rights reserved. C:\Users\x>cd desktop C:\Users\x\Desktop>drawrna.pl Please enter the mRNA sequence as A,T,G & C(.txt): t1.txt The mRNA containing U in place of T: AGAAGCCCUUCCCCCGGUAUUU Please enter the RNA folding as a .txt file from the output of 1st step [as (..(())..etc.]: t2.txt #### I don't know how to correct the following for getting the structu +re: /rnaline { pt setlinewidth moveto lineto stroke} def /rnabase { moveto -0.12 -0.12 rmoveto show} def /rnapicture { -2.65358979323338e-007 0.199999999999824 -1.06143591729335e-006 0.7999 +9999999929 6 0 rnaline (5') 0 0 rnabase 0.199998673204399 1.00000053071708 0.799998673202287 1.00000212287095 +0 rnaline -0.0618054323093417 1.19021107365562 -0.247217748852677 1.760844294625 +13 0 rnali ne (A) -1.32679489661669e-006 0.99999999999912 rnabase -0.147219017000719 2.06861319158651 0.338189495098491 2.42128666150115 + 0 rnaline (G) -0.309021854367122 1.95105536828163 rnabase 0.661796417623143 2.4212883789412 1.14720867309789 2.06862006134672 0 +rnaline 0.617548223648782 2.70064872593698 0.970215897200448 3.18606144932982 +0 rnaline (A) 0.499992332464894 2.53884448480603 rnabase 1.2495761854877 3.23031001395802 1.73498937679778 2.87764298444979 0 r +naline 1.10867561610605 3.54677026553781 1.17138709927121 4.14348399076896 0 +rnaline (A) 1.08777178838434 3.34786569046077 rnabase 1.37499918384373 4.42373776915214 1.92312395439612 4.66778537907054 0 +rnaline 1.0584631229551 4.49101601575487 0.656979710841618 4.93689836548147 0 +rnaline (G) 1.19229092699293 4.342388565846 rnabase 0.623149761925029 5.25873213447585 0.923143327288744 5.77835109173238 +0 rnaline (C) 0.52315190680379 5.08552581539034 rnabase 1.21877126160757 5.90997770286862 1.80566149920032 5.78523857902081 0 +rnaline (C) 1.02314118240998 5.95155741081789 rnabase 2.02219970496771 5.54475401333257 2.08492408467712 4.94803944011565 0 +rnaline (C) 2.00129157839791 5.74365887107154 rnabase (U) 2.10583221124692 4.74913458237668 rnabase 1.37499918384373 4.42373776915214 1.92312395439612 4.66778537907054 1 +rnaline (G) 1.19229092699293 4.342388565846 rnabase 2.26763692029425 4.63157933523685 2.75305104743626 4.27891359381735 0 +rnaline (U) 2.10583221124692 4.74913458237668 rnabase 2.8939524565731 3.96245371613098 2.83124255684161 3.36573982449139 0 r +naline 3.11048473225742 4.20294324583196 3.69737165957888 4.3276979432953 0 r +naline (U) 2.91485575648359 4.16135834667752 rnabase 3.99300308662394 4.19607917695809 4.29301044043765 3.67646818048313 0 +rnaline 4.06620416816146 4.46928552352663 4.58581476658773 4.76929356675729 0 +rnaline (C) 3.8930006353527 4.36928284244974 rnabase 4.85902121027885 4.69609284770862 5.15902994292594 4.17648264733194 0 +rnaline 4.89284176501818 5.01792760404477 5.29431216188326 5.46382167267655 0 +rnaline (C) 4.75901829939649 4.86929624783418 rnabase 5.61084625881948 5.53110915877608 6.15897815276308 5.2870775484429 0 r +naline (C) 5.42813562750495 5.61245302888714 rnabase 6.32078706760167 5.00682888138526 6.25808191817385 4.41011449054553 0 +rnaline (C) 6.34168878407761 5.20573367833184 rnabase 6.04155073211999 4.16962360432044 5.45466232338622 4.0448653364849 0 r +naline (C) 6.23718020169791 4.21120969359895 rnabase (G) 5.2590328538083 4.00327924720638 rnabase 4.85902121027885 4.69609284770862 5.15902994292594 4.17648264733194 3 +rnaline (C) 4.75901829939649 4.86929624783418 rnabase 5.08582886138842 3.9032763007634 4.56621688412877 3.60326746143446 0 r +naline (G) 5.2590328538083 4.00327924720638 rnabase (G) 4.39301289170888 3.50326451499148 rnabase 3.99300308662394 4.19607917695809 4.29301044043765 3.67646818048313 3 +rnaline (C) 3.8930006353527 4.36928284244974 rnabase 4.25918903168101 3.35463351389399 3.8577174515974 2.90874051060153 0 r +naline (G) 4.39301289170888 3.50326451499148 rnabase 3.54118272464185 2.84145464639221 2.9930501238588 3.0854900570567 0 rn +aline (U) 3.72389359156953 2.76010950950405 rnabase (A) 2.81033925693112 3.16683519394486 rnabase 2.8939524565731 3.96245371613098 2.83124255684161 3.36573982449139 2 r +naline (U) 2.91485575648359 4.16135834667752 rnabase 2.62763016032512 3.0854856167453 2.07950287050714 2.84143688514661 0 r +naline (A) 2.81033925693112 3.16683519394486 rnabase (U) 1.89679377390114 2.76008730794705 rnabase 1.2495761854877 3.23031001395802 1.73498937679778 2.87764298444979 2 r +naline (A) 1.08777178838434 3.34786569046077 rnabase 1.77923757077214 2.59828263745402 1.42656896138514 2.11286862597493 0 +rnaline (U) 1.89679377390114 2.76008730794705 rnabase (U) 1.30901275825614 1.9510639554819 rnabase 0.661796417623143 2.4212883789412 1.14720867309789 2.06862006134672 2 +rnaline (A) 0.499992332464894 2.53884448480603 rnabase 1.24720994124523 1.7608516951033 1.06180149021249 1.19021491396751 0 r +naline (U) 1.30901275825614 1.9510639554819 rnabase (U) 0.999998673201583 1.00000265358891 rnabase 0.199998673204399 1.00000053071708 0.799998673202287 1.00000212287095 +2 rnaline (A) -1.32679489661669e-006 0.99999999999912 rnabase 0.999998938561266 0.80000212287113 0.999999734640317 0.200000530717783 + 0 rnaline (U) 0.999998673201583 1.00000265358891 rnabase (3') 1 0 rnabase } def /pt {88.5200325285177 div} def /Helvetica findfont 8 pt scalefont setfont 23.0354980737536 0 translate 88.5200325285177 dup scale rnapicture showpage C:\Users\x\Desktop>

The text output file looks as follows without the RNA structure:

drawRnaStructure(AGAAGCCCTTCCCCCGGTATTT,(.(((...)(((...)).)))));

Replies are listed 'Best First'.
Re: How can I get the structure of mRNA in cmd and text page using a perl script?
by marto (Cardinal) on Sep 04, 2012 at 10:04 UTC
Re: How can I get the structure of mRNA in cmd and text page using a perl script?
by pvaldes (Chaplain) on Sep 04, 2012 at 13:38 UTC

    mmmh... not sure if this is the place to copy a copyrigthed script, unless you have a permit from Cambridge University Press, probably shouldn't do this

    ... in any case, can you explain what do you mean with "get the structure"? Is your desired oputput like a 3d picture of a protein, or like a genetic plot, or something else? what are you trying to do?

    Take a look to Vienna. Here are some perl scripts that can help you to start.

Re: How can I get the structure of mRNA in cmd and text page using a perl script?
by Anonymous Monk on Sep 04, 2012 at 09:53 UTC

    But after the input of two files, the cmd screen shows some results which are not at all understandable to me as I never had a course in computer. Perl is the first computer language I am learning. May I request perlmonks to suggest me what I should do to view the secondary structure in cmd and to get the structure in the text output file? For the structure shall I have to use other software? If so, which software(s)?

    I find it highly unlikely that the book wouldn't explain how to use this program, or view the output

Re: How can I get the structure of mRNA in cmd and text page using a perl script?
by uday_sagar (Scribe) on Sep 04, 2012 at 10:22 UTC

    Hi supriyoch,

    I didnt get the goal of the above code, Can you give us the typical output text file (i mean the structure) for any known combination of t1.txt and t2.txt (that earlier produced structure!)

    Actually, there can be many issues like

    1. correct syntax of t1.txt and t2.txt

    One could be:

    There should be no extra empty lines in these text files at the end which disturbs the program flow

    2. do input text files have patterns that go beyond the capability of the perl code?

Re: How can I get the structure of mRNA in cmd and text page using a perl script?
by choroba (Cardinal) on Sep 04, 2012 at 09:50 UTC
    Seems like a chunk of a postscript file.
    لսႽ† ᥲᥒ⚪⟊Ⴙᘓᖇ Ꮅᘓᖇ⎱ Ⴙᥲ𝇋ƙᘓᖇ