http://www.perlmonks.org?node_id=290969

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Any help optimizing this slow code would be great.

Thanks in advance

-Mark

# circularGenome.pm # package RG::circularGenome; # If this WERE a module, the boilerplate would look something like thi +s: # # require Block; # # our @ISA = qw(Exporter); # our @EXPORT = qw( # ... # ); 1; package main; use strict; + # DEBUG use warnings; + # DEBUG use Data::Dumper; + # DEBUG sub create_circularGenome { # # Create a circular genome canvas. # # Open window, set org. # # Input: $c source canvas # Output: - # Global: %global main::trace_me(); my ($c) = @_; our (%global,$canvas); # our $scale = $global{$c}->{SCALE}; $canvas = create_canvas(-type => 'circular', -org => $global{$c}->{org_name}, -buttons => \&scatterplot_buttonbar); ############# Create window elements ####################### our $label_status = 0; our $cvh = 500; # canvas height our $cvw = 700; # canvas width our $org = $global{$c}->{org_name}; # my $label_toggle = $canvas->Checkbutton(-text=>'Turn on gene names +', # -onvalue=>1, # -offvalue=>0, # -variable=> \$label_status +, # -command=> [\&drawGenome, +$canvas, $cvh, $cvw, $label_status, $scale ] # )->pack(-side=>'bottom'); &drawGenome($canvas,$cvh,$cvw,$label_status,$org); # Graph Label } #$mw->deiconify(); #$mw->raise(); #MainLoop; ###################################################################### +## sub drawGenome { my ($canvas,$cvh,$cvw,$label_status,$org) = @_; print"label status=$label_status\n"; # print"scale=$scale\n"; my ($circumference,$pi,$genome_length,$radius,$message,$deg2rad,$dr +awn_angle); my @colors = qw/grey purple pink blue turquoise green chartreuse ye +llow gold orange/; our($dbh,$sth,@myorfs,@orfids,$gene_name); ########################### Database section ################ # Define database connection $dbh = DBI->connect('dbi:mysql:genomatik','genomatik','3369',{ Rais +eError => 1,AutoCommit => 1 }); # grab protein positions $sth = $dbh->prepare("select id from org where full_name='$org'") o +r die "Can't prepare SQL statement ", $dbh->errstr(),"\n"; $sth->execute() or die "Can't execute SQL statement: ", $sth->errst +r(), "\n"; my($orgid) = $sth->fetchrow_array(); print"org=$org\torgid=$orgid\n"; $sth = $dbh->prepare ("select text_id,gene_name,protein5, protein3 +from protein where org_id=$orgid order by protein5 desc") or die "Can +'t prepare SQL statement ", $dbh->errstr(),"\n"; $sth->execute() or die "Can't execute SQL statement: ", $sth->errst +r(), "\n"; my $count=0; while (my($text_id,$gene_name,$protein5,$protein3) = $sth->fetchrow +_array()) { $myorfs[$count]= [$text_id,$gene_name,$protein5,$protein3]; $count++; } my $orfcount = $#myorfs+1; print" There are $orfcount orfs in $org\n"; # Mathmatical Basis # # For a circle with central angle A, arc length S and circumference + C, # S/C = A/360 degrees. Therefore A = 360(S/2*pi*radius) # # Since we draw the longest arc (last orf) first, we just draw a fu +ll circle. After that, # the calculated central_angle is added to drawn_angle. As we draw +the other orfs, the arc that is drawn # is 360 - drawn_angle. The central_angle is added to drawn_angle s +o that the next one will be placed # correctly. $pi = 3.1416; $deg2rad = $pi/180; $radius = 230; # define radius $circumference = 2 * $pi * $radius; # define circumference $drawn_angle=0; $genome_length=0; for (my $i=0;$i<$orfcount;$i++) { my $distance = $myorfs[$i][3] - $myorfs[$i][2]; # calc orf leng +th $genome_length = $genome_length + $distance; # calc total ge +nome length } $canvas->createText(350, 250, -fill => 'black', -text => "$org Geno +me"); $canvas->createText(350, 265, -fill => 'black', -text => "$genome_l +ength bp"); $drawn_angle=0; for (my $i=0;$i<$orfcount;$i++) { my $color = int($i % 10); my $orig_dist = $myorfs[$i][3] - $myorfs[$i][2]; # calc origina +l orf length my $orf_pct = $orig_dist/$genome_length; # convert orf +length to pct of genome length my $graph_arc_length = $circumference * $orf_pct; # calc equival +ent arc length corresponding to same pct of circum. my $central_angle = 360 * ($graph_arc_length / $circumference); +# calc central angle for that arc # $message = "label status = $label_status"; if ($i == 0) { $canvas->createArc($cvw / 2 - $radius, # draw first a +rc all the way around circle $cvh / 2 - $radius, # set drawn_an +gle = angle you want this orf to be $cvw / 2 + $radius, # It will be u +sed to set the extent for the next arc $cvh / 2 + $radius, # drawn, leavi +ng the right arc length exposed for this orf. -width=> 10, -fill=> $colors[$color], -outline=> $colors[$color], -style=> 'arc', -start=>90, -extent=> -359.99999999, -tags=>$myorfs[$i][1]); $drawn_angle = 360 - $central_angle; $gene_name = $myorfs[$i][1]; } else { $canvas->createArc($cvw / 2 - $radius, # draw all of +the other arcs on top of the first arc $cvh / 2 - $radius, $cvw / 2 + $radius, $cvh / 2 + $radius, -width=> 10, -fill=> $colors[$color], -outline=> $colors[$color], -style=> 'arc', -start=>90, -extent=> -$drawn_angle, -tags=>$myorfs[$i][1]); $drawn_angle = $drawn_angle - $central_angle; $gene_name = $myorfs[$i][1]; } # if ($scale < 1) # { # &drawLabels($drawn_angle,$gene_name); my $tick_length = 15; my $real_angle = 360-$drawn_angle+90; my $xstart = $cvw/2.0 + $radius * cos($real_angle*$deg2r +ad); my $ystart = $cvh/2.0 - $radius * sin($real_angle*$deg2r +ad); my $xstop = $xstart + $tick_length * cos($real_angle*$deg2r +ad); my $ystop = $ystart - $tick_length * sin($real_angle*$deg2r +ad); $canvas->createLine($xstart,$ystart,$xstop,$ystop); if (89< $real_angle && $real_angle <= 112.5) { $canvas->createText($xstop,$ystop, -anchor=>'s', -text=>"$ +gene_name"); } elsif ((112.5 < $real_angle) && ($real_angle <= 157.5)) { $canvas->createText($xstop,$ystop, -anchor=>'se', -text=>" +$gene_name"); } elsif ((157.5 < $real_angle) && ($real_angle <= 202.5)) { $canvas->createText($xstop,$ystop, -anchor=>'e', -text=>"$ +gene_name"); } elsif ((202.5 < $real_angle) && ($real_angle <= 247.5)) { $canvas->createText($xstop,$ystop, -anchor=>'ne', -text=>" +$gene_name"); } elsif ((247.5 < $real_angle) && ($real_angle <= 292.5)) { $canvas->createText($xstop,$ystop, -anchor=>'n', -text=>"$ +gene_name"); } elsif ((292.5 < $real_angle) && ($real_angle <= 337.5)) { $canvas->createText($xstop,$ystop, -anchor=>'nw', -text=>" +$gene_name"); } elsif ((337.5 < $real_angle) && ($real_angle <= 382.5)) { $canvas->createText($xstop,$ystop, -anchor=>'w', -text=>"$ +gene_name"); } elsif ((382.5 < $real_angle) && ($real_angle < 427.5)) { $canvas->createText($xstop,$ystop, -anchor=>'sw', -text=>" +$gene_name"); } elsif ((427.5 < $real_angle) && ($real_angle < 475)) { $canvas->createText($xstop,$ystop, -anchor=>'s', -text=>"$ +gene_name"); } # } } } ##################################################### # Displays value of $message in lower box when mouse is over item, goe +s back to null when mouse leaves. #sub bind_message #{ # my($widget, $msg) = @_; # $widget->Tk::bind('<Enter>', [ sub { $message = $_[1]; }, $msg ]); # $widget->Tk::bind('<Leave>', sub { $message = ""; }); #}

update (broquaint): added formatting + <readmore> tag

Replies are listed 'Best First'.
Re: sloooowwwwww code!
by tachyon (Chancellor) on Sep 12, 2003 at 10:54 UTC

    As suggested here you need to Benchmark - there are a range of options as noted.

    As far as you code goes you database work is pretty ordinary. Given you are connecting to a DB and then executing sql you should cache as much as possible. Database connections, STH preparation all take time (lots of it) so you want to avoid as much as possible. You use our $dbh but don't use $dbh ||= connect so you are making a connection with every call to your main subroutine. It seems you are also not aware of ? bind placeholders. You also appear not either finishing your STHs or disconnecting from the DB. You should be doing something like this (broad brush):

    package Main; my $dbh = connect..... my $sth_cache; $sth_cache->{query1} = $dbh->prepare_cached('SELECT foo FROM bar WHERE + foo = ?'); $sth_cache->{query2} = $dbh->prepare_cached('SELECT foo1 FROM bar1 WHE +RE foo1 = ?'); END{ $sth_cache->{$_}->finish for keys %{$sth_cache}; $dbh->disconnect if $dbh; } #blah sub widget { $sth_cache->{query1}->execute($bind_value_for_foo_WHERE_clause); $res = sth->fetchall_arrayref(); # blah }

    This will be faster. You have only given us an incomplete chunk of your code so it is hard to say what else you could do (other than loose a lot of the if/elsif as suggested). Note fetchall_arrayref is often several times faster than fetching in a loop. I would note that you are effectively generating a fetchall_array_ref in a loop which is a waste. As always YMMV and Benchmarking will show you the ONE TRUE WAY. The one true way is not always worth pursuing as THE ALMOST TRUE WAY THAT TOOK ME FAR LESS TIME TO FIND is ofen good enough to get the job done.

    As a general DB thing if you have a WHERE foo = ... clause and a big DB and you don't have a primary key, unique index or a least an index on 'foo' then you are doing a linear search through EVERY record in that table. In a big table (say a million records) this will take seconds without and index (roughly 2-3 on the current top of the range Linux type Xeon/RaidX hardware). It will take microseconds with an index. If you are doing that you are wasting the power of the DB and might as well use a flat file....

    Speaking of hardware you can almost never have enough memory for a DB or Perl. You can almost always spend memory and gain speed.

    cheers

    tachyon

    s&&rsenoyhcatreve&&&s&n.+t&"$'$`$\"$\&"&ee&&y&srve&&d&&print

Re: sloooowwwwww code!
by Jaap (Curate) on Sep 12, 2003 at 10:03 UTC
    Here's a little tip to get you going:
    my @directions = ('w', 'sw', 's', 'se', 'e', 'ne', 'n', 'nw'); my $anchor = $directions[int($real_angle/45 + 0.5)];
    this piece of untested code would save you 40 lines of code. Not bad eh? Before thinking of speed, try to see if you can make your code more compact.

      Actually there is very little relationship to the volume of code and the speed. Even in Perl. At least so says Larry and I happen to agree. Sure using 10,000 lines where 2 will do means you will have a longer compile time but when people think/say slow they are generally referring to runtime. At runtime all that really matters is the Algorithm.

      When dealing with slow code making it shorter is not, and never will be the key. Testing it using:

      will let you see precisely where the bottlenecks REALLY ARE. Once you have identified the problem areas (usually not where you suspect!) you can apply your optimisation efforts to where you NOW KNOW it will make a difference.

      This is not to say that your suggested changes are not good, just based on practical experience of what works.

      For example we have the OP using our $dbh but not using that to make the database handle persistent with a $dbh ||= connect Connecting to a database IS slow (comparatively speaking) so doing it once in a given script is more efficient.

      Update

      Corrected sloppy touch typing tpyos. Cheers liz

      cheers

      tachyon

      s&&rsenoyhcatreve&&&s&n.+t&"$'$`$\"$\&"&ee&&y&srve&&d&&print

Re: sloooowwwwww code!
by dws (Chancellor) on Sep 12, 2003 at 16:07 UTC
    Any help optimizing this slow code would be great.

    Good advice on profiling aside, I would start with the query and database schema. A lot of time can get chewed up by an inefficient query, or one that could be sped up by adding indexes.

    Look at the query

    select text_id,gene_name,protein5,protein3 from protein where org_id=$orgid order by protein5 desc"
    and ask two questions:
    1. Is there an index on org_id?
    2. Is it really necessary to have the database sort the results?
    Unless org_id is indexed, the database must make a linear scan over the protein table. If the table is big, the scan can be slow. If the field is indexed, the retrieval will usually be considerably faster (except in cases where you're going to match most of the records anyway.)

    I can't tell from your code whether it matters whether the query results are sorted or not. If it doesn't matter, drop the order by clause.

    The non-SQL thing that jumped out was the redundant trig calculations. You're doing each sine and cosine calculation twice, where once would be sufficient.

    Then profile.

Re: sloooowwwwww code!
by crouchingpenguin (Priest) on Sep 12, 2003 at 13:50 UTC