Beefy Boxes and Bandwidth Generously Provided by pair Networks
The stupid question is the question not asked
 
PerlMonks  

Foreach Loop Optimization

by upallnight (Sexton)
on Aug 01, 2007 at 15:05 UTC ( #630084=perlquestion: print w/replies, xml ) Need Help??

upallnight has asked for the wisdom of the Perl Monks concerning the following question:

I have this foreach loop that I am trying to optimize for a large flat file database.
The database is a table with rows stored in the @project_module_list array and columns separated by commas.
When running on a small 50 row database, it spits the html out pretty fast. However, when running on an 800 row database, it takes an average of 9 seconds to finish running.
Can someone help me optimize this loop?

P.S. Each column has its data separated by a colon. Column element 0 is the column data, element 1 is the user who last edited it, and element 2 is the time stamp of the last edit.
Maybe there is just too much data to process it faster?

Here is the loop code:

foreach $project_module_list (@project_module_list) { if($moduleCount != 0) { @moduleData = split /,/, $project_module_list; print "<tr><td align=center><div class=table_data>$moduleCount +</div></td>"; $dataCount = 0; foreach $moduleData (@moduleData) { @moduleField = split /:/, $moduleData; $recentChange = ""; $_ = $headers[$dataCount]; $localtime = FormatTime($moduleField[2]) if $moduleField[2 +]; $localtime = "n/a" unless $moduleField[2]; if( /approv/i ) { $status = "N/A"; if($moduleField[0]) { $default = "checked"; $status = "$moduleField[1]"; $color = " bgcolor=\"#CCFF99\""; } else { $default = ""; $color = " bgcolor=\"#FFCC66\""; } if($moduleField[2] > time - $changeBuffer) { $recentChange = " bgcolor=\"#99CCFF\""; } print "<td$color align=center><div class=table_data><i +nput type=checkbox name=\"$headers[$dataCount]$moduleCount\" $default +></div></td><td$recentChange title=\"Changed by $moduleField[1] at $l +ocaltime\"><div class=table_data>$status</div></td>"; } else { if($moduleField[2] > time - $changeBuffer) { $recentChange = " bgcolor=\"#99CCFF\""; } print "<td$recentChange><div title=\"Changed by $modul +eField[1] at $localtime\" class=table_data onClick=\"ToggleShowHide(' +label_$headers[$dataCount]$moduleCount','$headers[$dataCount]$moduleC +ount')\" id=\"label_$headers[$dataCount]$moduleCount\" style='display +: block;'>$moduleField[0]</div> <div id=\"$headers[$dataCount]$module +Count\" style='display: none;'><input type=text name=\"$headers[$data +Count]$moduleCount\" value=\"$moduleField[0]\"><input type=button onC +lick=\"ToggleShowHide('label_$headers[$dataCount]$moduleCount','$head +ers[$dataCount]$moduleCount')\" value=OK></div></td>"; } $dataCount++; } print "</tr>\n"; } $moduleCount++; $default = $color = ""; }

Replies are listed 'Best First'.
Re: Foreach Loop Optimization
by ikegami (Pope) on Aug 01, 2007 at 15:33 UTC

    I don't see anything that would make that big of a difference. I think the problem is elsewhere.

    • time is a system call. You could save a little speed by calling it once outside the loop.

    • If /approv/i should actually checking for equality, lc($_) eq 'approv' is 122% faster when the check is true and 28% faster when the check is false.

    Comments not related to performance:

    • $localtime = FormatTime($moduleField[2]) if $moduleField[2]; $localtime = "n/a" unless $moduleField[2];

      could be written

      $localtime = $moduleField[2] ? FormatTime($moduleField[2]) : "n/a";
    • Assigning to $_ is dangerous. And it's not even necessary here. Change

      $_ = $headers[$dataCount]; if( /approv/i )

      to

      if( $headers[$dataCount] =~ /approv/i )
      Thanks for your code suggestions!

      The code is not calling time in the loop, only formatting time value from the database.

      Unfortunately, I have to use /approv/i because it has to match approv in the string when the string could be anything.

      Is $localtime = $moduleField[2] ? FormatTime($moduleField[2]) : "n/a"; more efficient than my two lines of code?

      I don't think changing $_ is dangerous. The perdoc does it a lot.
      For some reason, I was thinking =~ would change the value of $headers[$dataCount].
      Thanks for that code, I bet its faster than changing the value of $_.

        I don't think changing $_ is dangerous. The perdoc does it a lot.

        Sometimes it won't cause any problems. Sometime it creates hard to debug action-at-a-distance problems because it's common for code to rely on $_ being left unchanged by the functions it calls.

        If you modify $_, localize it. Of course, local $_ = $val; is buggy so local *_ = \$val; or for ($val) should be used instead. Or better yet, just don't assign to $_.

        In this particular case, the only thing the assignment to $_ does is obfuscate, so it's a bad idea regardless of all the problems and complexity I just mentioned.

        For some reason, I was thinking =~ would change the value of $headers[$dataCount].

        All =~ does is specify the operand for the following (m//, s///, tr///, etc) operator.

Re: Foreach Loop Optimization
by RMGir (Prior) on Aug 01, 2007 at 15:26 UTC
    That's a scary looking loop :) And scaier HTML...

    How much of that 9 seconds is perl runtime, and how much is the time the web browser takes to render the table? Or is that just the perl time? Rendering an 800 row table usually slows browsers down a bit...

    There's nothing obviously slow about the way you're doing things.

    I suggest doing some experimenting.

    What's the speed if you comment out the print statements?

    Is FormatTime doing anything stupidly slow?

    That's all I can think of offhand...


    Mike
      Wow! I didn't expect so many answers so quickly!
      I haven't had time to read them all but I found what was slowing it down so much:
      the print command on its way to the browser
      Here is the FormatTime function code:
      sub FormatTime { ## Format Time Function ## ## Takes Arguments: ## $time which is the integer value returned by the time functi +on ## Returns: ## a formatted time date string my $time = shift; (my $sec, my $min, my $hour, my $mday, my $mon, my $year, my $wday +, my $yday, my $isdst) = localtime($time); my $show_min = ":"; $show_min = ":0" if $min<10; $mon++; $year += 1900; return "$hour$show_min$min $mon/$mday/$year"; }

      At first I had shrunk that function down to almost nothing and then thought of the print command.
      Instead of printing inside the loop, I repeatedly appended the output to a string $output and then printed that string outside the loop:
      my $output = ""; foreach $project_module_list (@project_module_list) { if($moduleCount != 0) { @moduleData = split /,/, $project_module_list; $output = $output . "<tr><td align=center><div cla +ss=table_data>$moduleCount</div></td>"; $dataCount = 0; foreach $moduleData (@moduleData) { @moduleField = split /:/, $moduleData; $recentChange = ""; $_ = $headers[$dataCount]; $localtime = FormatTime($moduleField[2]) if $m +oduleField[2]; $localtime = "n/a" unless $moduleField[2]; if( /approv/i ) { $status = "N/A"; if($moduleField[0]) { $default = "checked"; $status = "$moduleField[1]"; $color = " bgcolor=\"#CCFF99\""; } else { $default = ""; $color = " bgcolor=\"#FFCC66\""; } if($moduleField[2] > time - $changeBuffer) + { $recentChange = " bgcolor=\"#99CCFF\"" +; } $output = $output . "<td$color align=cente +r><div class=table_data><input type=checkbox name=\"$headers[$dataCou +nt]$moduleCount\" $default></div></td><td$recentChange title=\"Change +d by $moduleField[1] at $localtime\"><div class=table_data>$status</d +iv></td>"; } else { if($moduleField[2] > time - $changeBuffer) + { $recentChange = " bgcolor=\"#99CCFF\"" +; } $output = $output . "<td$recentChange><div + title=\"Changed by $moduleField[1] at $localtime\" class=table_data +onClick=\"ToggleShowHide('label_$headers[$dataCount]$moduleCount','$h +eaders[$dataCount]$moduleCount')\" id=\"label_$headers[$dataCount]$mo +duleCount\" style='display: block;'>$moduleField[0]</div> <div id=\"$ +headers[$dataCount]$moduleCount\" style='display: none;'><input type= +text name=\"$headers[$dataCount]$moduleCount\" value=\"$moduleField[0 +]\"><input type=button onClick=\"ToggleShowHide('label_$headers[$data +Count]$moduleCount','$headers[$dataCount]$moduleCount')\" value=OK></ +div></td>"; } $dataCount++; } $output = $output . "</tr>\n"; } $moduleCount++; $default = $color = ""; } print qq~ $output </table> <script> window.onload=function() { document.approve.submit +1.disabled=false; } </script> <table border=0 cellpadding=10><tr> <td>&#160;</td> <td>&#160;</td> <td>&#160;</td> <td><input type=submit value="Save Changes" name=" +submit2"></td> </tr></table> </form> ~;

      This changed the run time from an average of 10 seconds to 3 seconds!
      To get the run time I don't do anything fancy, just print the time at the beginning and end of the script.

      Thanks for all the replies!

        $output = $output . "<td$recentChange><div title=\"Changed by $moduleF +ie ld[1] at $localtime\" class=table_data onClick=\"ToggleShowHide('label +_$headers[$dataCount]$moduleCount','$headers[$dataCount]$moduleCount' +)\" id=\"label_$headers[$dataCount]$moduleCount\" style='display: blo +ck;'>$moduleField[0]</div> <div id=\"$headers[$dataCount]$moduleCount +\" style='display: none;'><input type=text name=\"$headers[$dataCount +]$moduleCount\" value=\"$moduleField[0]\"><input type=button onClick= +\"ToggleShowHide('label_$headers[$dataCount]$moduleCount','$headers[$ +dataCount]$ moduleCount')\" value=OK></div></td>";
        .... is pretty darned ugly (no offense intended).

        You could tidy that up somewhat, and make it a bit more readable, by doing something like:

        $output .= qq( <td$recentChange> <div title="Changed by $moduleField[1] at $localtime" class=table_data onClick="ToggleShowHide('label_$headers[$d_cnt]$m_cnt','$h +eaders[$d_cnt]$m_cnt')" id="label_$headers[$d_cnt]$m_cnt" style='display: block;'> $moduleField[0] </div> <div id="$headers[$d_cnt]$m_cnt" style='display: none;'> <input type=text name="$headers[$d_cnt]$m_cnt" value="$mod +uleField[0]"> <input type=button onClick="ToggleShowHide('label_$headers[$d_cnt]$m_cnt' +,'$headers[$d_cnt]$m_cnt')" value=OK> </div> </td> );
        Notes:
        • $foo .= $bar is the same as $foo = $foo . $bar
        • use the quoting operators (qq - look for "Regexp Quote-Like Operators" in perlop) in preference to double-quotes - saves lots of messing about escaping special characters.
        • Whilst descriptive variable names are generally considered a good thing™, if you make them too long you wind up with big long statements like the one above, that are difficult to read and understand. Consider shortening some of your variable names - especially those that have limited scope. A good rule of thumb is that the shorter the scope, the shorter the name.
        • When you have a big long statement like the one above, split it over several lines and use indenting for vertical alignment, to improve readability (and make debugging easier).

        Hope this helps,
        Darren :)

Re: Foreach Loop Optimization
by tirwhan (Abbot) on Aug 01, 2007 at 15:27 UTC

    You could replace lines 11/12 with

    $localtime = $moduleField[2] ? FormatTime($moduleField[2]) : "n/a";

    and also put your printout data into a variable and only print once per loop iteration instead of every time you want to add output. But I have a hunch that those changes will hardly matter and the real timewaster in this loop is the FormatTime subroutine (which you haven't shown us). I suggest using DevelDProf to find out what your code spends most of its time doing.


    All dogma is stupid.
      Would that line be more efficient than my two lines of code?

      Thanks for the code suggestions. It was the print command I guess because of the latency from the server to my browser.
        Doing a quick benchmark seems to show that it is a little bit more efficient, probably because you are only having to test once. It is certainly more readable.

        use strict; use warnings; use Benchmark q{cmpthese}; my @states; push @states, int rand 2 for 1 .. 10000; cmpthese(-5, { tirwhan => sub { foreach my $mf2 ( @states ) { my $lt = $mf2 ? q{processed time} : q{n/a}; } }, upallnight => sub { foreach my $mf2 ( @states ) { my $lt; $lt = q{processed time} if $mf2; $lt = q{n/a} unless $mf2; } }, });

        Here's the output

        Rate upallnight tirwhan upallnight 22.6/s -- -22% tirwhan 29.1/s 29% --

        (Usually when I post benchmarks someone points out that I've got it completely wrong so I'll probably get shot down in flames for this ;-)

        Cheers,

        JohnGG

        Negligibly so. It's a readability issue.

        And more likely it's the time needed to render the table than the actual communication time.

Re: Foreach Loop Optimization
by ww (Archbishop) on Aug 01, 2007 at 17:52 UTC
    /me reaches for large, heavyduty cluestick.

    This may be beating a dead horse, because others have pointed this out, but IMHO, it's important that you understand that between your js (onClick), your 800 row table, and your omission of (at least) a "width" value for your table and its cells, you've assigned a lot of computations to your browser.

    Just as one example, your <td align="center"> won't let the browser even begin to caluclate where the "center" is until it's evaled the whole table, because it doesn't know how wide the table itself or the cell should be (they're dynamic values when not specified) until it has all the data in hand.

    Try, for example, a static 800 row table, on your local machine (pure html, no Perl, no js and using widths); then try again without a width value and with your javascript. I think you'll be amazed at the time required for the browser's rendering engine to do its work, even in the first case, and more so in the second.

    In other words, while it would be unfair (oversimplified) to assign all of the time required for you to observe the output to the browser, MUCH of that time is consumed by the browser.

      Amazing!! I didn't think of that.

      I saved the html output from one of the databases (22 columns and 749 rows) to a local file. When I opened that file in Firefox, it took almost the same time to load as the script on the server!

      I gotta clean up my html :(

      How can I speed up the browser parsing when I'm not sure how much data and how long the strings of individual fields are going to be?

        On the theory this is something you'll have to do repetitively, get the lengths of the longest string in each column (ONCE) to get a ballpark idea of what you need for screen-realestate. Make your peace with a need to sidescroll to see all the columns. And recognize that this is rough and ready; different data sets could make hardcoding this way a total bust (though, of course you could build the length determination function into your program :-) but that could easily overwhelm the speedup.)

        Say the results come up this way:

        col len
        1:   11
        2:   4
        3:   80
        4:   27

        In total, you need space for 122 chars per row plus whatever the row label will be (if used). Let's assume no label. So now you can set up your table this way:

        <table width="100%"> #other good practices omitted; beyond the scope +here <thead> <tr> <th style="width: 11em;">Col1</th> <th style="width: 4em;">Col2</th> <th style="width: 80em;">Col3</th> <th style="width: 27em;">Col4</th> </tr> </thead> <tbody> <tr> <td (whatever styling you want)...</td> etc.

        The tds will inherit their widths from the ths (unless a td ends up with more chars than expected, in which case, different browsers will react in variant ways).

        Hope this helps, but a solid-er answer is probably, "build a better browser."

Re: Foreach Loop Optimization
by dogz007 (Scribe) on Aug 01, 2007 at 15:37 UTC
    Perhaps the slowness you observe is in memory usage instead of the foreach loops. Instead of loading the entire file up into @project_module_list, read only one line of the file at a time. Replace your outer foreach loop with the following while loop and see if it improves.

    open FH, "<database.txt" or die; <FH>; # ignore header line of CSV file while ( my $project_module_list = <FH> ) { chomp $project_module_list; ## put remainder of code here }
      Thanks, I thought of that but I need to do a lot of sorting, filtering, and other stuff with the table data before I print it so I would have to load it all into memory any way.
        If that's the case then you might try some of the techniques in Dominus's lightweight database tutorials. You can find the link to them in this node:

        http://www.perlmonks.com/?node_id=629253

        I started looking through the materials and I think it may work real well for your situation. However, I do not have much experience in it, so you're on your own from there.

Log In?
Username:
Password:

What's my password?
Create A New User
Node Status?
node history
Node Type: perlquestion [id://630084]
Approved by FunkyMonk
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others surveying the Monastery: (7)
As of 2021-06-22 17:34 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?
    What does the "s" stand for in "perls"? (Whence perls)












    Results (108 votes). Check out past polls.

    Notices?