http://www.perlmonks.org?node_id=630947

This tutorial will cover the basics of profiling your code. I will tell you how to get profiling information and how to interpret it.

Modules you can use for profiling

The examples in this tutorial use Devel::DProf and Devel::Profiler. Both record per-subroutine timings in a file that the dprofpp tool can summarize.

How to run existing code under the profiler

With Devel::DProf, it's

perl -d:DProf foo.pl

With Devel::Profiler, it's

perl -MDevel::Profiler foo.pl

...or you can "use Devel::Profiler" anywhere in your code.
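
Here's a minimal sketch of that second approach; the script body is just a placeholder for your own code:

use strict;
use warnings;

# Loading the module turns on profiling for the rest of the run.
use Devel::Profiler;

do_something();

sub do_something {
    # ... the code you want profiled ...
}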

It's worth noting that the profiler will slow down the code you're running, so expect a slow program to get slower while you're trying to figure out why it's so slow. Also, the raw output files can get very large, especially if your program runs a long time.

How to get the results of profiling

With profiling turned on, a file called "tmon.out" will be created in the current directory. That file contains all the raw timing information for your run. To get a condensed report, run

dprofpp

That will read "tmon.out" in the current directory and give you a summary of profiling information.
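
Putting the two steps together, a typical session looks like this (foo.pl stands in for your own program; dprofpp also accepts the profile file name as an argument if you want to be explicit):

perl -d:DProf foo.pl   # writes tmon.out in the current directory
dprofpp                # reads tmon.out and prints the summary
dprofpp tmon.out       # same thing, naming the file explicitly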

Interpreting the results of profiling

Now that we know how to get profiling data, we have to know what it means. I'll take you through a series of examples to show you the meaning of the numbers you get.

The basics

Consider the following program:

use strict;
use warnings;

main();

sub main {
    my $now = time();
    for ( my $x = 0; $x <= 1_000_000; $x++ ) {
        print scalar localtime, "\n" if $now != time();
        $now = time();
    }
    return;
}

This is a single call to main(), which loops a million times. On my system, I run this, and I get the following output from dprofpp:

Total Elapsed Time = 1.859881 Seconds
  User+System Time = 1.389881 Seconds
Exclusive Times
%Time ExclSec CumulS #Calls sec/call Csec/c  Name
 98.5   1.370  1.370      1   1.3700 1.3700  main::main
 0.72   0.010  0.010      2   0.0050 0.0050  main::BEGIN
 0.00       - -0.000      1        -      -  strict::bits
 0.00       - -0.000      1        -      -  strict::import
 0.00       - -0.000      1        -      -  warnings::BEGIN
 0.00       - -0.000      1        -      -  warnings::import

Let's talk about what each of those numbers means; after the definitions there's a quick worked check of the arithmetic.

Total Elapsed Time
This is wall clock time from the program's start to finish, no matter how that time was spent.
User+System Time
This is how much CPU time the program took. This does not include time spent waiting on the disk, the network, or other tasks. ("User time" is time spent in your code and "system time" is time the operating system spent serving your code.)
%Time
This is the percent of "User+System Time" spent exclusively in the specified function. It does not include time spent in functions that this function called.
ExclSec (exclusive seconds)
Time spent just in this function, not in functions it called.
CumulS (cumulative seconds)
Time spent in this function, including time spent in functions that it called.
#Calls (number of calls)
Number of times the function was called.
sec/call (seconds per call)
The average number of seconds (in user+system time) spent in each call to this function.
Csec/c (cumulative seconds per call)
The average number of seconds spent in each call to this function, including time spent in the functions it called.
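
To check the arithmetic, take main::main from the report above: ExclSec is 1.370 and User+System Time is 1.389881, so %Time is 1.370 / 1.389881, or roughly 98.5%. sec/call is ExclSec divided by #Calls: 1.370 / 1 = 1.3700. ExclSec and CumulS are equal here because main() doesn't call any other profiled subroutines.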

Exclusive seconds vs. cumulative seconds

Here's the example program:

use strict;
use warnings;

main();

sub main { all_the_work() }

sub all_the_work {
    my $now = time();
    for ( my $x = 0; $x <= 1_000_000; $x++ ) {
        print scalar localtime, "\n" if $now != time();
        $now = time();
    }
    return;
}

In this program, main() does nothing but call a single function all_the_work(), which loops a million times. Here's the data:

Total Elapsed Time = 1.809848 Seconds
  User+System Time = 1.379848 Seconds
Exclusive Times
%Time ExclSec CumulS #Calls sec/call Csec/c  Name
 98.5   1.360  1.360      1   1.3600 1.3600  main::all_the_work
 0.72   0.010  0.010      1   0.0100 0.0100  warnings::BEGIN
 0.00       - -0.000      1        -      -  strict::bits
 0.00       - -0.000      1        -      -  strict::import
 0.00       - -0.000      1        -      -  warnings::import
 0.00       -  1.360      1        - 1.3600  main::main
 0.00       -  0.010      2        - 0.0050  main::BEGIN

Notice that in the ExclSec column, all_the_work takes most of the time, and main takes so little time, it's not even noted. However, in CumulS, both of them take most of the time. The time spent in all_the_work is included in the cumulative time spent in main.

Elapsed time vs. User+system time

Here's the example:

use strict;
use warnings;

main();

sub main {
    do_nothing();
    all_the_work();
}

sub do_nothing { sleep 5 }

sub all_the_work {
    my $now = time();
    for ( my $x = 0; $x <= 1_000_000; $x++ ) {
        print scalar localtime, "\n" if $now != time();
        $now = time();
    }
    return;
}

In this one, main() calls do_nothing(), which doesn't do any work, but it takes five seconds to finish (using sleep). Then main() calls all_the_work(), which loops a million times.

Here's the data:

Total Elapsed Time = 6.809867 Seconds
  User+System Time = 1.459867 Seconds
Exclusive Times
%Time ExclSec CumulS #Calls sec/call Csec/c  Name
 97.9   1.430  1.430      1   1.4300 1.4300  main::all_the_work
 0.68   0.010  0.010      2   0.0050 0.0050  main::BEGIN
 0.00       - -0.000      1        -      -  warnings::import
 0.00       - -0.000      1        -      -  strict::bits
 0.00       - -0.000      1        -      -  warnings::BEGIN
 0.00       - -0.000      1        -      -  strict::import
 0.00       - -0.000      1        -      -  main::do_nothing
 0.00       -  1.430      1        - 1.4300  main::main

Notice that the elapsed time is a lot higher than the user+system time. Notice also that the elapsed time doesn't show up in any of the other statistics. The numbers for do_nothing do not show that five seconds passed while it was "running."

When there's a large gap between elapsed and user+system time like that, it is often useful to run dprofpp with the -r option. Here's the same raw data used above, but displayed with dprofpp -r:

Total Elapsed Time = 6.799829 Seconds
         Real Time = 6.799829 Seconds
Exclusive Times
%Time ExclSec CumulS #Calls sec/call Csec/c  Name
 73.5   5.000  5.000      1   5.0000 5.0000  main::do_nothing
 26.1   1.780  1.780      1   1.7800 1.7800  main::all_the_work
 0.15   0.010  0.010      1   0.0100 0.0100  warnings::BEGIN
 0.15   0.010  0.020      2   0.0050 0.0100  main::BEGIN
 0.00       - -0.000      1        -      -  strict::bits
 0.00       - -0.000      1        -      -  strict::import
 0.00       - -0.000      1        -      -  warnings::import
 0.00       -  6.780      1        - 6.7800  main::main

With the -r option, dprofpp reports elapsed times for all functions instead of just the CPU time. With that, it's easy to see that most of the real time was spent in do_nothing().
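
The percentages check out against the real time: do_nothing's 5.000 exclusive seconds out of 6.799829 seconds of real time is about 73.5%, and all_the_work's 1.780 seconds is about 26.1%, which is exactly what the %Time column shows.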

What causes differences between elapsed time and user+system time? Anything your program has to wait on while it isn't actively using the CPU. Typically that's one of the following (there's a small sketch after this list):

  • Network connections
  • Disk access
  • Other processes running on the same machine
  • Database access (which is often a combination of the above)
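
As a small illustration of the first two items, here's a sketch of a program that spends most of its wall-clock time waiting rather than computing. The file path is a placeholder, and you'll only see a large gap if reading the file really does involve waiting on the disk. Run it under the profiler and compare the default dprofpp report with dprofpp -r:

use strict;
use warnings;

main();

sub main {
    read_big_file();
    crunch_numbers();
}

# Mostly waiting on the disk: this shows up under elapsed time (-r),
# but contributes relatively little user+system time.
sub read_big_file {
    open my $fh, '<', '/path/to/some/large/file' or die "open: $!";
    my $lines = 0;
    $lines++ while <$fh>;
    close $fh;
    return $lines;
}

# Mostly CPU: this dominates the default user+system report.
sub crunch_numbers {
    my $total = 0;
    $total += $_ for 1 .. 1_000_000;
    return $total;
}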

Other results available via dprofpp

The dprofpp tool has options to make its output more useful in different situations. By default it will show you the top 15 functions by user+system time. With the -g option, you can have dprofpp focus on one function. With the -O option, you can have it show any number of functions.
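
For example, something like this, using a subroutine name from the programs above (check the dprofpp documentation for the exact form the subroutine name should take):

dprofpp -O 30                    # show the top 30 functions instead of 15
dprofpp -g main::all_the_work    # limit the report to all_the_work and what it calls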

Sorting options

-a
Alphabetical by name
-z
(default) Sort by user+system time used.
-l
Sort by number of calls to the subroutines.
-v
Sort by average time spent in subroutines.

Time measurements to display

-s
Display system times rather than user+system times.
-r
Display elapsed real times rather than user+system times.
-u
Display user times rather than user+system times.

More options

There are also options to show call trees and do other interesting things. See the dprofpp documentation for the full scoop.

Things I haven't covered

Replies are listed 'Best First'.
Re: RFC: Profiling your code.
by perrin (Chancellor) on Aug 07, 2007 at 04:50 UTC
    When profiling anything that spends a lot of time on I/O, like DBI or disk access, you should look at the real time with the -r flag. CPU time will lead you off in the wrong direction when CPU is not the bottleneck.
Re: RFC: Profiling your code.
by aufflick (Deacon) on Aug 10, 2007 at 06:01 UTC
    You don't mention when to use one profiler over another. My rule of thumb is to try Devel::DProf first. If that core-dumps a few times in a row (it can do that, depending on your code), then I try Devel::Profiler.

    Another module worth pointing out is Devel::FastProf. The two main profilers profile by sub/method call. If you've inherited a nasty monolithic script, FastProf will give you profiling by file line.

Re: RFC: Profiling your code.
by t'mo (Pilgrim) on Aug 09, 2007 at 04:31 UTC

    Interesting...I might be able to use this sometime.

    As far as CGI goes, I tried it out on a personal web project and all I had to do was give global write permissions to the directory where Apache ran my scripts (chmod 777 /path/to/my/scripts). Only on a dev machine, of course; I wouldn't want to do that on a production box.

Re: RFC: Profiling your code.
by explorer (Chaplain) on Aug 08, 2007 at 18:42 UTC

    And Memory Profiling?

      And Memory Profiling?

      That's outside the scope of my tutorial, but see Profiling memory and Memory Profiling. The short answer seems to be that there's not a tool to do this.