|P is for Practical|
Profiling your code.by kyle (Abbot)
|on Aug 20, 2007 at 03:26 UTC||Need Help??|
This tutorial will cover the basics of profiling your code. I will tell you how to get profiling information and how to interpret it.
Modules you can use for profiling
I generally try Devel::DProf first. If it doesn't work (i.e., it crashes), I go to Devel::Profiler.
How to run existing code under the profiler
With Devel::DProf, it's
With Devel::Profiler, it's
...or you can "use Devel::Profiler" anywhere in your code.
It's worth noting that the profiler will slow down the code you're running, so expect a slow program to get slower while you're trying to figure out why it's so slow. Also, the raw output files can get very large, especially if your program runs a long time.
How to get the results of profiling
With profiling turned on, a file called "tmon.out" will be created in the current directory. That file contains all the raw timing information for your run. To get a condensed report, run
That will read "tmon.out" in the current directory and give you a summary of profiling information.
Interpreting the results of profiling
Now that we know how to get profiling data, we have to know what it means. I'll take you through a series of examples to show you the meaning of the numbers you get.
Consider the following program:
This is a single call to main(), which loops a million times. On my system, I run this, and I get the following output from dprofpp:
Let's talk about what each of those numbers mean.
Exclusive seconds vs. cumulative seconds
The next example is meant to show the difference between exclusive seconds and cumulative seconds.
In this program, main() does nothing but call a single function all_the_work(), which loops a million times. Here's the data:
Notice that in the ExclSec column, all_the_work takes most of the time, and main takes so little time, it's not even noted. However, in CumulS, both of them take most of the time. The time spent in all_the_work is included in the cumulative time spent in main.
Elapsed time vs. User+system time
Here's the example:
In this one, main() calls do_nothing(), which doesn't do any work, but it takes five seconds to finish (using sleep). Then main() calls all_the_work(), which loops a million times.
Here's the data:
Notice that the elapsed time is a lot higher than the user+system time. Notice also that the elapsed time doesn't show up in any of the other statistics. The numbers for do_nothing do not show that five seconds passed while it was "running."
When there's a large gap between elapsed and user+system time like that, it is often useful to run dprofpp with the -r option. Here's the same raw data used above, but displayed with dprofpp -r:
With the -r option, dprofpp reports elapsed times for all functions instead of just the CPU time. With that, it's easy to see that most of the real time was spent in do_nothing().
What causes differences between elapsed time and user+system time? This can be anything that your program waits on when it's not actively running. Typically this is one of:
Other results available via dprofpp
The dprofpp tool has options to make its output more useful in different situations. By default it will show you the top 15 functions sorted by user+system time. With the -g option, you can have dprofpp focus on one function. With the -O option, you can have it show any number of functions.
Time measurements to display
There are also options to show call trees and do other interesting things. See the dprofpp documentation for the full scoop.
Things I haven't covered