http://www.perlmonks.org?node_id=967277

balakrishnan has asked for the wisdom of the Perl Monks concerning the following question:

Hi Monks,

I plan to improve the performance of my Perl scripts. I have searched around the net, but I would like some clearer information about the areas where optimization will have a considerable impact on performance.

Looking for your quick responses.

Thanks,
Bala

Re: Performance improvement in perl scripts
by moritz (Cardinal) on Apr 26, 2012 at 09:27 UTC

    There's some perlperf documentation on that shipped with Perl, and if you search for "optimize" or "performance" here on perlmonks, you should find lots of results.

    The most important bits are: 1) first make things correct, then optimize. Ensure that your code is still correct after the optimization. 2) measure before and after. 3) use profiling to find out which parts of the code to optimize. Devel::NYTProf is a great profiler.
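    To make point 2 concrete, the core Benchmark module makes before/after comparisons easy. A minimal sketch (the two string-building subs are purely illustrative):

        use strict;
        use warnings;
        use Benchmark qw(cmpthese);

        # Run each candidate for at least 2 CPU seconds (the negative
        # count) and print a comparison table of rates.
        cmpthese(-2, {
            concat => sub { my $s = ''; $s .= 'x' for 1 .. 1_000; $s },
            repeat => sub { my $s = 'x' x 1_000; $s },
        });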

Re: Performance improvement in perl scripts
by davido (Cardinal) on Apr 26, 2012 at 09:29 UTC

    Profile, then optimize. Devel::NYTProf is sort of the big guy on the block for profiling in Perl.
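    For reference, a typical NYTProf session is just two commands (the script name is a placeholder); the HTML report then shows per-line and per-subroutine timings:

        perl -d:NYTProf yourscript.pl   # run the script under the profiler
        nytprofhtml                     # turn nytprof.out into an HTML report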

    Once you've identified the problem areas, that's where the real programming begins. Remember:

    If your code is too slow, you must make it faster. If no better algorithm is available, you must trim cycles.

    -- Tom "Duff's Device" Duff


    Dave

      You're not following the advice you're quoting!

      Tom starts off with "If your code is too slow". This part is the most significant.

      If your code isn't too slow, there's no need to profile or optimize.

Re: Performance improvement in perl scripts
by BrowserUk (Patriarch) on Apr 26, 2012 at 11:47 UTC
    I plan to improve the performance of my Perl scripts.

    Simple. Have them do less and they'll run faster.

    I know that sounds cynical, but it isn't. It is a fact of record that when analysing bespoke business software for performance, anything from 40% to 80% of the cycles used by many of those applications are spent doing non-business-critical tasks.

    The number 1 culprit is the production of elaborate, detailed and huge log and trace files. These files cost the business in CPU cycles, storage space, and backup and recovery processes and procedures, and they are frequently either never accessed after they are produced, or accessed only to produce 'pretty picture' reports of little or no value for MIS brochures.
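    One cheap mitigation, as a sketch (the constant name, the dump routine and the data are all hypothetical): guard expensive trace output behind a compile-time constant, so that perl can discard the dead branch entirely:

        use strict;
        use warnings;
        use constant DEBUG => $ENV{MYAPP_DEBUG} // 0;

        my $state = { stage => 'demo' };   # illustrative data

        # With DEBUG a false constant, perl folds the 'if DEBUG' branch
        # away at compile time; the expensive dump string is never built.
        warn 'state: ' . expensive_dump($state) . "\n" if DEBUG;

        sub expensive_dump { require Data::Dumper; Data::Dumper::Dumper($_[0]) }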

    The number 2 culprit is the overuse of OO technologies -- intended to promote and simplify reuse -- for critical path elements of software systems that not only never have been reused, but on further analysis, never could be.

    When analysing software for performance, look at each subsystem, step and line and determine if it actually contributes directly to the final result or outcome of the process. If not, ask the question: Why is it here? And if the answer starts with some variation of 'If ...', consider it seriously for removal.

    Once you've reduced the program to doing only that which is actually necessary, re-profile its runtime performance, and only then decide whether it is necessary to improve its performance further.


    With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.

    The start of some sanity?

Re: Performance improvement in perl scripts
by JavaFan (Canon) on Apr 26, 2012 at 10:26 UTC
    I would like some clearer information about the areas where optimization will have a considerable impact on performance.
    Eh, do you really expect useful answers? I can easily say that regular expressions with lots of backtracking are slow, and that there may be lots of room for improvement there, but that doesn't help you at all if you don't have any regular expressions, or if you are solving a problem that requires a lot of backtracking to begin with.
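    For what it's worth, the classic shape of the backtracking problem looks like this (a contrived sketch): nested quantifiers over the same characters can force the engine to retry a huge number of ways of splitting the match on a near-miss input, while an unambiguous pattern fails fast:

        my $input = ('a' x 28) . 'b';    # a near-miss: almost matches

        # Nested quantifiers over the same character: on failure the engine
        # may retry every way of splitting the a's between the two +'s.
        print "slow\n" if $input =~ /^(a+)+$/;

        # Equivalent but unambiguous: one pass, fails immediately.
        print "fast\n" if $input =~ /^a+$/;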

    What you need to do is not only measure, but also define goals. When is your program "good enough"? If you have a cron job that runs nightly at 2 AM and takes an hour, when no one is waiting for the results, there's unlikely to be a reason to optimize, even if you could bring the run time down to 1 minute.

    OTOH, if you're selling on a website and you notice that people surf away if rendering a page takes more than X seconds, saving a few microseconds in a hot loop may have a noticeable effect on the number of sales you make. Or maybe not. You should measure to be sure.
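    Measuring a suspect section is easy with the core Time::HiRes module (render_page is a hypothetical stand-in for the code under test):

        use strict;
        use warnings;
        use Time::HiRes qw(gettimeofday tv_interval);

        my $t0 = [gettimeofday];
        render_page();
        printf "render took %.3fs\n", tv_interval($t0);

        sub render_page { select undef, undef, undef, 0.25 }   # dummy work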

Re: Performance improvement in perl scripts
by cavac (Parson) on Apr 26, 2012 at 13:15 UTC

    Note: This is my completely personal view. If you don't agree, I fully understand. In my opinion, program optimization is as much an art as it is a technical skill. And art is in the eye of the beholder...

    That really, completely depends on the problem your program is trying to solve and on the way you implemented it. "Code optimization" is not the best choice for every problem. Personally, the first thing I do is think about the design choices I made and try to see if there are better ones.

    For example:

    • If you are working with huge amounts of data in text files, you might consider switching to a modern open-source database instead. Depending on your requirements (relational/normalized data or a simple key/value store), you'll have to pick an appropriate database system.
    • If you do lots of complicated math, you might consider moving parts of your program to C/XS.
    • If you do rather simple math operations, but the same ones on lots and lots of datasets, you might think about learning CUDA or OpenCL.
    • If you do lots of network stuff (especially server side), have a look at multithreaded and/or preforking alternatives like Net::Server.
    • If you make too many external program calls with backticks or similar, you have a basic design flaw. Try to find Perl-internal functions or fitting CPAN modules instead (see the sketch after this list).
    • If you always call the same external program (and you can't find something that fits on CPAN), check whether that program's functions also come as a library. Maybe there's a way to use XS to make a Perl module out of it. That would save a lot of external processes being created and destroyed, and would make execution much faster.
    • Take a look at your computer's hard disk and/or network LEDs. Are they (nearly) constantly on while your program is running? If so, that is a sure sign that your program is IO-bound (i.e. its performance is limited by IO rather than by processing speed). In the case of hard disk IO, get faster disks into a RAID 10 array and/or design a more compact file format. If network-bound, get a faster link if possible and try to minimize protocol overhead.
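    To illustrate the point about external program calls, a sketch (the file name is made up): replacing a backticks call with in-process Perl avoids forking a shell plus an external tool on every invocation:

        # Instead of:  my $lines = `wc -l < data.txt`;
        open my $fh, '<', 'data.txt' or die "Cannot open data.txt: $!";
        my $lines = 0;
        $lines++ while <$fh>;
        close $fh;
        print "$lines lines\n";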

    I could probably continue the list for another 50 to 200 items, but you get the basic idea.

    In short: in-depth source code optimization should be one of the last steps. First, check that the basic program design is sound and that you are not constrained by external limitations. If everything else checks out, then you can go fiddling with algorithms, re-ordering loops and making the code unreadable...

    "You have reached the Monastery. All our helpdesk monks are busy at the moment. Please press "1" to instantly donate 10 currency units for a good cause or press "2" to hang up. Or you can dial "12" to get connected directly to second level support."
Re: Performance improvement in perl scripts
by sundialsvc4 (Abbot) on Apr 26, 2012 at 11:58 UTC

    The “80/20 rule” applies all over the place when it comes to talking about computers: such as, “80% of the time is spent in 20% of the code,” or, “80% of the code is [almost] never used anyway,” and so on.

    Having decided upon a reasonably efficient algorithm for what the program actually needs to do (and no more...), measure to find the particular spots where the program is running both frequently and slowly.

    Human-perceptible slowdowns almost always involve input/output: either inefficient operations against actual files or, perhaps more likely, wasteful use of memory which causes thrashing.

    Performance improvements are usually trade-offs: you get something, you pay something. That decision will be yours.

    Finally, there is this august maxim from the venerable Elements of Programming Style: don’t “diddle” the code to make it faster... find a better algorithm. I suggest that this notion should lead you very frequently to CPAN: to the assumption that a “better algorithm” probably exists, and that a CPAN author has probably found it.
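    As a small illustration of "a better algorithm beats diddling" (the data is made up): replacing a repeated linear scan with a hash lookup changes the complexity class, which no amount of micro-tuning of the grep could do:

        my @allowed = qw(alice bob carol);
        my $name    = 'bob';

        # O(n) per query: grep walks the whole list every time.
        my $ok_slow = grep { $_ eq $name } @allowed;

        # O(1) per query once the lookup hash has been built.
        my %allowed = map { $_ => 1 } @allowed;
        my $ok_fast = $allowed{$name};
        print "allowed\n" if $ok_fast;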

      to the assumption that a “better algorithm” probably exists, and that a CPAN author probably found it.
      Maybe. But do realize that something being on CPAN only means that an author found an algorithm. It may be correct, it may not be. It may be better, it may not be. The more often a wheel has been reinvented, the higher the chance that one of the authors has a better algorithm. It's also going to be harder to pick the better one (as there will be more choice).

      Do also note that many CPAN authors try to be "complete" and engineer their solution to be as general as possible. That usually comes at a price. In fact, if performance really matters, not using CPAN is often the better solution: then you can create something that's tailor-made for your environment, and you don't have to pay the price of generality.

        Even when we choose not to use the module, the time spent looking for it is seldom wasted. Module documentation often reveals special cases or other aspects of the problem which we had not thought of.

        True, modules may not be correct. The same can be said for your own code (especially code optimized to a different standard). At least the module has withstood the criticism of a fairly large base of users.