PerlMonks  

how to improve the performance of a perl program

by ghosh123 (Monk)
on Aug 05, 2013 at 15:26 UTC (#1047920=perlquestion)
ghosh123 has asked for the wisdom of the Perl Monks concerning the following question:

Hi
I have a GUI-based tool written in Perl/Tk which can run thousands of jobs. We are trying to scale up the GUI so that it can run nearly 1 lakh (100,000) jobs without getting hung.
The code base is very large, comprising about 60-70 module files. It uses socket connections for inter-process communication and MySQL for storing data. I need to profile this code base to find out what the bottlenecks could be for running lakhs of jobs, and how I can overcome them.
Can anybody suggest a good way to find the bottlenecks and do the profiling? I have come to know about Devel::NYTProf but am not quite able to understand how to use it. The GUI has a launching script which in turn calls some more scripts that use some modules.
I have come to know of the following techniques, but I am not sure how they can help or how to find these problems in my large code base:

1. Avoid->repeated->chains->of->accessors(..). Instead, use temporary variables.
Question: how does it help to avoid repeated chains of method calls and use a temporary variable instead? Also, how can I find where all such chained calls happen in my large code base?

2. Use faster accessors, following this progression:
Class::Accessor
-> Class::Accessor::Fast
---> Class::Accessor::Faster
----->Class::XSAccessor

3. Avoid calling subs that don't do anything. How can I detect these? Is there any mechanism?

4. Exit subs and loops early, and delay initialization:

    return if not ...a cheap test...;
    return if not ...a more expensive test...;
    my $foo = ...initialization...;
    ...body of subroutine...
5. Fix silly code such as:
    return exists $hash{$a}{$key} ? $hash{$a}{$key} : undef;
    return $hash{$a}{$key};    # instead of the above
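For item 5, a small sketch showing that the two forms return the same thing. The sub names are hypothetical, and `$outer` replaces `$a`, which is best avoided because `sort` uses it. One caveat worth knowing: both forms autovivify the intermediate hash `$hash{$outer}` when it is missing, so it is a check on the outer level that actually avoids creating empty entries.

```perl
use strict;
use warnings;

# Original form: tests for the key, then fetches it.
sub get_checked {
    my ($hash, $outer, $key) = @_;
    return exists $hash->{$outer}{$key} ? $hash->{$outer}{$key} : undef;
}

# Simplified form: a missing key already yields undef, so the test is redundant.
sub get_direct {
    my ($hash, $outer, $key) = @_;
    return $hash->{$outer}{$key};
}

# Both forms above create an empty $hash->{$outer} when it is missing.
# Testing the outer level first avoids that side effect.
sub get_no_vivify {
    my ($hash, $outer, $key) = @_;
    return exists $hash->{$outer} ? $hash->{$outer}{$key} : undef;
}

my %jobs = (batch1 => { id42 => 'done' });
print get_direct(\%jobs, 'batch1', 'id42'), "\n";          # done

my $missing = get_direct(\%jobs, 'batch2', 'id7');         # undef ...
print exists $jobs{batch2} ? "vivified\n" : "clean\n";     # ... but prints "vivified"
```

In other words, the simplification in item 5 is safe, but neither form protects the outer hash from growing empty entries.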

Thanks

Replies are listed 'Best First'.
Re: how to improve the performance of a perl program
by BrowserUk (Pope) on Aug 05, 2013 at 17:03 UTC
    I have come to know about Devel::NYFTProf but not quite able to understand how to use it. The gui has a launching script which in turn calls some more scripts using some modules.

    What do you not understand about how to use it?

    You probably don't need to profile the gui itself, so how are you "calling" the other scripts from it?

    Use faster accessors

    The fastest accessor is the one you don't call.

    It may go against OO-dogma; but in 95% of cases, there is no good reason to use subroutines to access instance variables from within that class's methods.

    There are three main reasons formally cited for using accessors within a class's methods:

    1. To isolate the rest of that implementation from future substantial changes to the structure and/or layout of the instance data.

      This could only ever become a saving if that data layout changed beyond recognition; and if that happens, the likelihood that you would get away with not also substantially rewriting method code is almost nil.

    2. To provide centralised, single point validation of values assigned to instance variables.

      If your class is even vaguely well designed and written, it should not be possible for internally sourced assignments to instance variables to assign invalid values. Thus, re-validating those internally sourced values on every assignment is pure overhead.

    3. People often respond to that with: "But what about values that come into a class from outside"?

      And the answer is that external inputs should be validated at the point of transition across the class boundary, i.e., whenever an externally visible method is called, you should validate its parameters. But external code should never directly access instance data; therefore there should be no such thing as externally visible accessors.

    In short: external code calls methods to perform services, not to access internal data. Where service methods accept arguments, those arguments must be validated immediately; once validated, any values derived from them that are subsequently stored in instance variables require no further validation.

    Thus, method code can safely access instance variables directly, which avoids both the overhead of accessor method calls and centralised re-validation.
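A minimal sketch of that approach, using a hypothetical `Job` class: arguments are validated once, in the public constructor, and the service method then works on the blessed hash directly, with no accessor calls and no re-validation. It illustrates the argument above rather than any particular module's conventions.

```perl
use strict;
use warnings;

package Job;
use Carp qw(croak);

# Public constructor: this is the class boundary, so validate here, once.
sub new {
    my ($class, %args) = @_;
    croak "name required" unless defined $args{name} && length $args{name};
    croak "priority must be an integer 0-9"
        unless defined $args{priority} && $args{priority} =~ /\A[0-9]\z/;
    return bless { name => $args{name}, priority => $args{priority}, runs => 0 }, $class;
}

# Service method: reads and writes instance data directly. The values were
# validated on the way in, so there is nothing to re-check here.
sub run {
    my ($self) = @_;
    $self->{runs}++;
    return "$self->{name} (p$self->{priority}) run #$self->{runs}";
}

package main;

my $job = Job->new(name => 'build', priority => 3);
print $job->run, "\n";   # build (p3) run #1
```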


    With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.
Re: how to improve the performance of a perl program
by tobyink (Abbot) on Aug 05, 2013 at 16:52 UTC

    Re point #1, the idea is to avoid, say:

    if ($thing->position->x == 0 and $thing->position->y == 0) { ...; }

    ... which calls $thing->position twice, with something like this:

    my $pos = $thing->position;
    if ($pos->x == 0 and $pos->y == 0) { ...; }

    ... which only calls it once.
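A small runnable version of the example above. `Thing` and `Point` are hypothetical stand-in classes, and a package counter records how often `position()` runs in each form.

```perl
use strict;
use warnings;

package Point;
sub new { my ($class, %args) = @_; return bless {%args}, $class }
# Install x() and y() readers. "y" is also a Perl quote-like operator, so the
# subs are created through the symbol table rather than with "sub y { ... }".
for my $field ('x', 'y') {
    no strict 'refs';
    *{"Point::$field"} = sub { return $_[0]{$field} };
}

package Thing;
our $position_calls = 0;    # how many times position() has run
sub new      { my ($class, %args) = @_; return bless {%args}, $class }
sub position { $position_calls++; return $_[0]{position} }

package main;

my $thing = Thing->new(position => Point->new('x' => 0, 'y' => 0));

# Chained form: position() runs once per comparison.
if ($thing->position->x == 0 and $thing->position->y == 0) { print "origin\n" }
my $chained_calls = $Thing::position_calls;

# Temporary-variable form: position() runs once.
$Thing::position_calls = 0;
my $pos = $thing->position;
if ($pos->x == 0 and $pos->y == 0) { print "origin\n" }
my $cached_calls = $Thing::position_calls;

print "chained: $chained_calls, cached: $cached_calls\n";   # chained: 2, cached: 1
```

The saving per call is small; it only matters if this code sits on a hot path that profiling has identified.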

    package Cow { use Moo; has name => (is => 'lazy', default => sub { 'Mooington' }) } say Cow->new->name
Re: how to improve the performance of a perl program
by Tux (Monsignor) on Aug 05, 2013 at 15:46 UTC

    Do you consider Devel::NYTProf a viable option?


    Enjoy, Have FUN! H.Merijn

      Yes, I think so. But I need an example to understand it better. Also, can you please explain how and why my point no. 1, chain->of->accessors, could be a problem?
      Please also explain how I can use Class::Accessor.
      Thanks.
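A minimal sketch of one way to use Devel::NYTProf on a large program: start the profiler switched off and enable it only around the code under suspicion. It assumes the script is run as `NYTPROF=start=no perl -d:NYTProf script.pl` and that the report is then generated with `nytprofhtml`; `hypothetical_job_dispatch` is a made-up stand-in for the real job-launching code. For the scripts the launcher starts, setting `PERL5OPT=-d:NYTProf` in their environment should load the profiler into those child Perl processes as well, with `addpid=1` in `NYTPROF` keeping each process's data in its own file; check the Devel::NYTProf documentation for the options your version supports.

```perl
use strict;
use warnings;

# Run as:   NYTPROF=start=no perl -d:NYTProf this_script.pl
# then:     nytprofhtml   (reads ./nytprof.out, writes an HTML report to ./nytprof/)
#
# With start=no, nothing is profiled until DB::enable_profile() is called,
# so start-up and GUI code stay out of the report. The "defined &..." guards
# let the same script run normally when it is not started under -d:NYTProf.

sub hypothetical_job_dispatch {    # stand-in for the code you suspect
    my ($n) = @_;
    my %seen;
    $seen{ $_ % 1000 }++ for 1 .. $n;
    return scalar keys %seen;
}

DB::enable_profile()  if defined &DB::enable_profile;
my $distinct = hypothetical_job_dispatch(100_000);
DB::disable_profile() if defined &DB::disable_profile;

print "distinct buckets: $distinct\n";   # distinct buckets: 1000
```

The HTML report lists subroutines by exclusive time and call count, which is also a practical way to spot subs that are called very often but contribute nothing (item 3 in the question).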

        $chain->of->accessors often isn't a problem. However, if you've identified a bottleneck, you may need to consider what work is being done:

        • An inheritance lookup is done to decide which "->of" pertains to object $chain. This is a simple operation, but is more expensive than a direct subroutine call.
        • A subroutine call is executed. This involves pushing the call-frame onto the call-stack, and within the subroutine popping off items from the param stack. This is usually pretty quick, but considerably slower than looking up the value of a variable.
        • The accessor must do whatever work it must do. Perhaps it does no work other than returning a value. Maybe it computes the billionth prime afresh on each call. The cost depends entirely on how the accessor is implemented.
        • Next an object is returned so that ->accessors may be invoked on it. The return involves popping the current sub off the call-stack.
        • This process repeats for ->accessors.
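To measure the per-call costs listed above on your own machine, here is a sketch using the core Benchmark module. `Outer`, `Inner`, `of` and `accessors` are hypothetical classes and methods named after the example; the `hash` variant reaches into the objects' internals and is included only to show the floor, not as a recommended style. Expect the numbers to vary by machine and Perl version.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical classes with plain hash-based accessors.
package Inner;
sub new       { my ($class, %args) = @_; return bless {%args}, $class }
sub accessors { return $_[0]{value} }

package Outer;
sub new { my ($class, %args) = @_; return bless {%args}, $class }
sub of  { return $_[0]{of} }

package main;

my $chain = Outer->new(of => Inner->new(value => 42));

# Three ways to read the same value 100 times.
sub via_chain { my $s = 0; $s += $chain->of->accessors for 1 .. 100; return $s }
sub via_temp  { my $o = $chain->of; my $s = 0; $s += $o->accessors for 1 .. 100; return $s }
sub via_hash  { my $h = $chain->{of}; my $s = 0; $s += $h->{value} for 1 .. 100; return $s }

# Run each for at least 1 CPU second and print a relative-speed table.
cmpthese(-1, {
    chain => \&via_chain,
    temp  => \&via_temp,
    hash  => \&via_hash,
});
```

A table like this says how much a chain costs per call; only a profile says whether that cost matters in the real program.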

        If that sounds like a lot of work, you're jumping to conclusions. If you put all that inside a tight loop, inside an algorithm that computes the Cartesian product of two human DNA sequences, then yes... it's way too much work to be doing inside a tight loop. If you're diving into that chain of accessors only every so often, then all the object lookup and call-stack work really fades into the background, and you may just need to consider how much work the individual accessors are doing internally. But until you've identified the bottlenecks, making untested assumptions about performance is a total waste of your time and of the salary your employer pays you, because you could be looking in completely the wrong places.

        As for your question about how to use Class::Accessor, before I explain how, let me ask you why you think you want to use it. Class::Accessor has about as much to do with code speed optimization as cruise control has to do with drag racing. So if you understand that Class::Accessor isn't a means of code speed optimization and you still need it, then I suggest you read its documentation and ask a specific question about it, rather than asking "how to use" it when "how" is demonstrated right in its documentation.
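For context on what accessor generators do, here is a hand-rolled sketch of the general idea (not Class::Accessor's own code or API): a closure is installed into the symbol table for each field. The generated method is still an ordinary method call with the costs listed earlier, which is why such modules are a convenience rather than a speed optimization. `Job`, `name` and `status` are hypothetical.

```perl
use strict;
use warnings;

package Job;

# Install a simple get/set accessor for each field name by assigning a
# closure to a typeglob.
for my $field (qw(name status)) {
    no strict 'refs';
    *{"Job::$field"} = sub {
        my $self = shift;
        $self->{$field} = shift if @_;   # setter when an argument is given
        return $self->{$field};          # always acts as a getter
    };
}

sub new { my ($class, %args) = @_; return bless {%args}, $class }

package main;

my $job = Job->new(name => 'build', status => 'queued');
$job->status('running');
print $job->name, ': ', $job->status, "\n";   # build: running
```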


        Dave

Re: how to improve the performance of a perl program
by davido (Archbishop) on Aug 05, 2013 at 19:14 UTC

    1. Ok, this one I discussed in an earlier reply.
    2. Faster accessors... Do you know which portions of code are causing problems? Faster accessors only make sense if you've got a slow one causing you trouble. And even then, "trimming cycles" is often much less effective than "a more efficient algorithm".
    3. Yes, I would suggest avoiding calling any code that doesn't do anything. Especially if it does nothing in O(n^2) time. ;) How to detect that it's not doing anything? I guess you've got to look at what it's designed to do, ask why that's useful, and if you determine that it's not doing anything useful, stop calling it. A good regression test suite is helpful when doing that sort of refactoring.
    4. Exiting loops early is an optimization on a linear operation. Lazy initialization or lazy evaluation is a technique where you do some expensive work just in time, with the hope that maybe you never have to do it at all. If you know you've got to do it, sometimes an opposite technique of pulling as much of the work into startup time as possible can also be beneficial. Which is best for your application has to be your decision based on a lot of factors.
    5. If you're going to go about rearranging code that does work but just looks silly, especially when it's not really impacting performance, be sure that you've got good regression tests in place first, or just leave it alone.
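A sketch of point 4 with hypothetical names: a cheap test runs before an expensive one, a loop stops at its first match, and a lookup table is built lazily on first use with `//=` (Perl 5.10+). The counter shows how many times the expensive check actually ran.

```perl
use strict;
use warnings;

my $expensive_calls = 0;

# Hypothetical expensive check (stands in for a DB query, a file scan, ...).
sub expensive_check { $expensive_calls++; return $_[0] % 7 == 0 }

# Early exit: cheap test first, the expensive one only when still needed.
sub should_run {
    my ($job_id) = @_;
    return 0 if $job_id <= 0;                  # cheap test
    return 0 if not expensive_check($job_id);  # more expensive test
    return 1;
}

# Exiting a loop early: stop scanning at the first match.
sub first_runnable {
    for my $id (@_) {
        return $id if should_run($id);
    }
    return;
}

# Lazy initialization: build the table on first use, then reuse it.
my $lookup;    # undef until first needed
sub lookup_table {
    $lookup //= { map { $_ => $_ * $_ } 1 .. 1000 };
    return $lookup;
}

print first_runnable(-1, 0, 3, 14, 21), "\n";   # 14
print "expensive checks: $expensive_calls\n";   # expensive checks: 2
```

If the table is certain to be needed, building it at start-up instead (the opposite technique mentioned above) avoids paying for it in the middle of interactive work.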

    Dave

Re: how to improve the performance of a perl program
by Anonymous Monk on Aug 06, 2013 at 02:34 UTC

    Hi,

    Just make sure that the tests are done against real world numbers of the scale you envisage running in reality.

    Good solutions for 100 whatsits can turn into nightmares when run against the real world's 1 million whatsits.
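To illustrate the point, a sketch with two hypothetical de-duplication routines that give identical results but scale differently: one is O(n^2), the other O(n). Timing each at two sizes (kept small here so the demo stays quick) shows the growth rate, which is what matters at real-world scale; the absolute timings depend on the machine.

```perl
use strict;
use warnings;
use Time::HiRes qw(time);

# Scans the result list for every input: O(n^2).
sub dedup_quadratic {
    my @out;
    for my $id (@_) {
        push @out, $id unless grep { $_ eq $id } @out;
    }
    return @out;
}

# One hash lookup per input: O(n). Keeps the first occurrence, in order.
sub dedup_hash {
    my %seen;
    return grep { !$seen{$_}++ } @_;
}

# Ten times more input: roughly 10x slower for the hash version,
# roughly 100x slower for the quadratic one.
for my $n (200, 2_000) {
    my @ids = map { "job$_" } 1 .. $n;
    for my $impl ([quadratic => \&dedup_quadratic], [hash => \&dedup_hash]) {
        my $t0 = time();
        my @unique = $impl->[1]->(@ids);
        printf "%-9s n=%-5d %.4fs\n", $impl->[0], $n, time() - $t0;
    }
}
```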

    J.C.

Re: how to improve the performance of a perl program
by sundialsvc4 (Abbot) on Aug 05, 2013 at 18:02 UTC

    After reviewing what you are doing here, and thinking very carefully about it, I am not persuaded that the real problem here will prove to have anything to do with “how many nano-bleems Perl requires to resolve an accessor chain.”   Ditto, any and all of the five bullet-points that you have listed here.   I seriously doubt that “the root cause of the problem” will actually be found to be meaningfully linked to any of these O(nanoseconds) things.   Hence, I doubt that execution-profiling will give you meaningful / useful results.

    Instead, I see that you are using a GUI program to “run jobs in thousands,” and that you are using “MySQL to store data.”   Anything that you request of MySQL, such that it requires a physical I/O operation to perform, is going to take O(many_milliseconds) ... apiece.   Furthermore, you say that you are “using sockets for inter-process communication,” which necessarily means an asynchronous connection, and, I would dare to speculate, likely a home-grown one.   Finally, since the reported symptom is “is hung,” as seen from the point-of-view of an event-driven GUI program, I would peg the true culprit, with almost-100% certainty, as “a timing hole.”

    I would start troubleshooting this problem by adding timestamps for every significant event to the SQL record that describes each unit-of-work:   when was it created, when did it start execution, when did it finish.   Then, I would add a very “chatty” event-log file to both the front-end and the back-end systems to record, with timestamps, exactly what each system did and precisely when it did it.   I would then run the systems for a little while, find a way to merge the various logs together by time, and look specifically for situations where one piece of the system “didn’t get the message.” Or where, some exception to the expected control-flow actually occurred.   I would start with the assumption that the processing sequence isn’t exhibiting piss-poor performance because it is “running out of nanoseconds,” but because it is “getting stuck in traffic” because of a timing-hole that no one had caught ... until now.
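A minimal sketch of the timestamped event log suggested above, assuming a tab-separated line format (epoch seconds with microseconds, component, job id, event) chosen only for illustration. In-memory filehandles stand in for the front-end and back-end log files, and the hypothetical `merge_logs` interleaves them into a single timeline. Timestamps from different machines are only comparable if their clocks are synchronized.

```perl
use strict;
use warnings;
use Time::HiRes qw(time);

# Append one event line: timestamp, component, job id, event.
sub log_event {
    my ($fh, $component, $job_id, $event) = @_;
    printf {$fh} "%.6f\t%s\t%s\t%s\n", time(), $component, $job_id, $event;
}

# Merge several logs (each passed as one string) into one timeline,
# ordered by the leading timestamp field.
sub merge_logs {
    my @lines = map { split /^/, $_ } @_;
    return sort { (split /\t/, $a)[0] <=> (split /\t/, $b)[0] } @lines;
}

# Demo: in-memory filehandles stand in for the two log files.
my ($gui_log, $worker_log) = ('', '');
open my $gui,    '>', \$gui_log    or die $!;
open my $worker, '>', \$worker_log or die $!;

log_event($gui,    'gui',    17, 'created');
log_event($worker, 'worker', 17, 'started');
log_event($worker, 'worker', 17, 'finished');
close $_ for $gui, $worker;

print merge_logs($gui_log, $worker_log);
```

In the merged output, a job that shows "created" with no following "started" is exactly the kind of message that one piece of the system "didn't get".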

    If you are barking up the wrong tree, you will never find the raccoon.

      You're doing it again. Misreading what is written and making it up as you go along in order to peddle your garbage point of view.

      Finally, since the reported symptom is “is hung,”

      Nowhere does the OP state that his program is or ever has been "hung".

      He says: "But we are trying to scale up the gui so that it can run nearly 1lac job without getting hung."

      He is looking to prevent; not cure.

      And off the back of that one mismalinterpretation; you MAKE UP an entire scenario -- completely absent from the OP's description -- just in order that you can draw your stupidest conclusion to date. Which, given your history of stupid conclusions, is a real achievement.

      You sir; are a total moron!



        My goodness ... bigger type-fonts, too?

        Unfortunately, the OP really doesn’t give a great amount of detail... perhaps more detail will follow? Nevertheless, what is said about the program is fairly typical: it is a GUI program which is described as “a launching script.” (Thus my first guess: a non-GUI implementation of the same process would behave similarly. I would not spend any time prowling around the guts of Tk looking for performance improvements.) Likewise, the program is further described as “using sockets for IPC” and “using MySQL to store data,” both of which could just as well describe roughly a million other production programs on this planet ... nearly all of which (is this one the exception?) are I/O-bound and prone to have logic holes in how they handle sockets. It would be a rare bird indeed if such a program were throttled by nanoseconds. (And I know perfectly well that you have written just such “birds.”) Indeed, one could spend many hours “profiling” such a program, chasing nanoseconds, and although one might come up with interesting results, one would never hit upon what made such a program “slow.” And that is why I suggested that this could be the wrong tree; not some secret desire to be (let alone to be called ...) “a moron” in a public place.

        Do I know?   No.   No more than you do.   Maybe the OP will elaborate further.

Node Type: perlquestion [id://1047920]
Front-paged by Arunbear