
Garbage collection at subroutine return

by tcarmeli (Beadle)
on Feb 15, 2007 at 13:19 UTC ( #600198=perlquestion )

tcarmeli has asked for the wisdom of the Perl Monks concerning the following question:

Hi Monks,
Please look at the following short example.
It measures (hopefully...) the time it takes perl to return from the DoIt subroutine. If you scale the value of $max, the time grows linearly (well, at least on ActivePerl 5.8.8 on Windows XP).

Is there any way to reduce this time?

For those that needed more explanations:
This is the problem re-defined:
When calling a sub that uses large data structures (no matter what it does), what general measures can one take to reduce the cost of garbage collection?
For a good answer, see liverpole's below.
#!/usr/bin/perl -w
my $func_done;
my $func_return;
my $max=1000000;

DoIt($max);
$func_return = (times)[0];
printf "Return took %.4f CPU seconds\n", $func_return - $func_done;

sub DoIt{
    my $limit=shift;
    my $i;
    my %myhash;
    foreach $i (0..$limit){
        $myhash{$i}=1;
    }
    $func_done = (times)[0];
    return;
}

Replies are listed 'Best First'.
Re: Garbage collection at subroutine return
by liverpole (Monsignor) on Feb 15, 2007 at 13:38 UTC
    Hi tcarmeli,

    Sure, why not declare the hash outside of the subroutine, so that its scope doesn't cause it to be garbage-collected?  You're obviously not using the hash anyway:

    #!/usr/bin/perl -w
    use strict;
    use warnings;

    my $func_done;
    my $func_return;
    my $max=1000000;
    my %myhash;

    DoIt($max);
    $func_return = (times)[0];
    printf "Size of myhash is %d\n", 0 + keys %myhash;
    printf "Return took %.4f CPU seconds\n", $func_return - $func_done;

    sub DoIt{
        my $limit=shift;
        my $i;
        foreach $i (0..$limit){
            $myhash{$i}=1;
        }
        $func_done = (times)[0];
        return;
    }

    But it sounds like you may have an XY problem.  What is your real goal here?

      This is actually an excellent solution I somehow ignored, as I tend to avoid global variables as a 1st instinct.

      For the other questions and remarks:
      I am not surprised by this phenomenon, although I did not imagine such a penalty.
      I use a sub that uses some huge data structures and is invoked quite a lot. This code is of course just an example of the principle.
        Hi tcarmeli,

        You don't need to have it global, just have it scoped outside the sub.

        For example:

        {
            my %myhash;

            sub DoIt{
                my $limit=shift;
                my $i;
                # really, what you'd do here is just init the
                # as-yet uninitialized part
                foreach $i ((scalar keys %myhash)..$limit){
                    $myhash{$i}=1;
                }
                $func_done = (times)[0];
                return;
            }
        } # scope for %myhash
        Of course, I haven't tried this, but I think it'd do what you want, and handle the case where it's called twice with different values of $limit.
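For anyone who wants to try the enclosing-block idea, here is a minimal runnable sketch of it. The names `fill_up_to` and `cache_size` are hypothetical and exist only for this illustration, and the timing code is left out so that only the scoping is shown.

```perl
use strict;
use warnings;

{
    # %myhash is private to this block but outlives each call,
    # so nothing is freed when the sub returns.
    my %myhash;

    # Hypothetical name: fills in only the keys not yet present.
    sub fill_up_to {
        my $limit = shift;
        foreach my $i ( ( scalar keys %myhash ) .. $limit ) {
            $myhash{$i} = 1;
        }
        return;
    }

    # Hypothetical helper: reports how many keys are stored.
    sub cache_size { return scalar keys %myhash }
}

fill_up_to(10);
print cache_size(), "\n";    # 11 (keys 0..10)
fill_up_to(20);
print cache_size(), "\n";    # 21 (keys 0..20)
```

A later call with a smaller limit produces an empty range, so it leaves the hash untouched.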


        You might consider using OO techniques to "hide" the global. Make the global into a data member (bless it) and the sub into a method. Of course you then get hit with access costs so you may not end up with a performance win. However OO can help refactor code and that may be a key to unlocking the real problem.
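The OO suggestion above could look roughly like the following. This is only a sketch: the class name `HashFiller` and its methods are made up for illustration, and whether it helps performance depends on how the object is used.

```perl
use strict;
use warnings;

package HashFiller;    # hypothetical class name

sub new {
    my ($class) = @_;
    # The big hash lives inside the object, not in a file-level variable.
    return bless { data => {} }, $class;
}

sub do_it {
    my ( $self, $limit ) = @_;
    my $data = $self->{data};    # fetch the hash ref once, outside the loop
    $data->{$_} = 1 for 0 .. $limit;
    return;
}

sub size { return scalar keys %{ $_[0]{data} } }

package main;

my $filler = HashFiller->new;
$filler->do_it(1000);
printf "Size is %d\n", $filler->size;
```

The hash is released only when the last reference to `$filler` goes away, so the caller decides when to pay for the cleanup.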

        DWIM is Perl's answer to Gödel
Re: Garbage collection at subroutine return
by osunderdog (Deacon) on Feb 15, 2007 at 13:29 UTC


    *sigh* What constraints on optimization do you have?

    • More CPUs/bigger hardware
    • Threading
    • Client Server (Message to other machine that runs program that performs operations on a bigger box.)
    • Dropping to C for intensive operations. (Using XS to wrap the C into a callable perl function.)
    • Optimizing assignments (Only assign a value when necessary.)
    • Optimize loop size (Only loop the number of times necessary.)

    Any and all of these are possible ways to optimize your program. Which ones are possible in your environment?

    Hazah! I'm Employed!

Re: Garbage collection at subroutine return
by johngg (Canon) on Feb 15, 2007 at 14:37 UTC
    Since the subject of globals and scoping has come up, I thought I would point out a possible scoping issue with the $i variable in your code and liverpole's response. The $i inside the foreach loop is not the same one that you declared with the my $i; outside of it. The following code illustrates this.

    use strict;
    use warnings;

    my $limit = 5;
    printLimit($limit);

    sub printLimit {
        my $limit = shift;
        my $i = q{xyz};
        print qq{outside foreach: $i\n};
        foreach $i ( 1 .. $limit ) {
            print qq{ inside foreach: $i\n};
        }
        print qq{outside foreach: $i\n};
    }

    Here's the output

    outside foreach: xyz
     inside foreach: 1
     inside foreach: 2
     inside foreach: 3
     inside foreach: 4
     inside foreach: 5
    outside foreach: xyz

    You have to introduce $i with a my somewhere since we are running with use strict; so, since it is only being used inside the loop, why not declare it there?

    foreach my $i ( ... ) { ... }

    I hope this is of interest.



Re: Garbage collection at subroutine return
by Fletch (Chancellor) on Feb 15, 2007 at 13:39 UTC

    Buy a faster computer. Don't make huge lexically declared variables.

    You're asking Perl to do more and more, of course it's going to take longer and longer. Each occupied slot in the HV* structure is pointing to a scalar SV* instance, each of which has to be freed.
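To see the per-element cost described above, one can time the teardown of hashes of different sizes. The sketch below uses the core Time::HiRes module; `time_teardown` is a hypothetical helper, and the sizes are arbitrary. How closely the times track the key count will depend on the perl build and the memory allocator.

```perl
use strict;
use warnings;
use Time::HiRes qw(time);

# Hypothetical helper: build a hash with $n keys, then time how long
# it takes to empty it with undef.
sub time_teardown {
    my $n = shift;
    my %h;
    $h{$_} = 1 for 1 .. $n;
    my $start = time;
    undef %h;    # every stored value scalar is released here
    my $elapsed = time - $start;
    return ( $elapsed, scalar keys %h );
}

for my $n ( 50_000, 100_000, 200_000 ) {
    my ( $elapsed, $left ) = time_teardown($n);
    printf "%8d keys: freed in %.4f s, %d left\n", $n, $elapsed, $left;
}
```

Emptying the hash explicitly does not make the work disappear; it only moves it to a point in the program that you choose.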

Re: Garbage collection at subroutine return
by izut (Chaplain) on Feb 15, 2007 at 14:51 UTC

    I have nothing to add to liverpole's suggestion, only a remark about the way you're doing your benchmark test. There's a Benchmark module available, which is really easy to use. If you write a benchmark code with it once, you'll never write a time calculation again. It is useful to compare results too.

    Here is a code that compares liverpole's suggestion and the way you tried it first:

    use strict;
    use warnings;
    use Benchmark qw(:all);

    my $limit = 1_000;

    sub do_it_tcarmeli {
        my %do_it_tcarmeli;
        foreach my $i ( 0 .. $limit ) {
            $do_it_tcarmeli{$i} = 1;
        }
        return;
    }

    my %do_it_liverpole;

    sub do_it_liverpole {
        foreach my $i ( 0 .. $limit ) {
            $do_it_liverpole{$i} = 1;
        }
        return;
    }

    cmpthese(
        -10,
        {
            do_it_tcarmeli  => \&do_it_tcarmeli,
            do_it_liverpole => \&do_it_liverpole,
        }
    );

    Igor 'izut' Sutton
    your code, your rules.

      Ah, but you're measuring the total run time of the subs. That's probably going to swamp out the effect of garbage collection on return. tcarmeli took care to measure only the return time. Benchmark doesn't give much support for that kind of timing.
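One way to time only the return, in the spirit of tcarmeli's original measurement, is to record a timestamp as the last statement before `return` and another right after the call. The sketch below does this for both variants with the core Time::HiRes module; the sub names and the value of `$limit` are assumptions made for this example, and the result is wall-clock time rather than the CPU time that `times` reports.

```perl
use strict;
use warnings;
use Time::HiRes qw(time);

my $limit = 200_000;
my $func_done;    # set inside each sub just before it returns

sub with_lexical {
    my %h;
    $h{$_} = 1 for 0 .. $limit;
    $func_done = time;
    return;       # %h is freed as the sub exits
}

my %outer;

sub with_outer {
    $outer{$_} = 1 for 0 .. $limit;
    $func_done = time;
    return;       # %outer survives, so there is nothing to free
}

for my $case ( [ lexical => \&with_lexical ], [ outer => \&with_outer ] ) {
    my ( $name, $code ) = @$case;
    $code->();
    my $returned = time;
    printf "%-8s return took %.6f s\n", $name, $returned - $func_done;
}
```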

      Oh, and "code" (in the sense of program text) is a mass noun and doesn't take an indefinite article.


Re: Garbage collection at subroutine return
by Joost (Canon) on Feb 15, 2007 at 15:38 UTC
