moof1138 has asked for the wisdom of the Perl Monks concerning the following question:
Hello Monks,
I have a script with a few arrays of about 100 strings each, every string being roughly 200 characters long, give or take. When I wrote it I declared them in the main scope, even though some arrays are really only used by one function, assuming it was better to load them once at the beginning, since otherwise they would be pushed into memory anew each time the function was called, slowing things down a little. Now I wonder whether that is actually the case. Is it better to put the arrays in the scope of the function that uses them, leave them scoped as they are now, or does it not matter at all, and why? Any thoughts appreciated.
Re: scoping large arrays - newbie Q
by perrin (Chancellor) on Jun 18, 2002 at 05:36 UTC
Why not benchmark it and see? Use the Benchmark module. One thing you may not know is that the memory used by lexical variables is not freed when they go out of scope. Perl keeps it allocated in case you use the same lexical again.
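For reference, a minimal sketch of the kind of comparison perrin is suggesting, not from the original post; the sample data and the subroutine names uses_outer and declares_inside are invented for illustration:
use strict;
use warnings;
use Benchmark qw(cmpthese);

# roughly the data described in the question: ~100 strings of ~200 characters
my @outer = ("x" x 200) x 100;

sub uses_outer {
    my $total = 0;
    $total += length for @outer;      # works on the file-scoped array
    return $total;
}

sub declares_inside {
    my @inner = ("x" x 200) x 100;    # rebuilt on every call
    my $total = 0;
    $total += length for @inner;
    return $total;
}

# run each sub for about 2 CPU seconds and print a comparison table
cmpthese(-2, { outer => \&uses_outer, inside => \&declares_inside });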
Re: scoping large arrays - newbie Q
by samtregar (Abbot) on Jun 18, 2002 at 04:11 UTC
It's a little hard to know exactly what you're talking about since you didn't include any code. But my guess is you need to learn about references. With references you can create an array in one function and pass it to another without paying the penalty to recreate it. This is known as "pass-by-reference" to comp-sci geeks. Here's an example:
sub foo {
    my @array = ( 0 .. 100 );   # create a new array
    bar(\@array);               # pass it to bar() by reference
}

sub bar {
    my $array_ref = shift;      # get reference to foo()'s array
    foreach (@$array_ref) {     # print out each value
        print;
    }
}
If this is your first encounter with references then you've got some learning to do. I suggest you pick up a copy of Learning Perl or Programming Perl and dig in!
-sam
Thank you for the response. I do know about references; I have used them here and there to pass something to a function that was not in that function's scope. That's not really what I am looking for here, so I was not clear enough. I just wonder whether it is more efficient to set up my array at point A or point B in the pseudocode below, which tries to illustrate my question:
#!/usr/bin/perl -w
use strict;

my @bigarray = ("insert", "a very", "long list here");   # point A

my $thing1 = mySub();
my $thing2 = mySub();
# ...etc - using mySub repeatedly

sub mySub {
    # point B - should I declare my @bigarray here instead, and why?
    for (@bigarray) {
        # do stuff with array
    }
}
With the example you have given, I say point A, because if you declare your array at point B, you will re-declare it as many times as you call mySub(). Also, consider passing @bigarray to mySub() as a reference. Just be sure to define mySub() before you declare @bigarray; otherwise @bigarray is visible inside mySub() anyway, which defeats the point of passing it in:
use strict;

sub mySub {
    my $ref = shift;
    for (@$ref) {
        # do stuff with array
    }
}

my @bigarray = ("insert", "a very", "long list here");

my $thing1 = mySub(\@bigarray);
my $thing2 = mySub(\@bigarray);
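A small contrasting sketch, not from the original post, of what happens when the order is reversed; the subroutine name peeks_at_bigarray is invented for illustration:
use strict;
use warnings;

my @bigarray = ("insert", "a very", "long list here");

sub peeks_at_bigarray {
    # No argument is passed, yet @bigarray is visible here, because its
    # lexical declaration appears earlier in the file than this sub.
    return scalar @bigarray;
}

print peeks_at_bigarray(), "\n";   # prints 3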
If the array in question is only pertinent to the subroutine, and either that sub will only be called once or the array will change with each sub call, then declare the array inside the subroutine.
jeffa
L-LL-L--L-LL-L--L-LL-L--
-R--R-RR-R--R-RR-R--R-RR
B--B--B--B--B--B--B--B--
H---H---H---H---H---H---
(the triplet paradiddle with high-hat)
Can I choose point C? Here's an alternative:
{
    my @bigarray = ("insert", "a very", "long list here");   # point C
    sub mySub {
        for (@bigarray) {
            # do stuff with array
        }
    }
}
This keeps @bigarray private to the subroutine but only initializes it once. You get the best of both worlds at no added cost!
-sam
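A short usage sketch, not from the original post, assuming the block above appears before the first call so the array is already initialized; it just shows that mySub() still sees @bigarray while code outside the block does not:
use strict;
use warnings;

{
    my @bigarray = ("insert", "a very", "long list here");   # initialized once
    sub mySub {
        print "$_\n" for @bigarray;   # the sub closes over @bigarray
    }
}

mySub();                        # works: prints the three strings
# print scalar @bigarray;       # would fail to compile under strict: out of scope here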
Re: scoping large arrays - newbie Q
by Aristotle (Chancellor) on Jun 18, 2002 at 04:39 UTC
Generally, from what I've seen so far, Perl is rather good at managing memory efficiently and quickly.
Having the array persistent may speed things up if the function is being called tens of thousands of times, but it is also an invitation to hog memory, not to mention that global variables just generally tend to lead to headaches. Creating a lexically scoped array anew slows things down a wee notch, but keeps the code clean and the memory footprint lean.
We're only talking about 20,000 bytes of data here (100 strings of roughly 200 characters each); that's not something I'd consider much of an array.
If you really need the performance, you have two options: pass around references instead, to keep things properly scoped without the overhead of array allocation, or use a closure like so:
{
    my @not_global;   # persists between calls, but invisible outside this block
    sub operate_on_not_global {
        @not_global = whatever();   # whatever() stands in for the real work
    }
}
Bottom line: personally, I always take the defensive approach. I can always add dirty tricks later if my clean code is not fast enough, but if I start out dirty at the scratchpad, I'll never get a handle on things and will end up with a big ball of mud.
Makeshifts last the longest.
Thank you, Screamer; that confirms the nagging feeling I had been getting. I had been leery of the globally scoped vars, since I would not consider using them in other languages. While I was wondering about the question in general, after I thought about it for a minute I realized that performance is not really an issue for this project. It is just an AIM bot, and it winds up needing to sleep at times to keep it from hitting the AIM flood control limits anyway. I just reworked it, scoping all arrays properly to their functions, and I don't really feel any performance difference.