PerlMonks  

Re: No garbage collection for my-variables

by BrowserUk (Patriarch)
on Sep 16, 2008 at 18:01 UTC


in reply to No garbage collection for my-variables

Maybe it's time for the fabled use less to allow this memory-for-speed optimisation to be disabled?

That said, most of the routines for which this could become a significant problem, things like your examples of encode and decode that take a string and return it modified in some way, ought to be written to use the pass-by-reference aliasing effects of @_ anyway. That would make this 'problem' go away.

Of course, an orthodoxy has grown up around this place that pass-by-reference and side-effects are somehow bad karma, that directly accessing @_ is premature optimisation, and that modifying your arguments is bad because it is action at a distance that can surprise the caller.

But, as long as subroutines are documented as modifying their argument(s), it really does make the most sense in many cases. The caller knows what subsequent use it will make of the arguments it passes you, and if it needs them preserved, it can make copies as and when it needs to. That makes more sense than every subroutine copying every parameter, every time, 'just in case'.
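To make that concrete, here is a minimal sketch of the pattern, not anyone's production code. The name trim_in_place is hypothetical; the only thing it relies on is that the elements of @_ are aliases to the caller's variables.

```perl
use strict;
use warnings;

# Hypothetical helper, documented as modifying its argument in place.
# $_[0] is an alias to the caller's variable, so no copy is made.
sub trim_in_place {
    $_[0] =~ s/^\s+|\s+$//g;
    return;
}

my $line = "  some text  ";
trim_in_place($line);            # $line itself is now trimmed

# The caller copies only when it needs the original preserved.
my $original = "  keep me  ";
my $trimmed  = $original;
trim_in_place($trimmed);

print "[$line] [$original] [$trimmed]\n";
# prints: [some text] [  keep me  ] [keep me]
```

The copy happens once, at the call site that actually needs it, rather than inside the subroutine on every call.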


Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.

Re^2: No garbage collection for my-variables
by kyle (Abbot) on Sep 16, 2008 at 19:11 UTC

    In addition to moritz's excellent point that a function that modifies its arguments then could not be called with a literal, I'd also point out that a lot of Perl programmers probably don't know that @_ is full of aliases. I'd been programming in Perl off and on for over ten years before I came to the Monastery and learned that @_ is aliases. I've asked about this feature in interviews I've conducted, and the prospects out there have always been surprised at this feature. Documentation helps, of course, but someone who doesn't know this is possible could spend an awful lot of time debugging before discovering this (as you say) action at a distance.
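    For anyone who has not run into it, a small sketch of the literal problem follows. The sub upcase_in_place is a hypothetical stand-in; perlsub documents that modifying an element of @_ that aliases a literal raises "Modification of a read-only value attempted", though older builds had edge cases where the exact behaviour differed.

```perl
use strict;
use warnings;

# Hypothetical in-place helper: writes through the @_ alias.
sub upcase_in_place {
    $_[0] = uc $_[0];
    return;
}

my $name = 'frederick';
upcase_in_place($name);          # fine: $name is now 'FREDERICK'

# Passing a literal means $_[0] aliases a read-only constant.
my $ok    = eval { upcase_in_place('frederick'); 1 };
my $error = $ok ? '' : $@;

print "variable: $name\n";
print $ok ? "literal: no error\n" : "literal: $error";
```

    So a caller who wants to pass a literal to such a routine has to put it in a variable first.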

    Thumbs up on the use less, however.

        @a = sort @a is done in place before 5.10

        >perl580\bin\perl -MO=Concise -e"@a = sort @a" 2>&1 | find "sort"
        7     <@> sort lK ->8

        >perl588\bin\perl -MO=Concise -e"@a = sort @a" 2>&1 | find "sort"
        7     <@> sort lK/INPLACE ->8

        >perl5100\bin\perl -MO=Concise -e"@a = sort @a" 2>&1 | find "sort"
        7     <@> sort lK/INPLACE ->8

        I don't have 5.8.1 to 5.8.7, so let's consult the perldeltas.

        perl584delta:

        In place sort optimised (eg @a = sort @a)

        But it was buggy in 5.8.4. perl585delta:

        The in-place sort optimisation introduced in 5.8.4 had a bug. For example, in code such as @a = sort ($b, @a), the result would omit the value $b. This is now fixed.

        It's the same mechanism that sort uses for in-place sorting in 5.10. I've thought about patching List::Util::shuffle() in the same way.

        I thought what you meant was that a subroutine can detect its being called in void context? But if I am not mistaken, 5.10's in-place sorting does not happen in void context:

        % perl5.10.0 -lwe '@a = (2,1); sort @a; print @a'
        Useless use of sort in void context at -e line 1.
        21

        As I understand it, perl (5.10) will detect that in @a = sort @a, the destination array is the same as the source array, so it uses a more efficient algorithm (but it's still in list context).

      that a function that modifies its arguments then could not be called with a literal

      There are edge cases. See foreach funny business.

Re^2: No garbage collection for my-variables
by moritz (Cardinal) on Sep 16, 2008 at 18:55 UTC
    There's a much more perlish reason not to modify the arguments of a sub by default. If you don't, you can write stuff like this:

    other_function(decode('latin-1', 'string_literal'));
    # and if you want to change a variable
    $var = decode('latin-1', $var);

    On the other hand, if you do change the arguments of the sub, the first one requires another variable, which is a real kludge (visually, at least):

    do {
        my $var = 'string_literal';
        decode('latin-1', $var);
        other_function($var);
    }
    # and the other one
    decode('latin-1', $var)

      I think that you've overplayed the case. Using a do block instead of an anonymous block makes it look more complicated than it is.

      Even wrapping a local var in a bare block is rarely necessary. Most code is nested at some level in an if or while or other loop block or subroutine body.

      On the rare occasions that it is at the top level of a program or module, if you really want it to be garbage collected, undef is better (in that it will actually achieve something) anyway.

      Even the use of a constant is emphasising the rare case. Mostly data is read in from external sources and is in a variable already, so:

      while( my $var = <$fh> ) { mutate( $var ); use( $var ); }

      is hardly onerous, but even that can be avoided. Thanks to perl's context sensitivity, you can have the best of both worlds. For the simple case, subroutines behave as passthru pass-by-value, but when the need arises to minimise memory allocation and copying, using it in a void context does the right thing:

      #! perl -slw
      use strict;

      sub mutates {
          my $ref = defined wantarray ? \shift : \$_[ 0 ];
          $$ref =~ s[(?<=\b[^ ])([^ ]+)(?=[^ ]\b)][scalar reverse $1]ge;
          return $$ref if defined wantarray;
          return;
      }

      sub doSomething {
          print shift;
      }

      doSomething( mutates( 'antidisestablishmentarismania' ) );

      my $var = 'The quick brown fox jumps over the lazy dog';
      mutates( $var );
      doSomething( $var );

      __END__
      c:\test>junk
      ainamsiratnemhsilbatsesiditna
      The qciuk bworn fox jpmus oevr the lzay dog

        Thanks to perl's context sensitivity, you can have the best of both worlds. For the simple case, subroutines behave as passthru pass-by-value, but when the need arises to minimise memory allocation and copying, using it in a void context does the right thing

        Now that's surely tempting, but could lead to rather odd situations:

        sub do_stuff {
            ...
            do_other_stuff($variable);
            # remove that debugging statement, and do_other_stuff
            # will behave very differently if do_stuff is not
            # called in void context
            print "still here\n";
        }

        Admittedly that's a fairly artificial situation and won't show up in real code very often, but if it does it's very nasty to debug.
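        A runnable sketch of that trap, under stated assumptions: upcase is a hypothetical context-sensitive helper that copies when a value is wanted and modifies in place in void context, and an ordinary statement stands in for the debugging print. The last statement of a sub is evaluated in the caller's context, which is what flips the behaviour.

```perl
use strict;
use warnings;

# Hypothetical context-sensitive helper:
# non-void context -> work on a copy and return it;
# void context     -> modify the caller's variable through the @_ alias.
sub upcase {
    if ( defined wantarray ) {
        my $copy = $_[0];
        return uc $copy;
    }
    $_[0] = uc $_[0];
    return;
}

our $text;

sub do_stuff_with_debug {
    upcase($text);                 # not the last statement: void context
    my $debug = "still here\n";    # stands in for the debugging statement
    return;
}

sub do_stuff_without_debug {
    upcase($text);                 # last statement: inherits the caller's context
}

$text = 'abc';
my $r1         = do_stuff_with_debug();
my $after_with = $text;            # modified in place

$text = 'abc';
my $r2            = do_stuff_without_debug();
my $after_without = $text;         # untouched; the result went to $r2 instead

print "with debug: $after_with\n";
print "without debug: $after_without (returned $r2)\n";
# prints: with debug: ABC
# prints: without debug: abc (returned ABC)
```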

        Designing interfaces around performance optimizations and memory management oddities just doesn't seem right to me.

      Agreed a thousand times over. If I had a penny for every time I'd been forced to write tedious and ugly code because chomp modifies its argument instead of returning the chomped version, I'd have several pennies.
        Are
        chomp( my $var = <$fh> );

        and

        chomp( my $dst = $src );
        really more tedious and uglier than
        my $var = chomp( scalar ( <$fh> ) );

        and

        my $dst = chomp( $src );
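        For reference, a short sketch of how the first pair behaves in Perl as it actually is. The variable names are illustrative only. Note that chomp returns the number of characters removed, so the second pair would only work with a hypothetical value-returning chomp.

```perl
use strict;
use warnings;

my $src = "a line\n";

# Copy first, then chomp the copy; $src keeps its newline.
chomp( my $dst = $src );

# The return value of chomp is a count, not the chomped string.
my $removed = chomp( my $tmp = $src );

print "dst=[$dst] removed=$removed\n";
# prints: dst=[a line] removed=1
```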
Re^2: No garbage collection for my-variables
by betterworld (Curate) on Sep 16, 2008 at 18:51 UTC
    Maybe it's time for the fabled use less

    Good point. Maybe there just isn't a way for perl to detect how a particular variable could be optimized, but it would be possible if the user could decide.

    things like your examples of encode and decode that take a string and return it modified in some way, ought to be written to use the pass-by-reference aliasing effects of @_ anyway.

    Unfortunately I don't think it's realistic to demand that all modules be written this way. In the case of Encode, I'd rather use the module than my own memory-conserving code; and it's not convenient to change the module's source code. (I would probably even have to change it if "use less" worked, because it's lexically scoped afaik.)

    (However I could encode the text line by line as Joost suggested.)
