PerlMonks  

Playing with non-localized $_ in nested loops.

by davido (Cardinal)
on Aug 22, 2004 at 16:47 UTC (id://384949, perlquestion)

davido has asked for the wisdom of the Perl Monks concerning the following question:

I just love this kind of thing because I have to think my way through unravelling the chain of events leading to a funky warning. Yesterday I was reading Modification of a read-only value, and specifically Aristotle and ysth's comments, and that got me to thinking...

I guess in the back of my mind I always knew that reading from a filehandle implicitly into $_ doesn't localize $_. That fact didn't really ring any alarms in my mind with respect to how that would affect aliasing of $_ in foreach loops, probably mostly because I almost always name my iterator with a lexical, and almost always name the recipient variable of filehandle reads, so I really never run the risk of this becoming a problem. But I wanted to experiment anyway.

The background: Assume you've got a foreach loop set up, and you haven't specified an iterator. That means the iterator will be $_ implicitly. $_ will be aliased to each element (one at a time) of whatever list you're iterating over. Fine. Now assume that within that foreach loop, you're reading from a filehandle, and you're not specifying a named recipient variable (in other words, you're not saying my $line = <FH>;, you're just saying <FH> and letting Perl plop the line into $_). Is this starting to raise any red flags? It should, of course.

But I couldn't leave well enough alone, and decided to do just that to see if the behavior was what I expected. ...my expectations aren't always dead on target when dealing with some of the less apparent constructs, so a little test was in order.

Consider the following code:

    use strict;
    use warnings;

    my ( @array ) = qw/ one two three four /;

    my $position = tell DATA;

    print "\@array holds @array.\nPrinting DATA a few times:\n";

    foreach( @array ) {
        while ( <DATA> ) {
            chomp;
            print $_, "\t";
        }
        print "\n";
        seek DATA, $position, 0;
    }

    print "Array: @array\n";

    __DATA__
    a
    b
    c

Now it doesn't surprise me that I got a warning or four running this script. But what does surprise me is the warning I got:

    @array holds one two three four.
    Printing DATA a few times:
    a       b       c
    a       b       c
    a       b       c
    a       b       c
    Use of uninitialized value in join or string at mytest.pl line 20, <DATA> line 12.
    Use of uninitialized value in join or string at mytest.pl line 20, <DATA> line 12.
    Use of uninitialized value in join or string at mytest.pl line 20, <DATA> line 12.
    Use of uninitialized value in join or string at mytest.pl line 20, <DATA> line 12.
    Array:

Let's unravel what's going on here...
First, we assign the values 'one', 'two', 'three', and 'four' to @array. Next, we iterate over @array, using $_ as the implicit aliased iterator. Within each iteration we read from <DATA> until there's nothing more to read. On each iteration, $_ is assigned the contents of the filehandle read. Remember, $_ is aliased back to an element of @array on each iteration of the foreach loop. So reading into $_ is going to modify elements of @array. After the last iteration of the while loop, <DATA> is checked one more time for stuff, and nothing is found, so undef is returned. $_ will be assigned undef on the final conditional check for the while loop (I think), and that means that for each iteration over the elements of @array, that element will receive a value of undef.
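The clobbering described above can be reproduced without DATA. This is only a sketch under assumptions: an in-memory filehandle stands in for DATA, and drain_per_element and its $protect flag are names made up for the illustration, not anything from the original script.

```perl
use strict;
use warnings;

# Hypothetical helper: read an in-memory "file" to exhaustion once per
# element of @items, iterating with the implicit $_.
sub drain_per_element {
    my ( $text, $protect, @items ) = @_;
    for ( @items ) {
        open my $fh, '<', \$text or die "Cannot open in-memory file: $!";
        if ( $protect ) {
            # Named lexical: $_ (and so the aliased element) is untouched.
            while ( my $line = <$fh> ) { }
        }
        else {
            # Implicit $_: every read overwrites the aliased element, and
            # the final failed read leaves undef behind.
            while ( <$fh> ) { }
        }
        close $fh;
    }
    return @items;
}

my @clobbered = drain_per_element( "a\nb\nc\n", 0, qw/ one two / );
my @intact    = drain_per_element( "a\nb\nc\n", 1, qw/ one two / );

print scalar( grep { defined } @clobbered ),
    " defined element(s) survive the implicit read\n";
print "With a named variable: @intact\n";
```

Naming the recipient of the read is enough to keep the array intact; naming the foreach iterator instead would work just as well.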

That's what I expected. What I didn't expect is that the warnings I got would refer back to <DATA> when I print @array. Yes, we're trying to join four uninitialized values (actually four undefs, but that's the same thing). But why is the warning reporting that I'm on <DATA> line twelve? The print statement didn't perform a read on <DATA>. The only thing I can think is that warnings might always report the current line of the most recently used, still-open filehandle, even if the event triggering the warning didn't just read from the filehandle.


Dave

Replies are listed 'Best First'.
Re: Playing with non-localized $_ in nested loops.
by gaal (Parson) on Aug 22, 2004 at 22:21 UTC
    Regarding what surprised you, a comparison of the output of these two one-liners affirms your conjecture. Neither of the following warns directly about a read:

        % perl -le 'warn "a"'
        a at -e line 1.
        % echo "" | perl -le '<>; warn "a"'
        a at -e line 1, <> line 1.
    Once a <> has been invoked, warnings will include the latest readline coordinates regardless of what triggers them.

    By the way, this issue has bitten real live sites. I gave a lightning talk in YAPC::Israel::2004 about this.

      You'll only get the extra info if the filehandle read from is still open. Closing the file (ARGV in the case of the diamond operator) switches it off again.

          P:\test>echo "" | perl -wle "<>; warn 'Erk!'"
          Erk! at -e line 1, <> line 1.

          P:\test>echo "" | perl -wle "<>; close ARGV; warn 'Erk!'"
          Erk! at -e line 1.

      Explicitly closing DATA once the loop completes is useful sometimes.
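The same comparison can be made inside a script by capturing the warning text. A rough sketch, assuming an in-memory filehandle instead of STDIN; warning_after_read is a hypothetical helper name:

```perl
use strict;
use warnings;

# Hypothetical helper: read one line from an in-memory filehandle,
# optionally close it, then return the text of a warning raised afterwards.
sub warning_after_read {
    my ( $close_first ) = @_;
    my $text = "line one\n";
    open my $fh, '<', \$text or die "Cannot open in-memory file: $!";
    my $line = <$fh>;
    close $fh if $close_first;

    my $message;
    local $SIG{__WARN__} = sub { $message = shift };
    warn "Erk!";
    return $message;
}

print warning_after_read(0);    # handle still open: message carries the read position
print warning_after_read(1);    # handle closed first: only file and line of the warn
```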


      Examine what is said, not who speaks.
      "Efficiency is intelligent laziness." -David Dunham
      "Think for yourself!" - Abigail
      "Memory, processor, disk in that order on the hardware side. Algorithm, algorithm, algorithm on the code side." - tachyon

      gaal,
      I read the slides from your lightning talk. They were interesting. One thing that I might add (or otherwise comment on) is the suggestion that while() doesn't localize $_. While this is true, it's probably more accurate to state that <FH>; doesn't localize $_. Because the fact is that while() isn't itself acting on $_. The action is carried out by the diamond operator implicitly spilling its guts into $_. while() doesn't really have much to do with it, though we get in the habit of thinking that it does since the while( <> ) construct is so common, and since foreach loops default to acting upon $_.

      Also commenting on your slides (not having heard the lecture), foreach is the exception as you said, but this is because foreach works in an entirely different way than while(). With respect to foreach, it is the loop construct itself that acts upon $_, in a very special way. We even write it in a very special way (if we write it out longhand):

      foreach $_ ( list )

      ...that's how foreach deparses with B::Deparse. Your slides mention that foreach is the exception. While it is an exception, it is not the only one. map also is a looping mechanism where $_ is localized. Consider the following code:

          $_ = "Test string\n";
          my @array = map { $_ = chr $_ } 32 .. 64;
          print $_;

          __OUTPUT__
          Test string

      As you can see, the use of $_ inside of map works a lot like the iterator of a foreach loop, in that it serves as an alias to the elements of the input list (ok, my example doesn't demonstrate this, but it's true), and in that it is localized (my snippet shows this to be the case).
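The aliasing half of that claim, which the snippet above doesn't show, can be sketched like this. The helper name double_in_place is made up for the illustration:

```perl
use strict;
use warnings;

# Hypothetical helper: double every element through map's aliased $_.
sub double_in_place {
    my ( $numbers ) = @_;    # array reference
    return map { $_ *= 2 } @$numbers;
}

my @numbers = ( 1, 2, 3 );
$_ = "outer value";

my @doubled = double_in_place( \@numbers );

print "numbers: @numbers\n";    # numbers: 2 4 6 (modified through the alias)
print "doubled: @doubled\n";    # doubled: 2 4 6
print "outer:   $_\n";          # outer:   outer value (restored after map)
```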

      Just a few observations and additional meditations... ;)


      Dave

        The automatic setting of $_ comes from the combination of while() and readline. Just a plain readline won't do it. Nor does while with just any operator do the implicit assignment.
        for, map, and grep: these three constructs localize $_. But in fact, map and grep are more vulnerable than foreach, because they don't give you the choice of using a lexical as your aliased iterator (iterant?). I did mention this in the talk though I guess the emphasis is stronger when you have more than the slides :)
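The point about while() plus readline can be checked with a small sketch. It assumes an in-memory filehandle, and implicit_assignment_demo is a hypothetical name:

```perl
use strict;
use warnings;

# Hypothetical helper: compare a bare readline with while ( <$fh> ).
sub implicit_assignment_demo {
    my $text = "first\nsecond\n";
    open my $fh, '<', \$text or die "Cannot open in-memory file: $!";

    local $_ = "untouched";
    <$fh>;                      # bare readline: line is read and discarded
    my $after_plain = $_;       # $_ was not assigned

    my $after_while;
    while ( <$fh> ) {           # treated as while ( defined( $_ = <$fh> ) )
        $after_while = $_;
    }
    return ( $after_plain, $after_while, $_ );
}

my ( $after_plain, $after_while, $at_end ) = implicit_assignment_demo();
print "after plain readline: $after_plain\n";
print "last line seen in while: $after_while";
print "after the loop: ", ( defined $at_end ? $at_end : 'undef' ), "\n";
```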
Re: Playing with non-localized $_ in nested loops.
by ysth (Canon) on Aug 23, 2004 at 01:48 UTC
    The extra info is included whenever perl's last input filehandle is set (by readline, eof, tell, or sysseek) and $. is true. close resets $., and so will suppress the info. I'm trying to come up with a harmless way of suppressing it in general; the best I can do is:
    { local $.; scalar tell STDOUT; warn "whatever"; }
    Localizing $. doesn't do what you might expect; instead of doing anything to the actual value of the line number of the current input file, it only arranges for the last input filehandle itself to be restored at the end of the block.
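A sketch of that behaviour, on the reading given above: two in-memory filehandles stand in for real files, and local_line_number_demo is a hypothetical name. If the description holds, $. inside the block follows the second handle, and after the block it follows the first handle again.

```perl
use strict;
use warnings;

# Hypothetical helper: show which handle $. follows before, inside,
# and after a block that localizes $.
sub local_line_number_demo {
    my $outer_text = "o1\no2\n";
    my $inner_text = "i1\ni2\ni3\n";
    open my $outer, '<', \$outer_text or die "Cannot open in-memory file: $!";
    open my $inner, '<', \$inner_text or die "Cannot open in-memory file: $!";

    my $line = <$outer>;                # $outer becomes the last-read handle
    my $before = $.;
    my $inside;
    {
        local $.;
        $line = <$inner> for 1 .. 3;    # $inner becomes the last-read handle
        $inside = $.;
    }
    my $after = $.;                     # last-read handle is $outer again

    return ( $before, $inside, $after );
}

print join( ', ', local_line_number_demo() ), "\n";
```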

      I've often wished that $. could be %. instead. That way, $.{FH} could refer to <FH>, and $.{DATA} could refer to <DATA>. But this all breaks down when you start talking about lexical scalar filehandles ( <$fh> ), because how would you write that? $.{\$fh}? ...it starts looking a little awkward.
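For per-handle counts, IO::Handle's input_line_number method comes close to that wish. A minimal sketch, assuming in-memory filehandles; per_handle_line_numbers is a made-up name:

```perl
use strict;
use warnings;
use IO::Handle;

# Hypothetical helper: ask each handle for its own line count.
sub per_handle_line_numbers {
    my $first_text  = "a\nb\nc\n";
    my $second_text = "x\ny\n";
    open my $first,  '<', \$first_text  or die "Cannot open in-memory file: $!";
    open my $second, '<', \$second_text or die "Cannot open in-memory file: $!";

    my $line;
    $line = <$first> for 1 .. 3;    # three reads from the first handle
    $line = <$second>;              # one read; $. now follows $second

    return ( $first->input_line_number, $second->input_line_number, $. );
}

print join( ', ', per_handle_line_numbers() ), "\n";    # 3, 1, 1
```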

      I've also often wished that it would actually "do" something when you assign a number to $. But of course this is asking way too much of the buffered IO subsystem, and when that sort of behavior is desired, it's almost always easier to just use Tie::File.


      Dave

        I've also often wished that it would actually "do" something when you assign a number to $.
        Just think what this would mean:
        $. = 10.7;
        Does that mean we can now start reading from the character at position 70% on line 10? Or is that the 8th character? [And how do I get rid of the headache that just gave me?]

        -QM
        --
        Quantum Mechanics: The dreams stuff is made of

Re: Playing with non-localized $_ in nested loops.
by melora (Scribe) on Aug 23, 2004 at 15:37 UTC
    This is highly educational for me, thanks very much for posting it. I must admit that I use $_ very sparingly, partly because I don't understand such things very well. I think it's also because of my longstanding habit of being very explicit (C, need I say more?). I'm still getting used to lines like
    while (<>) { chomp; print $_, "\n"; }
    because my old habits say "Where did $_ come from???" Going through such exercises as the above is helping me learn better, more Perlish ways.
    I think it's fun and worthwhile to play with such things. And I'm not surprised at being confused on a Monday morning.
      Don't be too quick to knock your approach. I find it useful in "real" programs to use a named variable to indicate to my reader (me in most cases :) what I expect that variable to represent. This is especially useful in nested constructs where $_'s semantics change as construct boundaries are crossed.

      And if I change a foreach (...) to a while (...), the magic of $_ disappears, and I might introduce bugs similar to gaal's example.

      Sometimes using explicit variables helps you find logic errors too.
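The action-at-a-distance version of the bug might look like the following sketch. All three sub names are hypothetical, and the in-memory "file" is just a stand-in:

```perl
use strict;
use warnings;

# Hypothetical helper that reads with the implicit $_.
sub count_lines_unsafe {
    my ( $text ) = @_;
    open my $fh, '<', \$text or die "Cannot open in-memory file: $!";
    my $count = 0;
    while ( <$fh> ) { $count++ }    # assigns to whatever $_ the caller aliased
    return $count;
}

# Same job with a named variable; the caller's $_ is never touched.
sub count_lines_safe {
    my ( $text ) = @_;
    open my $fh, '<', \$text or die "Cannot open in-memory file: $!";
    my $count = 0;
    while ( my $line = <$fh> ) { $count++ }
    return $count;
}

# Call the counter once per name, iterating with the implicit $_.
sub call_for_each {
    my ( $counter, @names ) = @_;
    $counter->( "x\ny\n" ) for @names;
    return @names;
}

my @after_unsafe = call_for_each( \&count_lines_unsafe, qw/ alpha beta / );
my @after_safe   = call_for_each( \&count_lines_safe,   qw/ alpha beta / );

print scalar( grep { defined } @after_unsafe ), " name(s) left after the unsafe counter\n";
print "After the safe counter: @after_safe\n";
```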

      -QM
      --
      Quantum Mechanics: The dreams stuff is made of
