PerlMonks  

regexp list return 5.6 vs 5.8

by Sixtease (Friar)
on Jan 24, 2008 at 06:50 UTC (#663945, perlquestion)
Sixtease has asked for the wisdom of the Perl Monks concerning the following question:

Hello everybody,

Please, can there be a difference between what the following snippet would return on perl5.8 and perl5.6? I'm getting test failures from cpan testers (God bless them) on perl5.6, where it seems to return 1 where $_[0] has no digits.

    my @rv = $_[0] =~ /^([0-9]+)$/;
    return @rv[0 .. $#rv];

use strict; use warnings; print "Just Another Perl Hacker\n";

Re: regexp list return 5.6 vs 5.8
by Anonymous Monk on Jan 24, 2008 at 07:33 UTC
    I don't have access to 5.6 at the moment, so I can't investigate your question. Just as a matter of curiosity, though: why use the expression  return @rv[ 0 .. $#rv ];  when, as far as I can see, it returns exactly the same thing as  return @rv;  ? The latter does not depend on how the  ..  range operator behaves when the terminal value is less than the initial value, i.e.  0 .. -1,  which is what you get from  $#rv  when the array  @rv  is empty.

    (You haven't been messing with  $[  have you?)

      It's not quite the same ^^. @rv is an array, and in scalar context it evaluates to the number of its elements. Slicing turns it into a list, which gives the last element instead.
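
The distinction can be checked with a small sketch. The sub names as_array and as_slice are made up for illustration; only the two return expressions come from the thread.

```perl
use strict;
use warnings;

# Hypothetical helpers: same match, two different return expressions.
sub as_array { my @rv = $_[0] =~ /^([0-9]+)$/; return @rv; }
sub as_slice { my @rv = $_[0] =~ /^([0-9]+)$/; return @rv[ 0 .. $#rv ]; }

my $count = as_array('123');   # array in scalar context: number of elements
my $last  = as_slice('123');   # slice in scalar context: last element

print "array: $count, slice: $last\n";   # array: 1, slice: 123
```

In list context both subs return the same list; the difference only shows up when the caller imposes scalar context.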

        ETOOMUCHMAGIC. If that is what you want, state it:
        return wantarray ? @rv : pop @rv;

        Is the behavior of list slice subroutine return an intended feature, is it specced somewhere (or even documented)? Or is it just an implementation detail which might some day be considered a bug and be changed thereafter?

        Don't golf production code ;-)

        --shmem

        _($_=" "x(1<<5)."?\n".q·/)Oo.  G°\        /
                                      /\_¯/(q    /
        ----------------------------  \__(m.====·.(_("always off the crowd"))."·
        ");sub _{s./.($e="'Itrs `mnsgdq Gdbj O`qkdq")=~y/"-y/#-z/;$e.e && print}
      hmmm...

      ok, but then wouldn't it be better (in the sense of being more maintainable) to write something like

          return wantarray ? @rv : $rv[-1];  

      ?
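
A minimal sketch of the explicit wantarray approach suggested above. The name digits_of is hypothetical, and the empty-array guard is an addition that keeps the scalar case from touching an empty array at all.

```perl
use strict;
use warnings;

# Hypothetical name; returns the captured digits, or nothing on no match.
sub digits_of {
    my @rv = $_[0] =~ /^([0-9]+)$/;
    return @rv if wantarray;          # list context: all captures
    return @rv ? $rv[-1] : undef;     # scalar context: last capture, or undef
}

my @list   = digits_of('abc');   # no match: empty list
my $scalar = digits_of('123');   # match: '123'
print scalar(@list), " ", $scalar, "\n";   # 0 123
```

The calling context is spelled out in the sub itself, so the result does not depend on how a slice behaves in scalar context.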

Re: regexp list return 5.6 vs 5.8
by hipowls (Curate) on Jan 24, 2008 at 09:02 UTC

    Running this script on Solaris

        #!/usr/local/bin/perl -w

        use strict;

        print "[", scalar test('abc'), "]\n";
        print "[", test('abc'), "]\n";
        print "[", scalar test('123'), "]\n";
        print "[", test('123'), "]\n";

        sub test {
            my @rv = $_[0] =~ /^([0-9]+)$/;
            return @rv[ 0 .. $#rv ];
        }

    produces (5.6.1)

        michael$ perl t.pl
        [[]
        []
        [123]
        [123]
        michael:$ perl -v

        This is perl, v5.6.1 built for sun4-solaris

        Copyright 1987-2001, Larry Wall

        Perl may be copied only under the terms of either the Artistic License or the
        GNU General Public License, which may be found in the Perl 5 source kit.

        Complete documentation for Perl, including FAQ lists, should be found on
        this system using `man perl' or `perldoc perl'.  If you have access to the
        Internet, point your browser at http://www.perl.com/, the Perl Home Page.

    and (5.005_03)

        michael$ /usr/local/bin/perl t.pl
        [[]
        []
        [123]
        [123]
        michael$ /usr/local/bin/perl -v

        This is perl, version 5.005_03 built for sun4-solaris

        Copyright 1987-1999, Larry Wall

        Perl may be copied only under the terms of either the Artistic License or the
        GNU General Public License, which may be found in the Perl 5.0 source kit.

        Complete documentation for Perl, including FAQ lists, should be found on
        this system using `man perl' or `perldoc perl'.  If you have access to the
        Internet, point your browser at http://www.perl.com/, the Perl Home Page.

    There were no warnings; I do get them when running under 5.10.0.

      Do you really get [[] from the first print statement? If so, any idea how? (The output of the first two prints is [] with perl 5.8.7 on CentOS 5.)

        Really, I too was surprised. Changing sub test to

            sub test {
                my @rv = $_[0] =~ /^([0-9]+)$/;
                print "<<@rv>>\n";
                return @rv[ 0 .. $#rv ];
            }

        produced

            <<>>
            [[]
            <<>>
            []
            <<123>>
            [123]
            <<123>>
            [123]

        changing sub test to

            sub test {
                my @rv = $_[0] =~ /^([0-9]+)$/;
                print "<<@rv>>\n";
                return wantarray ? @rv : $rv[-1];
            }

        produces

            michael$ perl t.pl
            <<>>
            Use of uninitialized value in print at t.pl line 5.
            []
            <<>>
            []
            <<123>>
            [123]
            <<123>>
            [123]

        Note the warning. (And I suspect there are some monks who are now saying I told you so;-)

Re: regexp list return 5.6 vs 5.8
by Sixtease (Friar) on Jan 24, 2008 at 09:42 UTC
    Is the behavior of list slice subroutine return an intended feature

    A slice is not an array. What you return is what you get. I don't see any magic there. You're definitely right about not golfing production code. This is from a test script however - I allow myself a little more relaxed way of coding in those. :-)

    I think I understand the thing now - it's an XY case if I'm correct. That subroutine's return value has been assigned to a scalar and that has been pushed to an array. Maybe this code works differently on 5.6?

        sub is_digits {
            my @rv = $_[0] =~ /^([0-9]+)$/;
            return @rv[ 0 .. $#rv ];
        }

        my $nothing = is_digits('abc');
        my @arr1 = ($nothing);
        my @arr2 = ('some', 'thing');
        push @arr2, @arr1;
        print "There are ", scalar(@arr2), " elements in \@arr2\n";
        print '$arr2[2] is ', defined($arr2[2])?'':'un', "defined\n";

    What I expect and get on 5.8.8 is:

        There are 3 elements in @arr2
        $arr2[2] is undefined

    Update: Modified the code to more closely follow the original.


      On Solaris 5.6.1 I got

          michaela@drvdb2:michaela$ perl s.pl
          There are 3 elements in @arr2
          $arr2[2] is undefined

      A slice is not an array. What you return is what you get.

      Indeed, but that distinction isn't defined for subroutine return. From perlsub:

      The Perl model for function call and return values is simple: all functions are passed as parameters one single flat list of scalars, and all functions likewise return to their caller one single flat list of scalars. Any arrays or hashes in these call and return lists will collapse, losing their identities--but you may always use pass-by-reference instead to avoid this. Both call and return lists may contain as many or as few scalar elements as you'd like. (Often a function without an explicit return statement is called a subroutine, but there's really no difference from Perl's perspective.)

      ...

      A "return" statement may be used to exit a subroutine, optionally specifying the returned value, which will be evaluated in the appropriate context (list, scalar, or void) depending on the context of the subroutine call. If you specify no return value, the subroutine returns an empty list in list context, the undefined value in scalar context, or nothing in void context. If you return one or more aggregates (arrays and hashes), these will be flattened together into one large indistinguishable list.

      Subroutines return a list of scalars - that's it. Nothing is said about how flattening of aggregates is done, nor is any distinction made between arrays and plain lists.

      I'd not see that as a language feature (much less a desired one), but as a dark corner which should be inspected for sanity. All of the following snippets should behave the same way:

          perl -le 'sub x {@x = qw(a b c); @x }; $r=x; print $r'
          3
          perl -le 'sub x {@x = qw(a b c); ()=@x }; $r=x; print $r'
          3
          perl -le 'sub x {@x = qw(a b c); @x[0..$#x] }; $r=x; print $r'
          c
          perl -le 'sub x {@x = qw(a b c); ()=@x[0..$#x] }; $r=x; print $r'
          3

      Any formal explanation why the third and fourth of these (should) yield different output?

      Last but not least - allowing a more relaxed coding style in test scripts: IMHO that is the place where most robust code is required: flawed tests are useless.

      --shmem


        Very good points. I'll try to be more disciplined with tests.

        What about this one:

            perl -le 'sub x { return qw(a b c);}; $r=x; print $r'
            c

        This returns the plain list, and that's what I want to mimic when the values are in an array. Arrays and list assignments have special meanings in scalar context. Slices do not. I don't know exactly where, but it is documented. (I mean, it's documented that arrays and list assignments have the special scalar-context behavior, not that slices don't :-))

        Update: From perldata:

        If you evaluate an array in scalar context, it returns the length of the array. (Note that this is not true of lists, which return the last value, like the C comma operator [...])
        List assignment in scalar context returns the number of elements produced by the expression on the right side of the assignment:

        And, as you quoted from perlsub:

        A "return" statement may be used to exit a subroutine, optionally specifying the returned value, which will be evaluated in the appropriate context

        Something that just crossed my mind, and turned out to behave in a way I didn't expect:

            perl -le 'sub x { my @x = qw(a b c); my @y = qw(A B C D); return (@x, @y)} my $r = x(); print $r'
            4

        Must say it made me LOL :-)

        You're right about the behavior of slices, but it has nothing to do with subroutine returns. See for example:
            perl -e"my @a=qw/a b c/; print scalar @a[0..2]"
            c
        The issue is that an array (or hash, or list) slice in scalar context returns the last element of the slice. I feel like this should be documented in perldata but a quick read of it just now didn't reveal any such indication.

        Subroutines return a list of scalars, but when the sub is in scalar context, that list will always have exactly one scalar in it.

        "A 'return' statement [...] will be evaluated in the appropriate context" means the op which returns the arguments to return is executed in the same context as the sub. It's the *statement* that is evaluated in the appropriate context, not the values returned.

        When evaluated in scalar context, arrays return their length. (Case 1)
        When evaluated in scalar context, lists return their last element. (Case 3)
        When evaluated in scalar context, list assignments return the number of elements assigned. (Case 2 and 4)
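
The three cases can be put side by side in one script. This is a sketch only: the sub names are invented, and each body mirrors one of the one-liners quoted earlier in the thread, with the last expression of the sub evaluated in the caller's context.

```perl
use strict;
use warnings;

my @x = qw(a b c);

# Hypothetical names, one per case.
sub case_array      { @x }                # array: its length
sub case_slice      { @x[ 0 .. $#x ] }    # slice: its last element
sub case_assignment { () = @x }           # list assignment: element count

my $n        = case_array();
my $last     = case_slice();
my $assigned = case_assignment();

print "$n $last $assigned\n";   # 3 c 3
```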

Re: regexp list return 5.6 vs 5.8
by Sixtease (Friar) on Jan 24, 2008 at 12:29 UTC

    I'm still very curious about the weird [[] and {{} that hipowls was getting. Maybe it's worth a thread.

    It would seem that I'm unable to locate the cause of the error people seem to experience testing my package on v5.6. If any of you are interested, it's Data::FeatureFactory on cpan, maybe you can have a look (make test). I'll try to find a 5.6 installation somewhere and play with it.


      For what it's worth, I can confirm that it happens with ActiveState's 5.6.1, build 638 on Linux. My tool chain is too modern to build it from source ;-(

      I get the same behaviour with 5.8.4, BTW. It seems that when the range 0..-1 is used to select the elements of the slice ($#rv is -1 when @rv is empty), the 'final element' of the slice (incorrectly) evaluates to the previous/last element on the Perl stack (or some such). With Perl versions up to at least 5.8.4, that is — but no longer with 5.8.8 and 5.10.0 (I currently don't have access to versions 5.8.5 - 5.8.7, so I can't tell when it got fixed).

      (Ranges like [99..98] behave the same way as [0..-1], so what seems to matter is just that the second value in the range is smaller than the first...)

          my @x = ('A','B');
          print "-------------------\n";
          print "[foo", scalar(@x[0..2]), "]\n";
          print "-------------------\n";
          print "[foo", scalar(@x[0..1]), "]\n";
          print "-------------------\n";
          print "[foo", scalar(@x[0..0]), "]\n";
          print "-------------------\n";
          print "[foo", scalar(@x[0..-1]), "]\n";
          print "-------------------\n";

      With 5.8.4 (and earlier), this prints

          -------------------
          Use of uninitialized value in print at ./663945.pl line 9.
          [foo]
          -------------------
          [fooB]
          -------------------
          [fooA]
          -------------------
          [foo[foo]
          -------------------

      and with 5.8.8 or 5.10.0

          -------------------
          Use of uninitialized value in print at ./663945.pl line 9.
          [foo]
          -------------------
          [fooB]
          -------------------
          [fooA]
          -------------------
          Use of uninitialized value in print at ./663945.pl line 15.
          [foo]
          -------------------

      As has already been pointed out elsewhere in the thread, this is mostly expected behaviour, because (from perldoc -f scalar)

      scalar EXPR
      (...)
      Because "scalar" is unary operator, if you accidentally use for EXPR a parenthesized list, this behaves as a scalar comma expression, evaluating all but the last element in void context and returning the final element evaluated in scalar context.

      ...except for the "[foo[foo]", of course.

      I don't have a real explanation (probably simply a bug)...  just a couple of related observations with respect to using [0..-1] with slices. When you use a literal list instead of a named array, there's still some curious behaviour in recent releases of Perl:

          print "-------------------\n";
          print "[foo", scalar(('A','B')[0..2]), "]\n";
          print "-------------------\n";
          print "[foo", scalar(('A','B')[0..1]), "]\n";
          print "-------------------\n";
          print "[foo", scalar(('A','B')[0..0]), "]\n";
          print "-------------------\n";
          print "[foo", scalar(('A','B')[0..-1]), "]\n";
          print "-------------------\n";

      prints

          -------------------
          Use of uninitialized value in print at ./663945.pl line 26.
          [foo]
          -------------------
          [fooB]
          -------------------
          [fooA]
          -------------------
          Argument "[foo" isn't numeric in list slice at ./663945.pl line 32.
          [fooA]
          -------------------

      The '"[foo" isn't numeric...' seems to suggest that with [0..-1] the value "[foo" is being used to index the element from the list... which is confirmed by this:

          print "-------------------\n";
          print 2, scalar(('A','B')[0..-1]), "]\n";   # elem at index 2 (undef)
          print "-------------------\n";
          print 1, scalar(('A','B')[0..-1]), "]\n";   # elem at index 1 ('B')
          print "-------------------\n";
          print 0, scalar(('A','B')[0..-1]), "]\n";   # elem at index 0 ('A')
          print "-------------------\n";
          print -1, scalar(('A','B')[0..-1]), "]\n";  # elem at index -1 (last elem 'B')
          print "-------------------\n";
          print -2, scalar(('A','B')[0..-1]), "]\n";  # elem at index -2 ('A')
          print "-------------------\n";
          print -3, scalar(('A','B')[0..-1]), "]\n";  # elem at index -3 (undef)
          print "-------------------\n";

      which prints

          -------------------
          Use of uninitialized value in print at ./663945.pl line 37.
          2]
          -------------------
          1B]
          -------------------
          0A]
          -------------------
          -1B]
          -------------------
          -2A]
          -------------------
          Use of uninitialized value in print at ./663945.pl line 47.
          -3]
          -------------------

      Looks like this "indirect indexing" feature could be useful for obfus ;)
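
Given the behaviour described above, one defensive option is to never hand an empty range such as 0 .. -1 to a slice in scalar context. A rough sketch, assuming all that is wanted in scalar context is "last element or undef"; the helper name last_elem is hypothetical:

```perl
use strict;
use warnings;

# Hypothetical helper: last element of a list, or undef when the list is empty.
# It indexes @_ directly, so no slice over an empty range is ever built.
sub last_elem {
    return @_ ? $_[-1] : undef;
}

my @rv    = 'abc' =~ /^([0-9]+)$/;   # no match, so @rv is empty
my $value = last_elem(@rv);
print defined $value ? "[$value]\n" : "[undef]\n";   # [undef]
```

Whether this sidesteps the problem on every affected Perl version has not been tested here; it only avoids the one construct the thread identifies as the trigger.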

        Oh dear God, this is crazy. :-)) Thanks an awful lot for the explanation! This also explains why my test has been failing.

Re: regexp list return 5.6 vs 5.8
by Sixtease (Friar) on Jan 24, 2008 at 17:51 UTC

    OK, one more question: how do you guys recommend that I flatten lists? Say I have some values in an array, or in two arrays, and I want to turn them into one flat list that has no array-specific or similar behavior. I might want to return it from a subroutine, a do statement, a map block, an eval, or whatever...

    Update: How about map $_, @stuff, @more_stuff?

    Update 2: Nope. As documented: "In scalar context, returns the total number of elements so generated."

        perl -le 'print scalar map $_, qw(a b c)'
        3
      What do you mean exactly? What kind of "array-specific" behavior are you trying to avoid? If you want to take data you are creating as a list but apply it in scalar context, you need to make some decision for yourself about what data you want and code accordingly. For example, map operates on a list and returns a list. If you want to use that in scalar context, and want something other than the length of the list as the value, you need to decide what you want. If you always want the last value, one way to do that is
      my $scalar = (map { some code } some list)[-1];
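
To make the difference concrete, here is a small sketch comparing map in scalar context with the list-slice form just shown. The uc transformation and the variable names are placeholders chosen for the example.

```perl
use strict;
use warnings;

my @words = qw(a b c);

# map in scalar context: the number of elements it generated.
my $count = map { uc($_) } @words;

# List slice with index -1: the last element it generated.
my $last = ( map { uc($_) } @words )[-1];

print "$count $last\n";   # 3 C
```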
      OK, one more question: How do you guys recommend that I flatten lists? I mean... say I have some values in an array, or in two arrays and I want to turn it into one flat list that will not have any array-specific or similar behavior.

      You flatten lists like this:

          my @newarray = @oldarray;
          my @newarray = (@oldarray1, @oldarray2);
          my @newarray = (@oldarray, 2, 3, qw( foo bar baz ) );

      But your question doesn't make a lot of sense. You seem to be asking "How do I make an array that isn't an array", and the answer to that is "You can't".

      If you're trying to ask "How do I return different values from a function depending on the calling context", then look into wantarray as shmem suggests.

        I think the OP is essentially asking (Sixtease please correct me if I'm wrong) how you would make something like the following snippet print "e", and not "2"

            my @x = qw(a b c);
            my @y = qw(d e);
            print scalar (@x, @y); # prints "2" (number of elems in @y)

        treating the combined arrays as if they had been written like

        print scalar qw(a b c d e); # prints "e" (last elem in list)

        Kind of like this

            print scalar ((@x, @y)[0..@x+@y-1]);      # prints "e"
            print scalar sub {@_[0..$#_]}->(@x, @y);  # prints "e"

        but less ugly, and without having to take special care of the subtle problem you run into with older versions of Perl when the arrays are empty, and the selecting range for the slice becomes [0..-1]  (what this thread is about, essentially).

        Irrespective of whether you'd actually need to do something like this in real-life programming, it's still a valid question in and of itself, IMO.
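
One possible answer, offered as a sketch rather than an established idiom: wrap the arrays in a small sub that picks its result with wantarray. The name as_list is hypothetical, and the empty-list guard is there so that no 0 .. -1 slice is involved.

```perl
use strict;
use warnings;

# Hypothetical helper: behaves like a plain list in both contexts.
sub as_list {
    return @_ if wantarray;          # list context: every element, flattened
    return @_ ? $_[-1] : undef;      # scalar context: last element, like a comma list
}

my @x = qw(a b c);
my @y = qw(d e);

my $last = as_list(@x, @y);   # scalar context
my @all  = as_list(@x, @y);   # list context
print "$last ", scalar(@all), "\n";   # e 5
```

The cost is one extra sub call; the benefit is that the scalar-context result is stated explicitly instead of falling out of slice semantics.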

Re: regexp list return 5.6 vs 5.8
by ikegami (Pope) on Jan 24, 2008 at 18:12 UTC

    In scalar context, I get the same behaviour in 5.6.0, 5.6.1, 5.8.0 and 5.8.8

        >c:\progs\perl560\bin\perl -wle"$s = sub { my @rv = $_[0] =~ /^([0-9]+)$/; return @rv[0 .. $#rv]; }->('abc'); print $s"
        Use of uninitialized value in print at -e line 1.

        >c:\progs\perl561\bin\perl -wle"$s = sub { my @rv = $_[0] =~ /^([0-9]+)$/; return @rv[0 .. $#rv]; }->('abc'); print $s"
        Use of uninitialized value in print at -e line 1.

        >c:\progs\perl580\bin\perl -wle"$s = sub { my @rv = $_[0] =~ /^([0-9]+)$/; return @rv[0 .. $#rv]; }->('abc'); print $s"
        Use of uninitialized value in print at -e line 1.

        >c:\progs\perl588\bin\perl -wle"$s = sub { my @rv = $_[0] =~ /^([0-9]+)$/; return @rv[0 .. $#rv]; }->('abc'); print $s"
        Use of uninitialized value in print at -e line 1.

    In list context, I get the same behaviour in 5.6.0, 5.6.1, 5.8.0 and 5.8.8

        >c:\progs\perl560\bin\perl -wle"@a = sub { my @rv = $_[0] =~ /^([0-9]+)$/; return @rv[0 .. $#rv]; }->('abc'); print @a"

        >c:\progs\perl561\bin\perl -wle"@a = sub { my @rv = $_[0] =~ /^([0-9]+)$/; return @rv[0 .. $#rv]; }->('abc'); print @a"

        >c:\progs\perl580\bin\perl -wle"@a = sub { my @rv = $_[0] =~ /^([0-9]+)$/; return @rv[0 .. $#rv]; }->('abc'); print @a"

        >c:\progs\perl588\bin\perl -wle"@a = sub { my @rv = $_[0] =~ /^([0-9]+)$/; return @rv[0 .. $#rv]; }->('abc'); print @a"

      I think whether you get different behaviour depends on the exact circumstances in which the evaluation happens. I would argue that you don't see a difference, because there's nothing "on the stack" (as I hypothesized in my other reply) prior to the evaluation of the slice in scalar context. If I modify your test slightly, I do get different results depending on Perl version.

          perl -wle'sub {"foo"}->(), $s = sub { my @rv = $_[0] =~ /^([0-9]+)$/; return @rv[0 .. $#rv]; }->("abc"); print $s'

      5.8.4 prints foo (and no warning), while 5.8.8 and 5.10.0 print Use of uninitialized value $s in print at -e line 1. (and no "foo").

      Note 1: the sub {"foo"}->() is just a way to avoid the "Useless use of a constant in void context", which I would otherwise get when simply writing "foo", ...

      Note 2: Windows users will probably have to swap single and double quotes  (I changed them myself in the first place, because ikegami's original version would have required additional quoting with a typical Unix shell...)
