PerlMonks  

map and return

by ELISHEVA (Prior)
on Sep 03, 2009 at 10:03 UTC ( [id://793143]=perlquestion )

ELISHEVA has asked for the wisdom of the Perl Monks concerning the following question:

Recently I noticed something odd with map and return. According to perlsub, an initial & in a prototype allows you to define a subroutine that emulates the syntax of Perl built-ins like map and grep. And yet there seem to be subtle differences, like the behavior of return within the block. For instance, in the case of map the block after the subname is scoped as if it were part of the surrounding code rather than the body of an anonymous sub.

The following code, placed inside a subroutine will cause the subroutine to return immediately. If placed at the top level of the script it will cause the Perl compiler to complain "Can't return outside subroutine ..." (Perl version 5.8.8; warnings, strict turned on of course).

    my @x = map { return $_; } (1,2,3);
    print "(@x)\n";

However, when the same block { return $_; } is called via a user defined sub that does essentially the same thing as map there are no compiler complaints nor precipitous returns:

    sub foo(&@) {
        my ($crSub, @aParams) = @_;
        my @aResult;
        push(@aResult, $crSub->($_)) foreach @aParams;
        return @aResult;
    }

    #this outputs "(1 2 3)"
    my @x = foo { return $_; } (1, 2, 3);
    print "(@x)\n";

perlsub says that such user defined routines parse "almost exactly like" the corresponding built-ins. Almost isn't exactly. Why the difference in behavior? And what other subtle differences are there between built-in tokens like map and user defined subroutines with a (&@) prototype? The return behavior of map ate up a good deal of debugging time the other day because I assumed that the block after map was just like an anonymous subroutine, which it clearly isn't. I'm hoping to save time in the future with a heads up on other differences hidden behind the phrase "almost exactly".

Many thanks in advance, beth

Replies are listed 'Best First'.
Re: map and return
by ambrus (Abbot) on Sep 03, 2009 at 10:51 UTC

    Many of the builtin functions have special syntax that can not be described with prototypes. Actually even the syntax of prototyped functions is quite complicated, so you could even say the syntax of any one builtin is quite regular, the only problem is that there are more different kinds of builtins than there are prototypes.

    As an example, let's look at how syntax of calling map is different from a sub you declare with (&@) prototype.

    You can call map with the first argument being a bare expression followed by a comma or a braced block without a comma, and you can do either even if you put this first argument inside the function call parenthesis. For example, these four are equivalent.

        print map ucfirst, "just another ", "perl hacker\n";
        print map { ucfirst } "just another ", "perl hacker\n";
        print map(ucfirst, "just another ", "perl hacker\n");
        print map({ ucfirst } "just another ", "perl hacker\n");

    If you use function call parentheses, which must include the first argument, like in the last two lines, the rule that the function call ends at the closing parenthesis applies, so in the following statements "hacker" is not capitalized.

        print map(ucfirst, "just another ", "perl "), "hacker\n";
        print map({ ucfirst } "just another ", "perl "), "hacker\n";

    In contrast, if you define a function like this,

        sub mymap (&@) { map { &{$_[0]}() } @_[1..@_-1]; }

    then you cannot call it with a bare expression as its first argument. You can call it with a bare block with or without a comma, provided you omit the parentheses, so the following two are valid, but the second would not work with map.

        print mymap { ucfirst } "just another ", "perl hacker\n";
        print mymap { ucfirst }, "just another ", "perl hacker\n";

    You cannot add function call parentheses if you use bare blocks. If you use an immediate sub block or certain restricted classes of expressions as the first argument, then you may add parentheses, so the following work. Most expressions just don't work as first argument though.

        print mymap sub { ucfirst }, "just another ", "perl hacker\n";
        print mymap(sub { ucfirst }, "just another ", "perl hacker\n");
        sub ucf { ucfirst }; print mymap \&ucf, "just another ", "perl hacker\n";
        sub ucf { ucfirst }; print mymap(\&ucf, "just another ", "perl hacker\n");
        $ucf = sub { ucfirst }; print mymap \&$ucf, "just another ", "perl hacker\n";
        $ucf = sub { ucfirst }; print mymap(\&$ucf, "just another ", "perl hacker\n");

Re: map and return
by merlyn (Sage) on Sep 03, 2009 at 10:14 UTC
    The behavior is consistent, even though the syntax is misleading. In both cases, precisely one level of subroutine call is being popped.

    A similar situation exists when you compare a do { ... } while (...) loop with a while (...) { ... } loop: last/next/redo ignore the former (and act on an outer block), while they respect the latter as the innermost enclosing loop block.
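A minimal sketch of the do/while comparison above, using made-up variable names (@do_seen, @while_seen) purely for illustration. Both inner loops contain the same last statement, but only the while version treats it as leaving the inner loop.

```perl
use strict;
use warnings;

# do { ... } while is not a loop block, so "last" acts on the enclosing for loop.
my @do_seen;
for my $n (1 .. 3) {
    my $i = 0;
    do {
        $i++;
        push @do_seen, "$n.$i";
        last if $i == 2;    # leaves the for loop entirely
    } while ($i < 5);
}

# while (...) { ... } is a real loop block, so "last" only leaves the while.
my @while_seen;
for my $n (1 .. 3) {
    my $i = 0;
    while ($i < 5) {
        $i++;
        push @while_seen, "$n.$i";
        last if $i == 2;    # leaves the while; the for loop carries on
    }
}

print "do-while: @do_seen\n";       # do-while: 1.1 1.2
print "while:    @while_seen\n";    # while:    1.1 1.2 2.1 2.2 3.1 3.2
```

The same reasoning applies to return inside map: it acts on the nearest enclosing construct that actually counts as a subroutine.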

    Confusing to a beginner, but makes sense once you play with it for a bit.

    -- Randal L. Schwartz, Perl hacker

    The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119.

      If map is meant by design to function as a flow of control token like while and foreach, then I would expect it to be documented as such in perlsyn, which it is not.

      Best, beth

        What is the prototype for map? It is undef because its arguments cannot be expressed by a prototype because the builtin does not really behave like a Perl function.
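This can be checked with the prototype builtin. The sketch below is only illustrative; foo is the (&@) emulation from the original question, restated here so the snippet stands alone.

```perl
use strict;
use warnings;

# The user defined emulation from the question.
sub foo(&@) {
    my ($code, @args) = @_;
    return map { $code->($_) } @args;
}

my $map_proto  = prototype('CORE::map');    # undef: not expressible as a prototype
my $grep_proto = prototype('CORE::grep');   # undef as well
my $foo_proto  = prototype(\&foo);          # the declared prototype string

printf "map:  %s\n", defined $map_proto  ? $map_proto  : 'undef';
printf "grep: %s\n", defined $grep_proto ? $grep_proto : 'undef';
printf "foo:  %s\n", $foo_proto;
```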
Re: map and return
by ikegami (Patriarch) on Sep 03, 2009 at 13:58 UTC

    Remember that

        sub foo(&);
        foo { ... };

    is syntax sugar for

        foo(sub { ... });

    On the other hand, the block for map is no more a sub than the block for for.

        use strict;
        use warnings;

        sub my_map(&@) {
            my $cb = shift;
            my @rv;
            push @rv, $cb->($_) for @_;
            return @rv;
        }

        sub map_tester {
            print("pre\n");
            map { print("in\n"); return 1 } 1;
            print("post\n");
        }

        sub my_map_tester {
            print("pre\n");
            my_map { print("in\n"); return 1 } 1;
            print("post\n");
        }

        map_tester();
        print("\n");
        my_map_tester();

    Output:

        pre
        in

        pre
        in
        post

    Why the difference in behavior?

    The only existing means of calling a detached (e.g. referenced) opcode tree is a sub.

    That's true even at a very low level. That's why map isn't implemented as a function that takes the block as an argument. map is truly a flow control structure.

    For example, @b = map { foo() } @a compiles into something like the following:

        my @anon_list;
        for (@a) {
            push @anon_list, foo();
        }
        @b = @anon_list;

    There's obviously no way to compile a call to my_map into a loop, so differences are to be expected.
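Another difference that follows from the same inlining, beyond return, is what @_ refers to inside the block. This is a small sketch; count_via_map and count_via_my_map are hypothetical helper names, and my_map is the emulation defined earlier in this reply.

```perl
use strict;
use warnings;

sub my_map(&@) {
    my $cb = shift;
    my @rv;
    push @rv, $cb->($_) for @_;
    return @rv;
}

# Inside a real map block, @_ is still the enclosing sub's argument list.
sub count_via_map    { return map    { scalar @_ } 'x' }

# Inside a my_map block, @_ belongs to the anonymous sub, which is
# called here with exactly one argument ($_).
sub count_via_my_map { return my_map { scalar @_ } 'x' }

my ($builtin)  = count_via_map(qw(a b c));
my ($emulated) = count_via_my_map(qw(a b c));

print "map block sees $builtin args, my_map block sees $emulated\n";
# map block sees 3 args, my_map block sees 1
```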

      First many thanks for giving the internal perspective (I was hoping you would take the time to do this!). I am not terribly familiar with the Perl source code, but it seems that the op code names in opcode.h are consistent with your point about map being a shorthand for a looping op-tree. In EXTCONST char* const PL_op_name[], not only does "map" have an op code name, but I also see two others: "mapstart" and "mapwhile". Are these the opcodes for the loop you are talking about? (sometimes header files can be deceiving if you don't know the code base well)

      grep acts like map with regard to returns. It also seems to treat its block like a loop and, not surprisingly, it too has three op code names: "grep", "grepstart", and "grepwhile".

      On the other hand, I'm thinking that sort may be implemented as something closer to a function. Unlike "map" or "grep" it has only the one op-code, "sort". Also, as mentioned earlier on this thread, it treats returns as if the block were an eval {} or anonymous subroutine. What is your take given your greater experience with internals?

      Also, is there any guideline or rule of thumb that can be used to determine how routines listed in index-functions are going to treat a block? It seems like there ought to be something other than testing code samples, knowing internals, or word-of-mouth from other Perl programmers.

      Best, beth

        My knowledge of internals is mostly limited to what B::Concise and Devel::Peek output. Fortunately, this falls within that realm.

        map's block is inlined:

            $ perl -MO=Concise,-exec -e'@b = map { foo() } @a'
            1  <0> enter
            2  <;> nextstate(main 2 -e:1) v
            3  <0> pushmark s
            4  <0> pushmark s
            5  <#> gv[*a] s
            6  <1> rv2av[t6] lKM/1
            7  <@> mapstart lK*/2
            8  <|> mapwhile(other->9)[t7] lK/1
            9      <0> pushmark s
            a      <#> gv[*foo] s/EARLYCV
            b      <1> entersub[t4] lKS/TARG,1
            -      <@> scope lK
                       goto 8
            c  <0> pushmark s
            d  <#> gv[*b] s
            e  <1> rv2av[t2] lKRM*/1
            f  <2> aassign[t8] vKS/COMMON
            g  <@> leave[1 ref] vKP/REFC
            -e syntax OK

        Same for grep:

            $ perl -MO=Concise,-exec -e'@b = grep { foo() } @a'
            1  <0> enter
            2  <;> nextstate(main 2 -e:1) v
            3  <0> pushmark s
            4  <0> pushmark s
            5  <#> gv[*a] s
            6  <1> rv2av[t6] lKM/1
            7  <@> grepstart lK*/2
            8  <|> grepwhile(other->9)[t7] lK/1
            9      <0> pushmark s
            a      <#> gv[*foo] s/EARLYCV
            b      <1> entersub[t4] sKS/TARG,1
            -      <@> scope sK
                       goto 8
            c  <0> pushmark s
            d  <#> gv[*b] s
            e  <1> rv2av[t2] lKRM*/1
            f  <2> aassign[t8] vKS/COMMON
            g  <@> leave[1 ref] vKP/REFC
            -e syntax OK

        For comparison, here's what a foreach loop looks like:

            $ perl -MO=Concise,-exec -e'for (@a) { foo() }'
            1  <0> enter
            2  <;> nextstate(main 2 -e:1) v
            3  <0> pushmark sM
            4  <#> gv[*a] s
            5  <1> rv2av[t2] sKRM/1
            6  <#> gv[*_] s
            7  <{> enteriter(next->c last->f redo->8) lKS
            d  <0> iter s
            e  <|> and(other->8) vK/1
            8      <;> nextstate(main 1 -e:1) v
            9      <0> pushmark s
            a      <#> gv[*foo] s/EARLYCV
            b      <1> entersub[t4] vKS/TARG,1
            c      <0> unstack v
                       goto d
            f  <2> leaveloop vK/2
            g  <@> leave[1 ref] vKP/REFC
            -e syntax OK

        Sort can call a sub, so it makes a sub from the block:

            $ perl -MO=Concise,-exec -e'@b = sort foo @a'
            1  <0> enter
            2  <;> nextstate(main 1 -e:1) v
            3  <0> pushmark s
            4  <0> pushmark s
            5  <$> const[PV "foo"] s/BARE
            6  <#> gv[*a] s
            7  <1> rv2av[t4] lK/1
            8  <@> sort lKS
            9  <0> pushmark s
            a  <#> gv[*b] s
            b  <1> rv2av[t2] lKRM*/1
            c  <2> aassign[t5] vKS
            d  <@> leave[1 ref] vKP/REFC
            -e syntax OK

            $ perl -MO=Concise,-exec -e'@b = sort { foo() } @a'
            1  <0> enter
            2  <;> nextstate(main 2 -e:1) v
            3  <0> pushmark s
            4  <0> pushmark s
            5  <#> gv[*a] s
            6  <1> rv2av[t6] lK/1
            7  <@> sort lKS*          --> I guess the * means the sub is
            8  <0> pushmark s             attached to the op rather than
            9  <#> gv[*b] s               found on the stack.
            a  <1> rv2av[t2] lKRM*/1
            b  <2> aassign[t7] vKS
            c  <@> leave[1 ref] vKP/REFC
            -e syntax OK

        Finally, "&" prototype in action:

            $ perl -MO=Concise,-exec -e'sub faker(&); faker { foo() }'
            1  <0> enter
            2  <;> nextstate(main 2 -e:1) v
            3  <0> pushmark s
            4  <0> pushmark sRM
            5  <$> anoncode[CV ] lRM
            6  <1> refgen KM/1
            7  <#> gv[*faker] s
            8  <1> entersub[t3] vKS/TARG,1
            9  <@> leave[1 ref] vKP/REFC
            -e syntax OK

            $ perl -MO=Concise,-exec -e'sub faker(&); faker sub { foo() }'
            1  <0> enter
            2  <;> nextstate(main 2 -e:1) v
            3  <0> pushmark s
            4  <0> pushmark sRM
            5  <$> anoncode[CV ] lRM
            6  <1> refgen KM/1
            7  <#> gv[*faker] s
            8  <1> entersub[t3] vKS/TARG,1
            9  <@> leave[1 ref] vKP/REFC
            -e syntax OK

        Are these the opcodes for the loop you are talking about?

        No, I meant the body of the curlies. In my examples, that would be the call to foo(). How can you pass "a call to foo()" to a sub? You can't. Perl puts it in an anon sub and passes a reference to that sub.
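The same thing can be seen from the Perl side without looking at opcodes. In this sketch, faker is given a hypothetical body that simply reports what kind of reference it received.

```perl
use strict;
use warnings;

# Hypothetical body for faker: report the reference type of its argument.
sub faker(&) {
    my ($arg) = @_;
    return ref $arg;
}

my $from_block = faker { 1 };        # bare block, wrapped in an anon sub by perl
my $from_sub   = faker sub { 1 };    # explicit anon sub

print "block: $from_block, sub: $from_sub\n";   # block: CODE, sub: CODE
```

Either way the function receives an ordinary code reference, which is why return inside the block only leaves that anonymous sub.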

        is there any guideline or rule of thumb that can be used to determine how routines listed in index-functions are going to treat a block?

        Whenever possible, subs are avoided. They are expensive, especially when the alternative is just executing the next instruction.

        Think of it this way: If the body of the block is constant, it will be inlined. If it's not, it will become a sub.

        • sort takes a sub for argument, so it can't be inlined. I guess it could inline sort BLOCK LIST and not sort SUBNAME LIST, but perl uses the same behaviour for both (documented).
        • eval's block acts like a sub (documented). I guess it's easier to implement exceptions that way.
        • sub creates a sub from the block. duh.
        • Every other block is inlined.

        The body of a prototyped function isn't constant, so the variable part (the block) is placed in a sub and passed as a code ref.
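The list above can be checked with a rough sketch like the following. The three tester names are made up for illustration; each sub puts a return inside a different kind of block and then reports whether execution continued past it.

```perl
use strict;
use warnings;

sub sort_tester {
    # The sort block acts like a sub: return only ends this comparison.
    my @sorted = sort { return $a <=> $b } (3, 1, 2);
    return "after sort: @sorted";
}

sub eval_tester {
    # The eval block acts like a sub: return only ends the eval.
    my $value = eval { return 42 };
    return "after eval: $value";
}

sub grep_tester {
    # The grep block is inlined: return leaves grep_tester itself.
    my @kept = grep { return 'left early' } (1, 2, 3);
    return 'after grep';
}

my $sort_result = sort_tester();
my $eval_result = eval_tester();
my $grep_result = grep_tester();

print "$sort_result\n";    # after sort: 1 2 3
print "$eval_result\n";    # after eval: 42
print "$grep_result\n";    # left early
```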

Re: map and return
by LanX (Saint) on Sep 03, 2009 at 16:12 UTC
    And what other subtle differences are there between built-in tokens like map and user defined subroutines with a (&@) prototype?

    We recently had a similar discussion about the restrictions of prototypes in simulating the behavior of syntax commands and about incomplete documentation.

    e.g. here

    Re^2: coderefs and (&) prototypes

    Hope it helps...

    Cheers Rolf
