map and return

ELISHEVA has asked for the wisdom of the Perl Monks concerning the following question:

Re: map and return by ambrus (Abbot) on Sep 03, 2009 at 10:51 UTC
Many of the builtin functions have special syntax that cannot be described with prototypes. Even the syntax of prototyped functions is quite complicated, so you could argue that the syntax of any one builtin is quite regular; the only problem is that there are more kinds of builtins than there are prototypes. As an example, let's look at how the syntax of calling map differs from that of a sub you declare with a (&@) prototype.

You can call map with the first argument being either a bare expression followed by a comma or a braced block without a comma, and you can do either even if you put this first argument inside the function call parentheses. For example, these four are equivalent:

`print map ucfirst, "just another ", "perl hacker\n";`
`print map { ucfirst } "just another ", "perl hacker\n";`
`print map(ucfirst, "just another ", "perl hacker\n");`
`print map({ ucfirst } "just another ", "perl hacker\n");`

If you use function call parentheses, which must enclose the first argument, as in the last two lines, the rule that the function call ends at the closing parenthesis applies, so in the following statements "hacker" is not capitalized:

`print map(ucfirst, "just another ", "perl "), "hacker\n";`
`print map({ ucfirst } "just another ", "perl "), "hacker\n";`

In contrast, if you define a function like this,

`sub mymap (&@) { map { &{$_[0]}() } @_[1..@_-1]; }`

then you cannot call it with a bare expression as its first argument. You can call it with a bare block, with or without a comma, provided you omit the parentheses, so the following two are valid, although the second would not work with map:

`print mymap { ucfirst } "just another ", "perl hacker\n";`
`print mymap { ucfirst }, "just another ", "perl hacker\n";`

You cannot add function call parentheses if you use bare blocks. If you use an immediate sub block or certain restricted classes of expressions as the first argument, then you may add parentheses, so the following work:

`print mymap sub { ucfirst }, "just another ", "perl hacker\n";`
`print mymap(sub { ucfirst }, "just another ", "perl hacker\n");`
`sub ucf { ucfirst }; print mymap \&ucf, "just another ", "perl hacker\n";`
`sub ucf { ucfirst }; print mymap(\&ucf, "just another ", "perl hacker\n");`
`$ucf = sub { ucfirst }; print mymap \&$ucf, "just another ", "perl hacker\n";`
`$ucf = sub { ucfirst }; print mymap(\&$ucf, "just another ", "perl hacker\n");`

Most other expressions just don't work as the first argument, though.
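To make the contrast above checkable, here is a small self-contained script, a sketch assuming a reasonably modern perl 5. It reuses ambrus's `mymap` definition; the variable names (`$want`, `$builtin`, `$bare`, `$anon`, `$compiled`) are illustrative only. The invalid `mymap({ ... } LIST)` form is compiled inside a string `eval`, so the script can report the compile error instead of dying.

```perl
use strict;
use warnings;

# ambrus's (&@)-prototyped imitation of map. It must be declared before
# the calls below so that the prototype affects how they are parsed.
sub mymap (&@) { map { &{$_[0]}() } @_[1 .. @_ - 1] }

my $want = "Just another Perl hacker\n";

# The builtin accepts a braced block even inside call parentheses.
my $builtin = join '', map({ ucfirst } "just another ", "perl hacker\n");

# The prototyped sub accepts a bare block when called without parentheses ...
my $bare = join '', mymap { ucfirst } "just another ", "perl hacker\n";

# ... and an explicit anonymous sub when called with parentheses.
my $anon = join '', mymap(sub { ucfirst }, "just another ", "perl hacker\n");

# A bare block inside mymap(...) is a compile-time error. Compiling it with
# a string eval turns that error into a false return value plus $@.
my $compiled = eval q{ mymap({ ucfirst } "just another ", "perl hacker\n"); 1 };

print "map(BLOCK LIST):  $builtin";
print "mymap BLOCK LIST: $bare";
print "mymap(sub, LIST): $anon";
print "mymap(BLOCK LIST) compiles: ", ($compiled ? "yes" : "no"), "\n";
```

Running it should print "Just another Perl hacker" three times and report that the parenthesized bare-block call does not compile.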
Re: map and return by merlyn (Sage) on Sep 03, 2009 at 10:14 UTC
The behavior is consistent, even though the syntax is misleading. In both cases, precisely one level of subroutine call is being popped. A similar situation exists when you compare a `do { ... } while (...)` loop with a `while (...) { ... }` loop: last/next/redo ignore the former (and act on an outer block), while they respect the latter as the innermost enclosing loop block. Confusing to a beginner, but makes sense once you play with it for a bit.

-- Randal L. Schwartz, Perl hacker
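A minimal sketch of merlyn's loop-control point, using only core Perl: the same unlabeled `last` is placed in a `do { ... } while (...)` and in a real `while` loop, each nested inside an outer `for`. The `@do_log` and `@while_log` arrays are illustrative names that only record which statements ran.

```perl
use strict;
use warnings;

# "do BLOCK while COND" is a do block with a statement modifier, not a real
# loop, so an unlabeled "last" inside it skips past the do block and exits
# the nearest enclosing real loop: here, the outer "for".
my @do_log;
for my $pass (1 .. 2) {
    my $i = 0;
    do {
        $i++;
        push @do_log, "pass $pass, i=$i";
        last if $i == 2;
    } while ($i < 5);
    push @do_log, "after do-while in pass $pass";    # never reached
}

# A real "while" loop is the innermost loop block, so "last" exits only it.
my @while_log;
for my $pass (1 .. 2) {
    my $i = 0;
    while ($i < 5) {
        $i++;
        push @while_log, "pass $pass, i=$i";
        last if $i == 2;
    }
    push @while_log, "after while in pass $pass";
}

print "do-while: $_\n" for @do_log;
print "while:    $_\n" for @while_log;
```

With the do-while, `last` ends the outer `for` during the first pass, so only two entries are logged; with the real `while`, both passes run and each one logs its "after while" line.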
Re^2: map and return by ELISHEVA (Prior) on Sep 03, 2009 at 10:40 UTC
If map is meant by design to function as a flow of control token like while and foreach, then I would expect it to be documented as such in perlsyn, which it is not. Best, beth
Re^3: map and return by Anonymous Monk on Sep 03, 2009 at 11:22 UTC
What is the prototype for map? It is undef: map's arguments cannot be expressed by a prototype, because the builtin does not really behave like a Perl function.
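Perl's builtin `prototype` function can confirm this from within the language: given a name starting with `CORE::`, it reports the builtin's prototype, or undef when the builtin's arguments cannot be expressed as one. A short sketch; `mymap` is a hypothetical user-defined stand-in included for comparison.

```perl
use strict;
use warnings;

# A user-defined imitation of map with the closest available prototype.
sub mymap (&@) { my $code = shift; map { $code->() } @_ }

# prototype("CORE::name") asks about a builtin. It returns undef when the
# builtin's argument syntax cannot be expressed as a prototype.
my %proto = (
    'CORE::map'  => prototype('CORE::map'),
    'CORE::grep' => prototype('CORE::grep'),
    'mymap'      => prototype(\&mymap),
);

for my $name (sort keys %proto) {
    my $p = $proto{$name};
    printf "%-10s %s\n", $name, defined $p ? "($p)" : 'undef';
}
```

For the two builtins this should print undef, while the user sub reports `(&@)`.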
Re^4: map and return by ELISHEVA (Prior) on Sep 03, 2009 at 12:08 UTC
Re^5: map and return by Anonymous Monk on Sep 03, 2009 at 12:27 UTC
Re: map and return by ikegami (Patriarch) on Sep 03, 2009 at 13:58 UTC
Remember that

`sub foo(&); foo { ... };`

is syntactic sugar for

`foo(sub { ... });`

On the other hand, the block for `map` is no more a sub than the block for `for`:

`use strict; use warnings;`
`sub my_map(&@) { my $cb = shift; my @rv; push @rv, $cb->($_) for @_; return @rv; }`
`sub map_tester { print("pre\n"); map { print("in\n"); return 1 } 1; print("post\n"); }`
`sub my_map_tester { print("pre\n"); my_map { print("in\n"); return 1 } 1; print("post\n"); }`
`map_tester(); print("\n"); my_map_tester();`

Output: `pre`, `in` from `map_tester` (its "post" is never printed), then a blank line, then `pre`, `in`, `post` from `my_map_tester`.

Why the difference in behavior? The only existing means of calling a detached (e.g. referenced) opcode tree is a sub. That's true even at a very low level. That's why `map` isn't implemented as a function that takes the block as an argument: `map` is truly a flow control structure. For example, `@b = map { foo() } @a` compiles into something like the following:

`my @anon_list; for (@a) { push @anon_list, foo(); } @b = @anon_list;`

There's obviously no way to compile a call to `my_map` into a loop, so differences are to be expected.
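The rough equivalence described above can be exercised directly. In this sketch, `via_map`, `via_loop`, and `via_my_map` are hypothetical helper names; `via_loop` is a hand-written loop in the spirit of the expansion above, not perl's actual generated code, and `my_map` is ikegami's definition. The only thing that differs between the three is where the block's `return` lands.

```perl
use strict;
use warnings;

# ikegami's (&@) helper: the block arrives as a real code reference.
sub my_map (&@) { my $cb = shift; my @rv; push @rv, $cb->($_) for @_; return @rv; }

# map's block is inlined into the enclosing sub, so "return" leaves via_map.
sub via_map {
    my @b = map { return "early exit at $_" if $_ > 1; $_ * 10 } @_;
    return "finished: @b";
}

# Hand-written loop in the spirit of the expansion above
# (an illustration, not perl's exact generated code).
sub via_loop {
    my @anon_list;
    for (@_) {
        return "early exit at $_" if $_ > 1;
        push @anon_list, $_ * 10;
    }
    my @b = @anon_list;
    return "finished: @b";
}

# my_map's block is an anonymous sub, so "return" only ends that call
# and its value becomes one element of the result list.
sub via_my_map {
    my @b = my_map { return "early exit at $_" if $_ > 1; $_ * 10 } @_;
    return "finished: @b";
}

for my $f (\&via_map, \&via_loop, \&via_my_map) {
    print scalar $f->(1, 2, 3), "\n";
}
```

`via_map` and `via_loop` both leave the enclosing sub at the first element greater than 1, while `via_my_map` collects each `return` value as an ordinary list element.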
Re^2: map and return by ELISHEVA (Prior) on Sep 03, 2009 at 14:50 UTC
First, many thanks for giving the internals perspective (I was hoping you would take the time to do this!). I am not terribly familiar with the Perl source code, but the op code names in opcode.h seem consistent with your point about map being shorthand for a looping op-tree. In `EXTCONST char* const PL_op_name[]`, not only does "map" have an op code name, but I also see two others: "mapstart" and "mapwhile". Are these the opcodes for the loop you are talking about? (Header files can be deceiving if you don't know the code base well.)

grep acts like map with regard to returns. It also seems to treat its block like a loop and, not surprisingly, it too has three op code names: "grep", "grepstart", and "grepwhile".

On the other hand, I'm thinking that `sort` may be implemented as something closer to a function. Unlike "map" or "grep", it has only the one op code, "sort". Also, as mentioned earlier in this thread, it treats returns as if the block were an eval {} or an anonymous subroutine. What is your take, given your greater experience with internals?

Also, is there any guideline or rule of thumb that can be used to determine how the routines listed in index-functions are going to treat a block? It seems like there ought to be something other than testing code samples, knowing internals, or word-of-mouth from other Perl programmers.

Best, beth
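beth's observations about grep and sort can be checked the same way. A sketch with hypothetical `via_grep` and `via_sort` helpers: a `return` inside a `grep` block leaves the enclosing sub, just as with `map`, while a `return` inside a `sort` block only supplies that comparison's result and the enclosing sub carries on.

```perl
use strict;
use warnings;

# grep's block is inlined like map's, so "return" leaves the enclosing sub.
sub via_grep {
    my @odd = grep { return "left via_grep at $_" if $_ == 2; $_ % 2 } @_;
    return "grep finished: @odd";
}

# A sort block is treated like a sub body: "return" just supplies the
# comparison result, and the enclosing sub keeps running.
sub via_sort {
    my @sorted = sort { return $b <=> $a } @_;
    return "sort finished: @sorted";
}

print via_grep(1, 2, 3), "\n";
print via_grep(1, 3, 5), "\n";
print via_sort(1, 3, 2), "\n";
```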
Re^3: map and return by ikegami (Patriarch) on Sep 03, 2009 at 17:09 UTC
My knowledge of internals is mostly limited to what B::Concise and Devel::Peek output. Fortunately, this falls within that realm. `map`'s block is inlined:

`$ perl -MO=Concise,-exec -e'@b = map { foo() } @a'
1 <0> enter
2 <;> nextstate(main 2 -e:1) v
3 <0> pushmark s
4 <0> pushmark s
5 <#> gv[a] s
6 <1> rv2av[t6] lKM/1
7 <@> mapstart lK/2
8 <|> mapwhile(other->9)[t7] lK/1
9 <0> pushmark s
a <#> gv[foo] s/EARLYCV
b <1> entersub[t4] lKS/TARG,1
- <@> scope lK
goto 8
c <0> pushmark s
d <#> gv[b] s
e <1> rv2av[t2] lKRM/1
f <2> aassign[t8] vKS/COMMON
g <@> leave[1 ref] vKP/REFC
-e syntax OK`

Same for `grep`:

`$ perl -MO=Concise,-exec -e'@b = grep { foo() } @a'
1 <0> enter
2 <;> nextstate(main 2 -e:1) v
3 <0> pushmark s
4 <0> pushmark s
5 <#> gv[a] s
6 <1> rv2av[t6] lKM/1
7 <@> grepstart lK/2
8 <|> grepwhile(other->9)[t7] lK/1
9 <0> pushmark s
a <#> gv[foo] s/EARLYCV
b <1> entersub[t4] sKS/TARG,1
- <@> scope sK
goto 8
c <0> pushmark s
d <#> gv[b] s
e <1> rv2av[t2] lKRM/1
f <2> aassign[t8] vKS/COMMON
g <@> leave[1 ref] vKP/REFC
-e syntax OK`

For comparison, here's what a foreach loop looks like:

`$ perl -MO=Concise,-exec -e'for (@a) { foo() }'
1 <0> enter
2 <;> nextstate(main 2 -e:1) v
3 <0> pushmark sM
4 <#> gv[a] s
5 <1> rv2av[t2] sKRM/1
6 <#> gv[_] s
7 <{> enteriter(next->c last->f redo->8) lKS
d <0> iter s
e <|> and(other->8) vK/1
8 <;> nextstate(main 1 -e:1) v
9 <0> pushmark s
a <#> gv[foo] s/EARLYCV
b <1> entersub[t4] vKS/TARG,1
c <0> unstack v
goto d
f <2> leaveloop vK/2
g <@> leave[1 ref] vKP/REFC
-e syntax OK`

Sort can call a sub, so it makes a sub from the block:

`$ perl -MO=Concise,-exec -e'@b = sort foo @a'
1 <0> enter
2 <;> nextstate(main 1 -e:1) v
3 <0> pushmark s
4 <0> pushmark s
5 <$> const[PV "foo"] s/BARE
6 <#> gv[a] s
7 <1> rv2av[t4] lK/1
8 <@> sort lKS
9 <0> pushmark s
a <#> gv[b] s
b <1> rv2av[t2] lKRM/1
c <2> aassign[t5] vKS
d <@> leave[1 ref] vKP/REFC
-e syntax OK`

`$ perl -MO=Concise,-exec -e'@b = sort { foo() } @a'
1 <0> enter
2 <;> nextstate(main 2 -e:1) v
3 <0> pushmark s
4 <0> pushmark s
5 <#> gv[a] s
6 <1> rv2av[t6] lK/1
7 <@> sort lKS*
8 <0> pushmark s
9 <#> gv[b] s
a <1> rv2av[t2] lKRM/1
b <2> aassign[t7] vKS
c <@> leave[1 ref] vKP/REFC
-e syntax OK`

I guess the `*` on the `sort` line means the sub is attached to the op rather than found on the stack.

Finally, the "&" prototype in action:

`$ perl -MO=Concise,-exec -e'sub faker(&); faker { foo() }'
1 <0> enter
2 <;> nextstate(main 2 -e:1) v
3 <0> pushmark s
4 <0> pushmark sRM
5 <$> anoncode[CV ] lRM
6 <1> refgen KM/1
7 <#> gv[faker] s
8 <1> entersub[t3] vKS/TARG,1
9 <@> leave[1 ref] vKP/REFC
-e syntax OK`

`$ perl -MO=Concise,-exec -e'sub faker(&); faker sub { foo() }'
1 <0> enter
2 <;> nextstate(main 2 -e:1) v
3 <0> pushmark s
4 <0> pushmark sRM
5 <$> anoncode[CV ] lRM
6 <1> refgen KM/1
7 <#> gv[faker] s
8 <1> entersub[t3] vKS/TARG,1
9 <@> leave[1 ref] vKP/REFC
-e syntax OK`

"Are these the opcodes for the loop you are talking about?"

No, I meant the body of the curlies. In my examples, that would be the call to `foo()`. How can you pass "a call to `foo()`" to a sub? You can't. Perl puts it in an anon sub and passes a reference to that sub.

"Is there any guideline or rule of thumb that can be used to determine how routines listed in index-functions are going to treat a block?"

Whenever possible, subs are avoided. They are expensive, especially when the alternative is just executing the next instruction. Think of it this way: if the body of the block is constant, it will be inlined. If it's not, it will become a sub.

`sort` takes a sub as its argument, so it can't be inlined. I guess it could inline `sort BLOCK LIST` and not `sort SUBNAME LIST`, but perl uses the same behaviour for both (documented).
`eval`'s block acts like a sub (documented). I guess it's easier to implement exceptions that way.
`sub` creates a sub from the block. Duh.
Every other block is inlined.

The body of a prototyped function isn't constant, so the variable part is placed in a sub and passed as a code ref.
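The rule of thumb above can be illustrated with the two blocks that behave most differently. A sketch with hypothetical helpers `via_eval` and `via_do`: `return` inside `eval { ... }` ends only the eval (perlfunc documents that a return may be used in eval just as in a subroutine), whereas `do { ... }` is an inlined block, so its `return` leaves the enclosing sub.

```perl
use strict;
use warnings;

# eval BLOCK acts like a sub: "return" leaves only the eval, whose value
# becomes the return value, and the enclosing sub continues.
sub via_eval {
    my $v = eval { return "value returned from eval" };
    return "sub continued with: $v";
}

# A do BLOCK is inlined: "return" leaves the enclosing sub immediately.
sub via_do {
    my $v = do { return "left via_do from inside the do block" };
    return "never reached: $v";
}

print via_eval(), "\n";
print via_do(), "\n";
```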
Re: map and return by LanX (Saint) on Sep 03, 2009 at 16:12 UTC
"And what other subtle differences are there between built-in tokens like map and user defined subroutines with a (&@) prototype?"

We recently had a similar discussion about the restrictions of prototypes in simulating the behavior of syntax commands, and about incomplete documentation, e.g. here: Re^2: coderefs and (&) prototypes

Hope it helps...

Cheers Rolf

	PerlMonks