PerlMonks  

My problems to understand the Perl documentation

by bojinlund (Prior)
on Aug 22, 2020 at 20:40 UTC (#11120998, perlmeditation)

I have had some difficulty reading the Perl documentation, and I have tried to understand why. This post describes some of my problems in understanding the documentation of the function split. I also make some proposals for what could be changed.

Please note, I think I have understood most of what I need to understand, but it has taken a long time.

Definitions of terms used in the description of the Perl language

The definitions in perlglossary are rather general. Examples are:

  • function
  • Mathematically, a mapping of each of a set of input values to a particular output value. In computers, refers to a subroutine or operator that returns a value. It may or may not have input values (called arguments).
  • subroutine
  • A named or otherwise accessible piece of program that can be invoked from elsewhere in the program in order to accomplish some subgoal of the program. A subroutine is often parameterized to accomplish different but related things depending on its input arguments. If the subroutine returns a meaningful value, it is also called a function.
  • Operator
  • A gizmo that transforms some number of input values to some number of output values, often built into a language with a special syntax or symbol. A given operator may have specific expectations about what types of data you give as its arguments (operands) and what type of data you want back from it.
  • Operand
  • An expression that yields a value that an operator operates on. See also precedence.
  • Argument
  • A piece of data supplied to a program, subroutine, function, or method to tell it what it's supposed to do. Also called a "parameter".

Terms I try to use in this post

Perl's perspective is missing in perlglossary. The Perl-specific meaning of a term, as it is used in the documentation of the Perl language, is often not defined.

The meanings of the terms used here are:

  • Function
  • What you want to be done.
  • Syntax pattern
  • The lines in the beginning of a description of a subroutine or operator. It shows the usages.
  • Subroutine
  • An implementation of a function described in perlsub. The input is called argument. Output is the return value.
  • Operator
  • An implementation of a function described in perlop. The input is called operand.

Are there more implementation mechanisms for functions in Perl?

I am aware that a subroutine can be used as an operator and the reverse. In this post I have disregarded this.

The description of Perl subroutine

(In the index of "Language reference" the description is called "perlsub - subroutine function". What is a subroutine function?)

I suppose the intention of perlsub is to describe the subroutine mechanism in Perl.

This is one of the key passages for understanding the usage of a Perl subroutine. From perlsub:

The Perl model for function call and return values is simple: all functions are passed as parameters one single flat list of scalars, and all functions likewise return to their caller one single flat list of scalars. Any arrays or hashes in these call and return lists will collapse, losing their identities--but you may always use pass-by-reference instead to avoid this. Both call and return lists may contain as many or as few scalar elements as you'd like. (Often a function without an explicit return statement is called a subroutine, but there's really no difference from Perl's perspective.)

(According to the glossary a function refers to a "subroutine or operator that returns a value". The sentence in parentheses is a bit confusing.)

I have also been misled to think that the text above applies to all the functions described in index-functions.
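A minimal sketch of the flat-list model quoted above, as it applies to an ordinary Perl sub without a prototype. The sub names count_args and sum_pair are hypothetical and exist only for this illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: reports how many scalars arrived in @_.
sub count_args { return scalar @_ }

# Hypothetical helper: takes two array references, so each array
# keeps its identity inside the sub.
sub sum_pair {
    my ($left, $right) = @_;
    my $total = 0;
    $total += $_ for @$left, @$right;
    return (scalar(@$left), scalar(@$right), $total);
}

my @first  = (1, 2, 3);
my @second = (4, 5);

# Both arrays collapse into one flat list of five scalars.
my $flat_count = count_args(@first, @second);

# Passing references keeps the two arrays apart.
my ($n_first, $n_second, $total) = sum_pair(\@first, \@second);

print "flat: $flat_count, by reference: $n_first + $n_second elements, sum $total\n";
```

The return value of sum_pair is itself one flat list of three scalars, which the caller unpacks by position.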

My proposal is to change the text to:

The Perl model for subroutine call arguments and return values is simple: all subroutines are passed as arguments one single flat list of scalars. All subroutines return a return value, which is one single flat list of scalars. Any arrays or hashes in these call and return lists will collapse, losing their identities--but you may always use pass-by-reference instead to avoid this. Both call and return lists may contain as many or as few scalar elements as you'd like.

And to add this:

The arguments and return values are transferred using the argument stack. The arguments in the script source code are processed (evaluated) to store them on the argument stack. This is done before the subroutine definition is called.

The syntax patterns in the beginning of the descriptions of functions implemented by subroutines describe the content of the argument stack after the pre-processing (evaluation) of the arguments.

(Argument stack must be added to perlglossary. The argument stack is described in perlguts.)

The description of the function split

In the description of the Perl functions it isn't always stated how a function is implemented.

Split is not implemented as a subroutine and does not behave like a subroutine. At least the differences in the processing of the arguments compared to the normal processing of those to a normal Perl subroutine must be described.

The syntax pattern for split describes the source code, not the result after pre-processing of the arguments as it is for a subroutine.

If you call split like this: @rv = split $_[0], @_[ 1 .. 2 ]; you get the return value undef. This check passes: is_deeply [ $_[0], @_[ 1 .. 2 ] ], [ qr{(:)}, 'a:b:c', 99 ];

I had expected a warning or error exception.

All of this is perhaps errors in the implementation of split.

Summary

I understand that it is impossible to make big changes to the documentation of Perl.

Is there something small that can be done?

I believe that one of the first things to do is to improve perlglossary. Definitions of meanings from Perl's perspective should be added.

Perhaps there could be two glossaries. One general and computer science focused and one with the Perl specific definitions used in the documentation of the Perl language.

Replies are listed 'Best First'.
Re: My problems to understand the Perl documentation
by LanX (Cardinal) on Aug 22, 2020 at 21:48 UTC
    Some older languages had/have a clear separation - syntactic and semantic - between subroutine and function!

    Perl doesn't, but explains the terminology for people coming from there.

    perlfunc describes builtin "functions" in CORE::

    Most can be replicated with pure Perl when using prototypes, so those "flat list arguments" don't apply in these cases.

    Other built-ins can't be replicated because they require special parsing of their arguments.

    The "mathematical function" part is misleading, because Perl's subs can have side-effects, like altering passed or global or closure variables.

    Compare that to languages like Haskell which do their best to be side-effect free like mathematical functions.

    Operators in perlop are indeed built-in functions with special syntax, context and precedence. You can replicate $a+$b*$c with add( $a, mul( $b, $c ) ) and in some docs you'll find function like built-ins called "operators"
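    The replication mentioned above can be sketched as follows. The subs add and mul are not built-ins; they are hypothetical helpers defined here only to mirror the expression.

```perl
use strict;
use warnings;

# Hypothetical stand-ins for the + and * operators.
sub add { my ($left, $right) = @_; return $left + $right }
sub mul { my ($left, $right) = @_; return $left * $right }

my ($x, $y, $z) = (2, 3, 4);

# The operator form relies on precedence: * binds tighter than +.
my $with_operators = $x + $y * $z;

# The sub form spells the same evaluation order out through nesting.
my $with_subs = add($x, mul($y, $z));

print "$with_operators $with_subs\n";
```

    What the sub form cannot replicate is the special syntax and precedence: the nesting has to be written out by hand.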

    The goal you seem to have is to define Perl in Lisp'ish way, where everything can be derived from a small set of axioms.

    I doubt that's possible, because the creators cared more about DWIM than orthogonality.

    Perl has loads of influences from C, bash, sed, awk and Lisp and tried to combine them in an "organic" way to make people coming from those ends feel at home.

    Those features were not implemented in a clean canonical mini Perl ...

    EDIT

    ... and there is no language definition like ECMA.

    The best possible outcome you can achieve is probably a language definition for 98% of Perl combined with a long list of exceptions.

    One simple example: built-ins like map and grep can be replicated with

    sub name(&;@) { my ($code, @list) = @_; ... }
    # then called
    name {BLOCK} LIST;

    BUT having a return inside the BLOCK will lead to a very different result, hence gone the orthogonality.
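    A small sketch of that difference, assuming a hypothetical pure-Perl replica named my_grep built with the prototype shown above. The names my_grep, find_builtin and find_replica are made up for this illustration.

```perl
use strict;
use warnings;

# Hypothetical pure-Perl replica of grep, using the prototype from the post.
sub my_grep(&;@) {
    my ($code, @list) = @_;
    my @kept;
    for (@list) {                      # $_ is aliased to each element in turn
        push @kept, $_ if $code->();
    }
    return @kept;
}

# With the built-in, the BLOCK is not a sub, so return leaves find_builtin.
sub find_builtin {
    my @found = grep { return "left the enclosing sub" if $_ % 2 == 0; 0 } @_;
    return "reached the end with " . scalar(@found) . " match(es)";
}

# With the replica, the BLOCK is an anonymous sub, so return only leaves the block.
sub find_replica {
    my @found = my_grep { return 1 if $_ % 2 == 0; 0 } @_;
    return "reached the end with " . scalar(@found) . " match(es)";
}

print find_builtin(1, 2, 3), "\n";
print find_replica(1, 2, 3), "\n";
```

    The two callers look alike in the source, yet only the replica ever reaches its final statement.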

    Cheers Rolf
    (addicted to the Perl Programming Language :)
    Wikisyntax for the Monastery

      Callable unit isn’t a bad explanation. Or definition. Should be enough. The rest is «jesuitisch raffiniert» subtleties.

      Update: Changed wording because probably too harsh.

      «The Crux of the Biscuit is the Apostrophe»

      perl -MCrypt::CBC -E 'say Crypt::CBC->new(-key=>'kgb',-cipher=>"Blowfish")->decrypt_hex($ENV{KARL});'Help

        Thanks for comparing me with Joseph Goebbels, again!

        You must know, unfortunately I'm not old enough to tell ...

        Cheers Rolf

Re: My problems to understand the Perl documentation
by afoken (Canon) on Aug 22, 2020 at 21:42 UTC
    Is there something small that can be done?

    Yes, of course. If you think something in the documentation can be improved, create a patch. Documentation is in *.pod files, several of the README.* files, and - for legacy reasons - hidden in some other files. (Use grep to search for lines starting with =.) See perlhack, perlgit, perlpolicy. You can propose your changes here at perlmonks, too, for discussion.

    Alexander

    --
    Today I will gladly share my knowledge and experience, for there are no sweeter words than "I told you so". ;-)
Re: My problems to understand the Perl documentation
by AnomalousMonk (Bishop) on Aug 22, 2020 at 23:54 UTC
    ... two glossaries. One general and computer science ...

    I think that a general, CS-ish glossary or reference source is best provided by existing on-line resources like Wikipedia, to which PerlMonks can easily link. See expression, subroutine, function, etc. Perl should stick to documenting Perlish things. (Only Perl can gloss Perl.)

    (Note that in Wikipedia, the CS definitions of "subroutine" and "function" are synonymous | discussed in the same article (per LanX here).)


    Give a man a fish:  <%-{-{-{-<

      > (Note that in Wikipedia, the CS definitions of "subroutine" and "function" are synonymous.)

      Linking to the same page doesn't mean they are "synonymous".

      WP entries on programming are written by programmers. If their main language is redefining a concept they'll take it as normative.

      Compare the notion of "closure" in PHP for "anonymous subroutine" which is ... "unfortunate". In Python, anonymous subroutines aka "lambdas" are restricted to one expression only, which is ... "even more unfortunate".

      IOW various WP authors will introduce their own truth.

      Anyway, this WP article goes into some detail to explain why the concepts vary between different languages.

      Some programming languages, such as Pascal, Fortran, Ada and many dialects of BASIC, distinguish between functions or function subprograms, which provide an explicit return value to the calling program, and subroutines or procedures, which do not. In those languages, function calls are normally embedded in expressions (e.g., a sqrt function may be called as y = z + sqrt(x)). Procedure calls either behave syntactically as statements (e.g., a print procedure may be called as if x > 0 then print(x)) or are explicitly invoked by a statement such as CALL or GOSUB (e.g., call print(x)). Other languages, such as C and Lisp, do not distinguish between functions and subroutines.

      In strictly functional programming languages such as Haskell, subprograms can have no side effects, which means that various internal states of the program will not change. Functions will always return the same result if repeatedly called with the same arguments. Such languages typically only support functions, since subroutines that do not return a value have no use unless they can cause a side effect.

      In programming languages such as C, C++, and C#, subroutines may also simply be called functions, not to be confused with mathematical functions or functional programming, which are different concepts.

      So subroutines and functions are synonymous in Perl, but not in general.

      To be more precise, Perl's sub is a meta construct implementing all features of procedures and functions as known in older languages. From this perspective the names are linguistically just "pars pro toto" in Perl.

      In Perl you can use a sub like °

      • a function
      • a procedure
      • a method
      • a closure
      • a lambda aka anonymous sub
      • an operator (via overloading)

      so the wordings become sometimes fuzzy.
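      A compact sketch of several of these roles side by side. Everything here (the Amount class and the subs double, note and $bump) is hypothetical and exists only to show one sub keyword covering different usages.

```perl
use strict;
use warnings;

# Hypothetical class whose subs act as constructor, method and operator.
package Amount {
    use overload '+' => \&plus;                 # a sub used as an operator

    sub new   { my ($class, $n) = @_; return bless { n => $n }, $class }
    sub value { my ($self) = @_; return $self->{n} }          # a method
    sub plus  {
        my ($self, $other) = @_;
        my $n = ref $other ? $other->value : $other;
        return Amount->new($self->value + $n);
    }
}

sub double { return 2 * $_[0] }                 # a function: called for its value

my @log;
sub note { push @log, "@_"; return }            # a procedure: called for its side effect

my $step    = 10;
my $counter = 0;
# An anonymous sub (lambda) that is also a closure over $counter and $step.
my $bump = sub { $counter += $step; return $counter };

note("start");
my $sum = Amount->new(3) + Amount->new(4);
$bump->();

print double(21), " ", $sum->value, " ", $counter, " ", scalar(@log), "\n";
```

      The package block syntax assumes Perl 5.14 or later.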

      Cheers Rolf

      °) the listed items are not exclusive, there are overlaps.

        > (Note that in Wikipedia, the CS definitions of "subroutine" and "function" are synonymous.) Linking to the same page doesn't mean they are "synonymous".

        No, of course Wikipedia isn't the arbiter of all knowledge and, at times, it's downright wrong but how, exactly, would you characterize the difference between a subroutine and a function? To my mind, there's no difference and methods belong in the same category since they're just functions that supply the object as the first parameter. void functions are still functions and Perl is still written in C.

Re: My problems to understand the Perl documentation [updated]
by jo37 (Friar) on Aug 23, 2020 at 08:40 UTC
    If you call split like this @rv = split $_[0], @_[ 1 .. 2 ] you get the return value undef.

    This is not true, it gives (99). Update 1: Presuming @_ = (qr{:}, 'a:b:c', 99) from the given example. Update 2: A capturing group in the regex does not change the result for this case.

    To understand this behaviour you need to consider function prototypes. The compiler recognises split being called with two arguments, where the second is a list that gets assigned to a single scalar. This results in the last element of the list being assigned to the scalar, just like in

    $x = (1, 2, 4); # $x = 4
    This behaviour is caused by the usage of an array slice. A "pure" array would be evaluated in scalar context giving its length.
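    A minimal sketch of both scalar-context rules and of the split call under discussion, assuming the argument values from the original example. The variable names are chosen only for illustration.

```perl
use strict;
use warnings;

my @args = (qr{:}, 'a:b:c', 99);

# A slice in scalar context yields its last element ...
my $from_slice = @args[1 .. 2];

# ... while a whole array in scalar context yields its element count.
my @rest       = @args[1 .. 2];
my $from_array = @rest;

# split evaluates its second operand in scalar context, so here it
# splits the string "99" and never sees 'a:b:c'.
my @fields = split $args[0], @args[1 .. 2];

print "slice: $from_slice, array: $from_array, fields: @fields\n";
```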

    In the following example the sub show_args sees its arguments just like split.

    UPDATE 3: As suspected by bojinlund in Re^2: My problems to understand the Perl documentation and proven by LanX in Re^4: My problems to understand the Perl documentation, the sub show_args does not see its arguments exactly as split.

    #!/usr/bin/perl
    use strict;
    use warnings;
    use Data::Dump 'dd';
    use feature 'say';

    sub show_args ($_;$) {
        dd @_;
    }

    sub call_show_args {
        say "single:";
        show_args $_[0], $_[1], $_[2];
        say "flat:";
        show_args @_;
        say "slice:";
        show_args @_[0 .. 2];
        say "slice split:";
        show_args $_[0], @_[1 .. 2];
        say "array split:";
        my $p = shift;
        show_args $p, @_;
    }

    $_ = 'default';
    call_show_args qr{:}, 'a:b:c', 99;

    __DATA__
    single:
    (qr/:/, "a:b:c", 99)
    flat:
    (3, "default")
    slice:
    (99, "default")
    slice split:
    (qr/:/, 99)
    array split:
    (qr/:/, 2)

    Greetings,
    -jo

    $gryYup$d0ylprbpriprrYpkJl2xyl~rzg??P~5lp2hyl0p$
      Thank you jo37 for your work!
      This is not true, it gives (99).

      Sorry, I had an error in my test script. Now I also get 99.

      Your example made me understand that a prototype also defines the context (scalar/list) for the evaluation of the arguments.

      My point is that the function split is not fully described in the Perl documentation.

      • Reading index-functions I get the impression that split is a function.
      • perlsub states:
      • The Perl model for function call and return values is simple: all functions are passed as parameters one single flat list of scalars, and all functions likewise return to their caller one single flat list of scalars. ...
      • The function prototype returns the prototype of a function as a string (or undef if the function has no prototype).
        prototype "CORE::split" returns undef (Thanks LanX )

      My conclusion from this is that split is a Perl sub, without prototype.

      I have however realized that:

      • Split's first argument, /PATTERN/, is not evaluated immediately. In some way it is kept for later usage.
      • The rest of the arguments are not treated like a Perl sub without prototype.

      Somewhere in the documentation of split these differences from a normal Perl sub should be described. As split behaves more like an operator it is perhaps better to call it an operator. It is also implemented as an operator and not as a Perl sub.

        Reading index-functions I get the impression that split is a function.

        That list also includes things that are clearly not a function, like and, __END__, and m.

        The function prototype returns the prototype of a function as a string (or undef if the function has no prototype). prototype "CORE::split" returns undef

        You seem to be ignoring the part of that doc that says:

        If FUNCTION is a string starting with CORE::, the rest is taken as a name for a Perl builtin. If the builtin's arguments cannot be adequately expressed by a prototype (such as system), prototype returns undef, because the builtin does not really behave like a Perl function.
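        The quoted rule can be checked with a short script. split and system are the built-ins named in this thread; length and rename are picked here only for comparison, and their output is printed rather than assumed.

```perl
use strict;
use warnings;

# Ask Perl for the prototype of a few built-ins and record the answers.
my %proto;
for my $name (qw(split system length rename)) {
    $proto{$name} = prototype("CORE::$name");
    printf "%-7s %s\n", $name,
        defined $proto{$name} ? "'$proto{$name}'" : 'undef';
}
```

        An undef here does not mean "a sub without a prototype"; per the quoted documentation it means the built-in's argument handling cannot be expressed as a prototype at all.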
        As split behaves more like an operator it is perhaps better to call it an operator.

        Please read carefully Terms and List Operators (Leftward):

        Actually, there aren't really functions in this sense, just list operators and unary operators behaving as functions because you put parentheses around the arguments.

        Several things in Perl only make sense when you read all of the documentation.

        Split IS a builtin function.

        I told you already that the fact you can't easily replicate it with sub doesn't mean it's not a function.

        Perl is not Lisp where the whole syntax can be expressed with one single s-expression .

        Cheers Rolf

        > Reading index-functions I get the impression that split is a function.

        That's taken from perlfunc and there it clearly states

        Here are Perl's functions (including things that look like functions, like some keywords and named operators ) arranged by category.

        It should better have been named "keyword-index"

        Cheers Rolf

        My conclusion from this is that split is a Perl sub, without prototype.

        Is there any built-in without a prototype? I don't think so.

        Greetings,
        -jo

        $gryYup$d0ylprbpriprrYpkJl2xyl~rzg??P~5lp2hyl0p$
      Sorry for taking such a long time to create this post, but I needed some time to read and think. Monks, thank you for your answers, they have been helpful.
      The goal you seem to have is to define Perl in Lisp'ish way, where everything can be derived from a small set of axioms.

      No, I just want to understand the documentation, and hopefully help others to do the same.

      The reply Re: My problems to understand the Perl documentation [updated] helped me realize that I had missed this sentence from perldata: "Slices in scalar context return the last item of the slice."

      Split is an operator (op)

      From perlfunc, which is a list of Perl builtin functions. Split is in this list.

      The functions in this section can serve as terms in an expression. They fall into two major categories: list operators and named unary operators.

      From perlinterp : “An op is a fundamental operation that Perl can perform: all the built-in functions and operators are ops”

      From toke.c in perl.git: case KEY_split: LOP(OP_SPLIT,XTERM);

      From pp.c in perl.git:

      This file contains general pp ("push/pop") functions that execute the opcodes that make up a perl program. A typical pp function expects to find its arguments on the stack, and usually pushes its results onto the stack, hence the 'pp' terminology.
      PP(pp_split) { … } implements split.

      The de-parse of sub { split} does not show any subroutine call.
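      One way to look at the same thing from Perl itself is to render the compiled ops with the core module B::Concise. This is only a sketch: my_split is a hypothetical ordinary sub added for comparison, and the exact op listing differs between Perl versions, so the script only checks for the entersub op that a subroutine call compiles to.

```perl
use strict;
use warnings;
use B::Concise ();

# Hypothetical ordinary sub, for comparison with the built-in.
sub my_split { my ($sep, $text) = @_; return ($text) }

# Render the ops of a code reference into a string.
sub ops_of {
    my ($code) = @_;
    my $walker = B::Concise::compile('-exec', $code);
    B::Concise::walk_output(\my $buffer);   # send the rendering to $buffer
    $walker->();
    return $buffer;
}

my $builtin_ops = ops_of(sub { return split /:/, $_[0] });
my $usersub_ops = ops_of(sub { return my_split(':', $_[0]) });

print "built-in uses entersub: ", ($builtin_ops =~ /entersub/ ? "yes" : "no"), "\n";
print "user sub uses entersub: ", ($usersub_ops =~ /entersub/ ? "yes" : "no"), "\n";
```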

      Ideas for changes to the documentation

      Here follow a number of rough drafts of changes.

      Passing of arguments and return values

      This from perlsub is only one case:

      The Perl model for function call and return values is simple: all functions are passed as parameters one single flat list of scalars, and all functions likewise return to their caller one single flat list of scalars. Any arrays or hashes in these call and return lists will collapse, losing their identities--but you may always use pass-by-reference instead to avoid this. Both call and return lists may contain as many or as few scalar elements as you'd like. (Often a function without an explicit return statement is called a subroutine, but there's really no difference from Perl's perspective.)

      It only applies to Perl subs without a prototype or signature. (I was misled into thinking that it applied to any function.)

      Proposals to replace the quoted text above with:

      The arguments and return values are transferred using the argument stack.

      The subroutine arguments in the script source code are processed (evaluated), to store them on the argument stack. This is done before the call of the subroutine definition.

      Perl has two ways of processing the arguments before calling a function.

      • One single flat list of scalars
      • The arguments are passed as a single flat list of scalar values. Any arrays or hashes collapse and lose their identities. But you may always use pass-by-reference instead to avoid this.

        This method is used for subroutines without a prototype. It is also used for any subroutine called with an & and … ?.

      • Arguments processed one after another
      • The arguments are separated by the commas in the source code. The arguments are processed one after another. The processing is done in either scalar or list context.

        This method is used for subroutines with a prototype. The prototype defines which context is used.

        It is also used for built-in functions described in perlfunc.
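      The two ways of processing arguments can be compared with a pair of hypothetical subs, show_flat and show_scalar, that differ only in the prototype.

```perl
use strict;
use warnings;

# Hypothetical subs: identical bodies, one without and one with a prototype.
sub show_flat      { return join ',', @_ }
sub show_scalar($) { return join ',', @_ }

my @items = (10, 20, 30);

# No prototype: @items is flattened into three arguments.
my $flat = show_flat(@items);

# Prototype ($): @items is evaluated in scalar context, giving its length.
my $by_proto = show_scalar(@items);

# Calling with & bypasses the prototype, so the list is flattened again.
my $bypassed = &show_scalar(@items);

print "$flat | $by_proto | $bypassed\n";
```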

      Proposals to add this before A return statement may :

      The return value is a flat list of scalar values if the subroutine is called in list context, and a single scalar value if it is called in scalar context. To avoid flattening of arrays or hashes you must use pass-by-reference.

      Documentation of split

      Proposals to add this in the beginning of split

      Split is a built-in function (implemented in the Perl interpreter and not as a Perl sub).

      The arguments are processed consecutively and in scalar context. The result of the processing of /PATTERN/ is a compiled regular expression. It is later used to match the separators. If LIMIT is one, the pattern is not used.
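      A few calls that illustrate how PATTERN and LIMIT interact, using the string from the earlier example. The variable names are chosen only for this sketch.

```perl
use strict;
use warnings;

my $text = 'a:b:c';

# Plain pattern: the separators are dropped.
my @plain = split /:/, $text;

# Capturing group: the matched separators are returned as extra fields.
my @with_separators = split /(:)/, $text;

# LIMIT of 1: at most one field, so the string comes back unsplit.
my @limit_one = split /:/, $text, 1;

# LIMIT of 2: split once, the remainder stays together.
my @limit_two = split /:/, $text, 2;

print join('|', @plain), "\n";
print join('|', @with_separators), "\n";
print join('|', @limit_one), "\n";
print join('|', @limit_two), "\n";
```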

        I would reword "the call of the subroutine definition" - the definition isn't called, the definition is parsed and compiled to ops, and then this compiled thing is called. But as a non-native speaker of English, I don't dare to guess how to formulate that to make it easy to understand for newcomers.

        map{substr$_->[0],$_->[1]||0,1}[\*||{},3],[[]],[ref qr-1,-,-1],[{}],[sub{}^*ARGV,3]
        Icky. Examples beat words