http://www.perlmonks.org?node_id=444124

Plotinus has asked for the wisdom of the Perl Monks concerning the following question:

Good Morning Venerable Monks,

This is more of a rambling inquiry into some code I've come across; I hope you don't mind.

I'm trying to move from Sys Admin with some scripting ability to actual coder with sys admin skills (if you're in the UK you'll understand this strange distinction). Anyway I picked an open source project (in perl) to try my hand at, as well as hopefully breaking the M$ monopoly at work. All the modules use a particular way of assigning strings to variables as shown below:

$_[0] =~ s/%HTTP_HOST%/&handleEnvVariable('HTTP_HOST')/ge;
$_[0] =~ s/%REMOTE_ADDR%/&handleEnvVariable('REMOTE_ADDR')/ge;
$_[0] =~ s/%REMOTE_PORT%/&handleEnvVariable('REMOTE_PORT')/ge;
$_[0] =~ s/%REMOTE_USER%/&handleEnvVariable('REMOTE_USER')/ge;

As a neophyte, I initially read this as the same reference being sequentially updated with new substitutions, so that it simply ends up with the last value. Weird. But having read perlsub, and in particular the line "In particular, if an element $_[0] is updated, the corresponding argument is updated (or an error occurs if it is not updatable)", it seems to me that $_[0] is being used as a kind of dynamic reference, so that each s/// substitution on the right-hand side is applied to it one after the other, without needing numerous user-defined variables that are each used only once (i.e. it is a shortcut for lazy coders, and I do know that laziness is a virtue). Is this correct?

Also: the =~ operator. How is binding (=~) different from assignment (=, ==, or eq)?

Re: What are multiple $_[0] =~ operations doing?
by davido (Cardinal) on Apr 01, 2005 at 09:44 UTC

    Let me answer your last question first.

    The =~ binding operator binds the scalar on its left with the regexp (or transliteration) operator on the right, so that the regexp operator may examine and act upon the scalar on the left.

    The = assignment operator assigns the value of the expression on the right hand side to the variable on the left.

    The == equality operator evaluates the numeric equality of the items to its left and right. It has nothing to do with assignment.
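
    To put the operators side by side (including eq, which the question also mentions), here is a minimal sketch. The variable names and the sample string are made up for illustration; the point is only that =~ hands a scalar to a regexp operator, while =, == and eq assign or compare.

    ```perl
    use strict;
    use warnings;

    my $text = "host=%HTTP_HOST%";    # = assigns a value to $text

    # =~ binds $text to the match operator; nothing is assigned to $text here.
    my $has_placeholder = ($text =~ /%HTTP_HOST%/) ? 1 : 0;

    # =~ with s/// edits $text in place and returns the number of substitutions.
    my $count = ($text =~ s/%HTTP_HOST%/example.org/);

    my $same_number = (10 == 10.0)     ? 1 : 0;    # numeric comparison
    my $same_string = ("10" eq "10.0") ? 1 : 0;    # string comparison

    print "$text\n";
    print "matched=$has_placeholder replaced=$count ==:$same_number eq:$same_string\n";
    ```

    The first print shows the edited string; in the second, == reports the two numbers as equal while eq reports the two strings as different.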

    Now for your first question:

    $_[0] is a sort of alias to the first argument passed to the enclosing subroutine. In fact, perlsub states, "The array @_ is a local array, but its elements are aliases for the actual scalar parameters." As the documentation also states, modifying $_[0] will attempt to modify the value held in the variable passed in the sub's arg list. For example:

    my $value = 100;
    changeit( $value );
    print "$value\n";

    sub changeit {
        $_[0]++;
    }

    __OUTPUT__
    101

    As you can see, incrementing $_[0] increments $value too (because they're basically the same thing).

    Now as the docs say, if you attempt to modify the value of an unmodifiable scalar, you get an error. For example:

    changeit( 100 );

    sub changeit {
        $_[0]++;
    }

    If you run this, you'll get an error. This is because $_[0] is aliased to a value, not a variable, in this case, and you can't change a hard-coded value. 100 cannot be 101 :)
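
    If you want to see the error without the script dying, one option is to wrap the call in eval. This is only a sketch; the $ok and $error names are mine, and the exact wording of the message may differ between Perl versions (the ones I know of mention a read-only value).

    ```perl
    use strict;
    use warnings;

    sub changeit { $_[0]++ }

    # eval traps the fatal error so the script can report it and carry on.
    my $ok    = eval { changeit(100); 1 };
    my $error = $ok ? '' : $@;

    print "caught: $error";
    ```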

    As a matter of maintainable style, it's often discouraged to allow your subroutines to modify the values of their parameters. But as with most things Perlish, this rule of thumb only applies where convenient. After all, chomp modifies its parameter list. So does chop.

    I hope this helps.


    Dave

      As a matter of maintainable style, it's often discouraged to allow your subroutines to modify the values of their parameters. But as with most things Perlish, this rule of thumb only applies where convenient. After all, chomp modifies its parameter list. So does chop.

      I'd like to add that this style recommendation will probably be affected by your background, or the background of the senior people in your department (as davido said, 'where convenient'). For those with a C background, it makes perfect sense to modify the arguments, as a function can return only one value. So it's common practice to have the function return a success or failure flag and modify what was passed in. (This is part of the whole pass-by-value versus pass-by-reference question: if you pass by value, there's no chance to modify the caller's values.) PL/SQL has a similar concept called 'in/out' parameters, which work like pass by reference.

      There is, however, the expectation in almost all languages that you're not going to modify the references that were passed in, unless that was part of the plan, and it's well documented. The local style guide might have recommendations on function naming, so it's readily apparent to people that the function does modifications on its args. The following would be an example of bad code:

      sub sum {
          my $list = shift;
          my $sum  = 0;
          while (my $i = shift @$list) {
              $sum += $i;
          }
          return $sum;
      }

      my @values = qw(1 2 3 4 5 6 7);
      print sum(\@values), " ; '@values'\n";

      You should always be suspicious when there's only one argument to the function, and it's an arrayref, not an array ... there might be some time savings by not pushing everything to the stack, but it introduces the possibility of someone messing with your values. Two or more arrayrefs isn't so much a sign to worry, as arrayrefs are the only way to pass more than one array to a function.
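
      For contrast, here is a sketch of a version that leaves the caller's array alone. The name sum_readonly is made up; it still takes an arrayref, but it only reads through it instead of shifting elements off.

      ```perl
      use strict;
      use warnings;

      # Hypothetical non-destructive variant of sum(): it reads through the
      # reference but never removes elements from the caller's array.
      sub sum_readonly {
          my $list = shift;
          my $sum  = 0;
          $sum += $_ for @$list;
          return $sum;
      }

      my @values = qw(1 2 3 4 5 6 7);
      my $total  = sum_readonly(\@values);
      print "$total ; '@values'\n";
      ```

      Here @values still holds all seven elements after the call, whereas the shift-based sum() above leaves it empty.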

      Depending on the audience, something such as the following might be considered acceptable:

      sub add_two {
          my $arrayref = shift;
          $_ += 2 foreach @$arrayref;
          return 1;
      }

      my @values = qw( 1 2 3 4 5 6 7 );
      print add_two(\@values), " ; '@values'\n";

      Okay, that might be a bad example. You might do it when you wanted to make sure that the original wasn't left around, though. Such as:

      sub sanitize_strings {
          my $arrayref = shift;
          s/[^a-zA-Z0-9_\-]/_/g foreach @$arrayref;
          return 1;
      }

      my @strings = qw( abc 123 a&21 j**!k );
      print sanitize_strings(\@strings), " ; '@strings'\n";
Re: What are multiple $_[0] =~ operations doing?
by ambs (Pilgrim) on Apr 01, 2005 at 09:39 UTC
    $_[0] is the first argument to the function. The binding operator (=~) followed by s/// performs substitutions. The ge at the end says to perform the substitution as many times as it can ("g") and to evaluate the replacement as Perl code ("e").

    So, you are substituting "%HTTP_HOST%" by the result of calling the "handleEnvVariable" function.

    If you don't like it that way, you can always:

    $_[0] =~ s/%(HTTP_HOST|REMOTE_ADDR|REMOTE_PORT|REMOTE_USER)%/handleEnvVariable($1)/ge;
    or (but be careful: this may not do what you want if the text contains other %NAME% variables)
    $_[0] =~ s/%([A-Z_]+)%/handleEnvVariable($1)/ge;
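
    To make the caveat concrete, here is a small sketch comparing the two patterns. Everything except the regexes is assumed: handleEnvVariable here is a stand-in that reads a fixed hash (%fake_env), not the project's real routine, and %SKIN% is an invented placeholder that has nothing to do with the environment.

    ```perl
    use strict;
    use warnings;

    # Hypothetical stand-in for the project's handleEnvVariable(): it looks
    # the name up in a fixed hash rather than the real environment.
    my %fake_env = (HTTP_HOST => 'example.org', REMOTE_ADDR => '192.0.2.1');

    sub handleEnvVariable {
        my ($name) = @_;
        return exists $fake_env{$name} ? $fake_env{$name} : '';
    }

    my $text = 'Host %HTTP_HOST%, client %REMOTE_ADDR%, theme %SKIN%';

    # Generic pattern: every %NAME% goes to the handler, including %SKIN%.
    (my $generic = $text) =~ s/%([A-Z_]+)%/handleEnvVariable($1)/ge;

    # Explicit list: only the named placeholders are touched.
    (my $listed = $text) =~ s/%(HTTP_HOST|REMOTE_ADDR)%/handleEnvVariable($1)/ge;

    print "$generic\n$listed\n";
    ```

    With the generic pattern, %SKIN% is swallowed and replaced by an empty string; with the explicit list it survives for some other part of the code to handle.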

    Alberto Simões

Re: What are multiple $_[0] =~ operations doing?
by Enlil (Parson) on Apr 01, 2005 at 09:49 UTC
    As a neophyte it initially looks like the same reference is being sequentially updated with new substitutions and it'll simply end up with the last value.

    Well, sort of. What happens is that all the occurrences of the string %HTTP_HOST% in $_[0] will be replaced with whatever is returned from &handleEnvVariable('HTTP_HOST'), and THEN all the occurrences of the string '%REMOTE_ADDR%' will be replaced with whatever is returned by &handleEnvVariable('REMOTE_ADDR'), and so on.

    So in each line the string is modified by substitutions if the pattern exists in the line.
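
    A quick way to watch this happen is to print the string after each step. This is just a sketch: expand() is a made-up wrapper, and handleEnvVariable is a hypothetical stand-in returning canned values.

    ```perl
    use strict;
    use warnings;

    # Hypothetical handler: returns a canned value for each name.
    sub handleEnvVariable {
        my ($name) = @_;
        my %canned = (HTTP_HOST => 'example.org', REMOTE_USER => 'plotinus');
        return $canned{$name};
    }

    # Hypothetical wrapper; each s/// edits the caller's string through $_[0].
    sub expand {
        $_[0] =~ s/%HTTP_HOST%/&handleEnvVariable('HTTP_HOST')/ge;
        print "after step 1: $_[0]\n";
        $_[0] =~ s/%REMOTE_USER%/&handleEnvVariable('REMOTE_USER')/ge;
        print "after step 2: $_[0]\n";
        return;
    }

    my $line = '%REMOTE_USER% @ %HTTP_HOST%';
    expand($line);
    print "caller sees: $line\n";
    ```

    Each substitution builds on the result of the previous one, so the edits accumulate in $line rather than overwriting each other.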

    It could be shortened down to:

    $_[0] =~ s/% ( HTTP_HOST | REMOTE_ (?:ADDR|PORT|USER) ) %/&handleEnvVariable($1)/gex;
    You might want to take a look at perlop(Regexp Quote-Like Operators) and perlre for more info.

    -enlil

Re: What are multiple $_[0] =~ operations doing?
by cog (Parson) on Apr 01, 2005 at 09:43 UTC
    some kind of dynamic referencing

    Take the following code, for instance:

    my @a = (1, 2, 3);
    add(@a);

    sub add {
        for (@_) {
            $_++;
        }
    }

    Now try printing @a; it contains (2, 3, 4), because the add() function acted directly upon its arguments.

    Likewise, your statements are changing the first input parameter directly, by performing substitutions upon $_[0], which is the first value of @_.
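
    The aliasing only lasts as long as you work on @_ itself. As a sketch (add_in_place and add_copy are names invented for this comparison), copying the arguments into a lexical first breaks the link to the caller's variables:

    ```perl
    use strict;
    use warnings;

    # Works on the aliases in @_, so the caller's variables change.
    sub add_in_place { $_++ for @_; return }

    # Copies the arguments first, so the caller's variables are untouched.
    sub add_copy {
        my @copy = @_;
        $_++ for @copy;
        return @copy;
    }

    my @a = (1, 2, 3);
    my @b = add_copy(@a);
    print "after add_copy:     a=(@a) b=(@b)\n";

    add_in_place(@a);
    print "after add_in_place: a=(@a)\n";
    ```

    This is why the usual my ($x) = @_; idiom is safe for callers, and why code that really wants to change its arguments has to go through $_[0] and friends.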

Re: What are multiple $_[0] =~ operations doing?
by polettix (Vicar) on Apr 01, 2005 at 10:14 UTC
    The binding operator - er - binds an operation to be performed to a particular variable instead of $_. I did find this rather confusing at the beginning, because I used to read this as some kind of assignment. See this as if you're assigning "an operation" to the variable.

    As for the use of $_[0], Perl passes variables by reference instead of value, which means that you can modify (if possible) the actual variables that are passed to a function. Thus:

    sub double_it {
        $_[0] *= 2;
    }

    my $x = 1;
    print "\$x is $x\n";
    double_it($x);
    print "\$x now is $x\n";
    will multiply $x by two, ending up with the following output:
    $x is 1
    $x now is 2
    The piece of code you posted looks like part of a templating system; each substitution command replaces occurrences of some placeholder (like %HTTP_HOST%) with the value given by a call to the handleEnvVariable sub with the given parameter (a call to a function is allowed inside the replacement part thanks to the "e" modifier).

    Flavio (perl -e "print(scalar(reverse('ti.xittelop@oivalf')))")

    Don't fool yourself.