http://www.perlmonks.org?node_id=24822

NAME

subtlety - the subtle things about perl


QUIRKS

The following is a list of some of the things that have surprised me about Perl.


+shift;

This feature of Perl represented the first (and only?) time Keith <keith@lbox.org> stumped me w/ a Perl question. The most common use of shift is to take parameters off of @_ and put them into scalars.

    sub foo {
        my $self     = shift;
        my $hash_ref = shift;

        # and so on...
    }

What if you wanted to have a local copy of the hash instead of working directly with the hash reference? You can't say:

    my %hash = %{ shift };

The reason is that perl treats a bare identifier inside %{ } as a variable name, so %{ shift } means the hash %shift rather than a call to the built-in shift. To remedy this (w/o using more variables) you can say:

    my %hash = %{ +shift };

Now it works. Why?! The unary + operator is a hint to the perl interpreter that what follows is not a variable name (because variable names cannot start w/ a ``+''). shift can still be a built-in at this point, so that's how the perl interpreter treats it. The block therefore evaluates to the hash reference that shift returned, and %{ } dereferences that reference into a hash.

Another (less obfuscated) way of doing this is:

    my %hash = %{ shift() };

The parentheses make shift an unambiguous function call, so this achieves the same thing.
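Here is a small runnable sketch of both spellings. The sub names with_plus and with_parens are hypothetical and exist only for illustration. Each sub takes a hash reference, copies the hash, and changes only its private copy.

```perl
use strict;
use warnings;

# Hypothetical subs: each copies the hash behind the reference it receives,
# so changes to the copy never reach the caller's hash.
sub with_plus {
    my %hash = %{ +shift };    # + keeps shift from being read as %shift
    $hash{seen} = 1;           # modifies only the local copy
    return \%hash;
}

sub with_parens {
    my %hash = %{ shift() };   # () marks shift as a function call
    $hash{seen} = 1;
    return \%hash;
}

my %orig  = (name => 'keith');
my $copy1 = with_plus(\%orig);
my $copy2 = with_parens(\%orig);

print exists $orig{seen} ? "original changed\n" : "original untouched\n";
print "copy1 seen=$copy1->{seen}, copy2 seen=$copy2->{seen}\n";
```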


Perl built-ins are not subroutines.

I wrote a script called lchtml that would lowercase HTML tags and attributes while leaving the rest of the content alone. I thought it would be nice to be able to uppercase these things as well, so my first thought was to make it so that the routines that did the text transformation could take a subroutine reference as a parameter. When I wanted to lowercase, I'd give it \&lc and when I wanted to uppercase, I'd give it \&uc.

It didn't work. The built-in functions lc and uc (or any other built-in, for that matter) aren't really subroutines, so you can't take a reference to them. This was the first time I felt deep disappointment in Perl. I accept the fact that Perl is not 'orthogonal', but this was a little too far.

My work-around was to create wrapper subroutines that looked like:

    sub my_lc { lc +shift; }
    sub my_uc { uc +shift; }

and pass around references to these wrapper subroutines. It made me feel dirty.
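Anonymous subs avoid the need for named wrappers. The sketch below uses a hypothetical transform_tags sub, which is a crude stand-in for lchtml and not its real code. It applies whatever code reference it is given to each HTML tag name and leaves the rest of the text alone.

```perl
use strict;
use warnings;

# Hypothetical, simplified stand-in for lchtml: apply $fn to each tag name.
sub transform_tags {
    my ($text, $fn) = @_;
    $text =~ s{<(/?)(\w+)}{"<" . $1 . $fn->($2)}ge;
    return $text;
}

# Anonymous subs wrap the built-ins, so they can be passed around as refs.
my $lower = sub { lc shift };
my $upper = sub { uc shift };

print transform_tags('<P>Hello <B>World</B></P>', $lower), "\n";
print transform_tags('<p>Hello <b>World</b></p>', $upper), "\n";
```

On Perl 5.16 or later, \&CORE::lc should also give a usable reference to the built-in. The anonymous-sub form works on older perls too.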


Sub-expression matches in list context.

Note that the following printf is receiving its parameters from the sub-expressions in the regular expression.

    printf("#%02x%02x%02x %s\n", /(\d+)\s+(\d+)\s+(\d+)\s+(.*)/);

In list context, a successful match returns the sub-expressions ($1, $2, $3, ...) as a list, so printf receives the three numbers and the trailing text as its arguments.
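The same list-context behaviour also works with an ordinary assignment. In this sketch the input line is a made-up example in the style of an X11 rgb.txt entry. It also shows that a failed match returns an empty list.

```perl
use strict;
use warnings;

# In list context a successful match returns the captures ($1, $2, ...).
my $line = '255 165 0 orange';
my ($red, $green, $blue, $name) = $line =~ /(\d+)\s+(\d+)\s+(\d+)\s+(.*)/;
my $hex = sprintf "#%02x%02x%02x %s", $red, $green, $blue, $name;
print "$hex\n";    # prints "#ffa500 orange"

# A failed match returns the empty list.
my @none = 'no digits here' =~ /(\d+)\s+(\d+)\s+(\d+)\s+(.*)/;
print scalar(@none), "\n";    # prints 0
```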


The difference between <STDIN> and <>.

One would think that <> would just read from STDIN, but did you know that it looks at @ARGV first? It treats each element of @ARGV as a filename, opens that file, and reads from it; only when @ARGV is empty does <> read from STDIN. Make a script that does:

    #!/usr/bin/perl
    while (<>) {
        print;    # $_ already ends with a newline
    }

and then invoke it like:

    $ ./script a b c
    Can't open a: No such file or directory
    Can't open b: No such file or directory
    Can't open c: No such file or directory

If any of those files (a, b, or c) were to exist, it would have printed them to STDOUT. If the script had said <STDIN>, @ARGV would have been left alone.
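You can see the @ARGV behaviour without touching the shell. This sketch sets @ARGV locally to the name of a temporary file created with File::Temp, which stands in for a, b, and c. It then reads that file with <>.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# A temporary file stands in for the command-line arguments.
my ($fh, $path) = tempfile(UNLINK => 1);
print {$fh} "first line\nsecond line\n";
close $fh;

my @seen;
{
    # <> reads the files named in @ARGV; STDIN is used only if @ARGV is empty.
    local @ARGV = ($path);
    while (my $line = <>) {
        chomp $line;
        push @seen, $line;
    }
}
print join(", ", @seen), "\n";    # prints "first line, second line"
```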


\ is a list operator

Usually, one uses \ on a single value to get a reference to it, but you can also apply it to a list.

    @ref = \($a, $b, $c);

takes references to $a, $b, and $c and puts them in @ref; it is the same as saying (\$a, \$b, \$c). If you ever need references to a bunch of things at once, this might be the idiom to use.
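Here is a short sketch showing that \( ... ) produces one reference per element. Changing the variables through @refs changes the originals. The variable names are arbitrary.

```perl
use strict;
use warnings;

my ($x, $y, $z) = (1, 2, 3);

# \(...) distributes over the list: same as (\$x, \$y, \$z).
my @refs = \($x, $y, $z);

# Increment every variable through its reference.
$$_ += 10 for @refs;

print "$x $y $z\n";    # prints "11 12 13"
```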


Mixing long and short options with Getopt::Long

This is documented under the ``Aliases and Abbreviations'' section of Getopt::Long's man page, but if you're not reading carefully, you will miss it. You can give an option alternative names by separating them with the pipe character |. Suppose you want people to be able to specify a size on the command line using either ``--size'' or ``-s''. To do this, you can say:

    use Getopt::Long;

    my %opt;
    GetOptions(\%opt, "size|s=i");
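To see both spellings work, this sketch wraps the call in a hypothetical parse_size helper. The helper localizes @ARGV so that it can feed GetOptions different command lines. When options are stored in a hash, Getopt::Long uses the primary name (size) as the key, even if the user typed -s.

```perl
use strict;
use warnings;
use Getopt::Long;

# Hypothetical helper: parse a given argument list instead of the real @ARGV.
sub parse_size {
    local @ARGV = @_;    # GetOptions reads (and consumes) @ARGV
    my %opt;
    GetOptions(\%opt, "size|s=i") or die "bad options\n";
    return $opt{size};   # stored under the primary name "size"
}

print parse_size('--size', 10), "\n";    # prints 10
print parse_size('-s', 20), "\n";        # prints 20
print parse_size('--size=30'), "\n";     # prints 30
```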


SEE ALSO

Perl Monks -- you pick up on a lot of weird things perl can do.

http://perlmonks.org

<> and @ARGV
by chip (Curate) on Jul 28, 2000 at 15:23 UTC
    If the connection between <> and @ARGV caught you by surprise, I'm afraid you haven't been reading very carefully.

    And <> doesn't read from the files in @ARGV when there's nothing on stdin. It's the other way around. It reads from stdin only when @ARGV is empty (or contains "-").

        -- Chip Salzenberg, Free-Floating Agent of Chaos

      Well, I went and looked at the Camel book again, and (to my surprise) the behaviour was documented. So yes -- I was not reading carefully, but...

      I still think I wasn't out of my mind to be surprised. The only things I've used the angled brackets for are reading from file handles and globbing. The connection between <> and @ARGV doesn't seem obvious to me. It's a special-case behaviour.

      I've only been w/ Perl for 1.5 years. Give me time. :)

        Sorry if I came off as being snippy. It's just that the use of <> and @ARGV is so common in just about every text-oriented Perl script ever written.... I was just surprised that your experience was so atypical. It's as if a C programmer were surprised to find that if you're in a while, and you do a return, it gets you out. (BTW, that was a real question from a real former cow orker. He's probably writing VB these days.)

            -- Chip Salzenberg, Free-Floating Agent of Chaos

RE: Subtle Quirks
by autark (Friar) on Jul 28, 2000 at 19:28 UTC
    Well, perl has many subtleties - if you dig long enough. The most irritating thing I have come across in perl is the difference between arrays and lists:

        $_ = "ace";
        my($a, $b, $c) = (/a/, /b/, /c/);

    Intuitively one would expect the above code to be similar to my($a, $b, $c) = (1, undef, 1). But this is not the case. // returns the empty list when it fails - ouch. That means the result will be my($a, $b, $c) = (1, (), 1), which in turn means that $a = 1, $b = 1 and $c = undef.
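Here is a sketch of the pitfall and one way around it. Wrapping each match in scalar() forces a true or false result, so every match fills exactly one slot. The variable names are chosen to avoid $a and $b, which perl reserves for sort.

```perl
use strict;
use warnings;

$_ = "ace";

# List context: the failed /b/ contributes nothing, so the values shift left.
my ($p, $q, $r) = (/a/, /b/, /c/);

# scalar() makes each match return true or false, one value per slot.
my ($s, $t, $u) = (scalar(/a/), scalar(/b/), scalar(/c/));

printf "%s %s %s\n", map { defined($_) ? "'$_'" : 'undef' } $p, $q, $r;
printf "%s %s %s\n", map { defined($_) ? "'$_'" : 'undef' } $s, $t, $u;
```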

    I have learnt to avoid such situations in my own code, but I still often find modules that surprise me. I believe CGI's param method returns the empty list if it doesn't find a parameter:

        my %foo = (BAR => $cgi->param('bar'),
                   BAZ => $cgi->param('baz'),
                   ZOT => $cgi->param('zot'),
                  );

    If the parameters 'bar' and 'baz' were missing, your %foo would look like:
    %foo = (BAR => "BAZ",
            ZOT => "zots value");
    

    Another thing which is kind of icky is the my $foo if 0 construct. It is, IIRC, documented as something you shouldn't rely on, but I have actually found a good use for it once :-)
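For the record, my $foo if 0 was deprecated in Perl 5.10 and has been a compile-time error since Perl 5.30. The supported way to keep a variable's value between calls is state, which needs Perl 5.10 or later. Here is a minimal sketch with a hypothetical next_id counter.

```perl
use strict;
use warnings;
use feature 'state';    # requires Perl 5.10 or later

# A counter that keeps its value between calls, without "my $count if 0".
sub next_id {
    state $count = 0;   # initialized only once
    return ++$count;
}

print next_id(), " ", next_id(), " ", next_id(), "\n";    # prints "1 2 3"
```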

    A comment on your complaint about the builtins: suppose you want your own lc function that still calls the perl builtin lc:

        use subs qw|lc|;
        sub lc { lc shift }
        print lc("BAR");

    This will not work, because lc is now recursive. So how do you access the builtin lc now? There is a package named CORE:: which provides access to all builtin functions:

        use subs qw|lc|;
        sub lc { CORE::lc shift }
        print lc("BAR");
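Here is a runnable version of the override, with a call counter added to show that lc("BAR") really goes through the user-defined sub. The counter is only for illustration. Because of use subs, the later calls to lc resolve to main::lc, while CORE::lc still reaches the builtin.

```perl
use strict;
use warnings;
use subs qw(lc);    # predeclare so calls to lc() resolve to our sub

my $calls = 0;

# Our own lc: counts calls, then delegates to the builtin via CORE::.
sub lc {
    $calls++;
    return CORE::lc(shift);
}

my $result = lc("BAR");
print "$result after $calls call(s)\n";    # prints "bar after 1 call(s)"
```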
    Autark
      That's why I would probably not write this:
          my %foo = (BAR => $cgi->param('bar'),
                     BAZ => $cgi->param('baz'),
                     ZOT => $cgi->param('zot'),
                    );
      But rather this:
          my %foo;
          $foo{$_} = param($_) for qw(bar baz zot);
      It's more compact and maintainable.
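The loop also avoids the empty-list problem. Assigning to a single hash element puts the call in scalar context, so a missing parameter becomes undef instead of vanishing from the list. The sketch uses a hypothetical fake_param sub. Following the description above, it returns an empty list for missing keys. It is not CGI's code.

```perl
use strict;
use warnings;

# Hypothetical stand-in for param(): empty list (not undef) when missing.
my %submitted = (zot => "zots value");
sub fake_param {
    my $key = shift;
    return exists $submitted{$key} ? $submitted{$key} : ();
}

# List context: missing values vanish and the key/value pairs shift.
my %bad = (BAR => fake_param('bar'), BAZ => fake_param('baz'), ZOT => fake_param('zot'));

# One assignment per key: a hash-element assignment gives scalar context.
my %good;
$good{$_} = fake_param($_) for qw(bar baz zot);

print join(", ", map { join '=', $_, $bad{$_} } sort keys %bad), "\n";    # prints "BAR=BAZ, ZOT=zots value"
print defined $good{bar} ? "bar defined\n" : "bar is undef\n";            # prints "bar is undef"
```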

      -- Randal L. Schwartz, Perl hacker

      Autark, are you an Anteater? Beppu-san and I attended UC Irvine together, so your "zot" made me wonder. If so, did you graduate from ICS? Hmmm. Beppu-san and I were disenchanted with the puter department, so we didn't finish. Maybe I'll go back and grad someday. I doubt Beppu-san would ever want to go back, but maybe I'm wrong. TTFN.

      -PipTigger

      p.s. Initiate Nail Removal Immediately!
        | Autark, are you an Anteater?

        No, I don't eat ants. Perhaps if they were covered in chocolate - but then again, maybe not.

        #!/usr/bin/perl -l require 5.006; sub AUTOLOAD { &{(map{eval"sub{$_}"}qw{bless\$zot ${+shift}.=chr(+pop^ +32) ${+pop} print$^T})[index+(split /::/, $AUTOLOAD)[1], "T"]} +} my $zot; { my @zot = (2,97,85,84,65,82,75,2,0,78,69,0,2,97,78,84,69,65,84,69, +82,2); for(tie our $zot, (); @zot && ($zot = shift @zot);) { $# = $zot } }
        Btw, what is the connection between metavariables and ants?

        Autark - not an ant, nor an anteater.

Of course &lc doesn't work
by chip (Curate) on Jul 28, 2000 at 15:20 UTC
    After all, &lc is how you tell Perl that you want to call a function named "lc", as opposed to the default behavior attached to that name ... namely, the lc operator.

        -- Chip Salzenberg, Free-Floating Agent of Chaos

      I didn't try &lc. I tried \&lc. (Is the backslash showing up?) I tried to take a reference to lc, because I thought (mistakenly) that built-ins and subroutines could be treated the same way.
        Yes, the backslash came through. But taking a ref to a sub only works if there's a sub there to take a ref to. The idea of \&lc reminds me of that famous description of Oakland: ``There's no `there' there.''

            -- Chip Salzenberg, Free-Floating Agent of Chaos

RE: Subtle Quirks
by turnstep (Parson) on Jul 29, 2000 at 07:57 UTC
    I just wanted to say that I think that colored "See Also" section at the bottom was probably the first font tweaking I've seen on this site that didn't make me look for the o-- button and msg vroom about removing FONT tags. Nice work!

      Thank you.

      I used a perl script called delirium to do that. It's a text filter that works on STDIN/STDOUT. It also uses HTML::Parser so it won't mess up your tags (just your content ;). I've put it up here at perlmonks.org, and it's also in the CVS repository for Free Software at my werk. (Just hit the link below).

      Be glad that I didn't make the whole article delirious.

      Obviously written by a Sandman fan :)

      If I can suggest one small tweak that would make it look even more like the way that Delirium speaks, perhaps you could get it to add in <sup> and <sub> tags at random.

      print "Just another Gaiman fan\n";

      --
      <http://www.dave.org.uk>

      European Perl Conference - Sept 22/24 2000, ICA, London
      <http://www.yapc.org/Europe/>