I like spinach.

Plowing through the copy of Damian Conway's "Perl Best Practices" I got at OSCON, I've been pleased to find that nearly all of the recommendations are both healthy and palatable. In places where my code doesn't adhere, I can generally see the advantages of what he advocates, and I'll be willing to adapt for both my own good and the sake of convergence. Mmmm, spinach!

There is one section, though, that I'm having a lot of trouble getting down: the recommended naming conventions for variables, specifically for references and hashes. Damian advocates tacking '_ref' onto the end of any scalar that contains a reference...

... The primary rationale for this is that if you use $vegetables_ref->{spinach} instead of $vegetables->{spinach}, there's no chance that accidentally leaving off the dereferencing arrow will access a %vegetables hash that happens to exist in the same scope: $vegetables{spinach}
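A minimal sketch of the mishap the _ref suffix is meant to prevent. The variable contents are made up for illustration; the point is only that both lookups compile cleanly under strict when a hash and a hash reference share a name.

```perl
use strict;
use warnings;

# Hypothetical data: a real hash and a hash reference that share a name.
my %vegetables = ( spinach => 'from the hash' );
my $vegetables = { spinach => 'from the hash reference' };

my $intended   = $vegetables->{spinach};   # dereferences the scalar $vegetables
my $accidental = $vegetables{spinach};     # arrow omitted: reads %vegetables instead

print "$intended\n";     # from the hash reference
print "$accidental\n";   # from the hash
```

Neither line draws a warning, so the wrong lookup goes unnoticed unless the names differ.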

However, it seems to be accepted wisdom that having scalars, arrays, hashes, etc in the same scope with the same identifier is a really bad idea. I don't append _ref to my reference variables and yet I cannot recall ever having encountered the problem he describes, because I never count on the sigil alone to differentiate program elements. Of course I leave off the dereferencing arrow all the time, but that's a fatal compile-time error that is trivial to debug.
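Here is a small sketch of the opposite case, where no same-named array exists. It assumes use strict is in effect, and it wraps the faulty code in a string eval purely so the script can report the compile failure instead of dying from it; $stories is a made-up array reference.

```perl
use strict;
use warnings;

# The code under test lives in a string so that its compile-time
# failure can be captured rather than aborting this script.
my $code = <<'END_CODE';
use strict;
my $stories = [ 'first', 'second' ];
my $story   = $stories[0];   # arrow omitted, and no @stories is declared
END_CODE

my $result = eval $code;
my $error  = $@;

print "Compile failed: $error" if $error;
```

The message names the undeclared @stories, which is what makes this kind of slip quick to track down.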

The thing I find most objectionable about this recommendation is its interaction with another, much more important guideline: use descriptive variable names, like $cancelled_transaction_number instead of $i (to quote Damian's own example). Such names can get pretty long, and appending _ref to them adds noise, possibly pushing a line over 78 characters and necessitating a line break. The code becomes less readable, and less of it fits on screen -- all for the sake of solving a problem that's only a problem if you're doing something asinine to begin with.

Damian also recommends naming hashes in the singular, the idea being that individual accesses seem more natural: $vegetable{spinach}. It's weird to me that a collection of key-value pairs representing, say, %foo_file_defaults ought to be named %foo_file_default under his guidelines, but hey, I'm a team player and I could adapt if someone said "do it this way". Still, the guideline is internally inconsistent: since he recommends that arrays be named in the plural, individual accesses for array elements still use the plural: $stories->[0]

A long time ago I came across the P5EE style guidelines, which spell out variable-naming conventions which avoid all these problems. Not sure where that project's at these days, but the guidelines have served me well. Here's the relevant section, and a link:

Arrays and hashes should be plural nouns, whether as regular arrays and hashes or array and hash references. Do not name references with "ref" or the data type in the name.

    @stories     = (1, 2, 3);       # right
    $comment_ref = [4, 5, 6];       # wrong
    $comments    = [4, 5, 6];       # right
    $comment     = $comments->[0];  # right


Update: A missing dereference operator is only a compile-time error under use strict -- otherwise it's a runtime error. Thanktalus, Tanktalus.

Marvin Humphrey
Rectangular Research

Re: Perl Best Practices for naming variables
by hv (Parson) on Aug 07, 2005 at 00:05 UTC

    Appending _ref doesn't seem particularly sensible to me, though I haven't got my hands on the book yet to see the full rationale. But my personal style has been evolving to use fewer and fewer real arrays and hashes, and the potential for mishap disappears if everything is a ref (or at least a scalar).

    As for plural names, I'm pretty inconsistent: I can argue it both ways in my head, and frequently do. If $item->[$index] looks right then if (@$item) { ... } looks wrong, and vice versa when the name is plural.

    The only sense I can see in picking a different convention for arrays versus hashes, though, is an assumption about the style of use: that for a hash the primary use will be to dereference an element, while for an array the primary use will be to act on the collection as a whole. But even if true, I'm not convinced it's necessarily a valid distinction: the question is whether you think of it as a collection, and whether in that respect you think of arrays and hashes differently - and if you do, I think you probably shouldn't. :)

    I think though that the intention of the book is to give a simple and consistent set of rules to people who are not inclined, or able, to go through the same tortuous thinking process for themselves. For those that are so inclined, it is rather food for thought, and another data set to add to the collection (sic).

    Note that changes due in perl6 may alter the balance substantially - I've fallen behind with the design, but I think it was moving in the direction of having @array and %hash be no more than syntactic sugar for a ref in a real scalar; similarly, Larry was talking about an array as "just a hash with a constraint on the keys", implying that all the rest is just under-the-hood optimisation (though I'm not sure if that thought ended up affecting the syntax at all).


      For all collaborative coding efforts, and CPAN code especially, there are benefits to adhering to standard practices: other people find it easier to grok, hack, debug, or contribute. The first priority is self-consistency, but the closer the ruleset is to a ruleset everyone is already familiar with, the greater the advantage.

      I'll certainly reserve the right to redefine a rule or two, but for the most part I really like (and already practice many of) Damian's recommendations, and would be happy to see them become a de-facto standard, as Andy Lester has suggested they may. It's worth it to me to struggle with the ones that strike me as odd. Spinach for thought.

      Marvin Humphrey
      Rectangular Research

Re: Perl Best Practices for naming variables
by tilly (Archbishop) on Aug 07, 2005 at 05:44 UTC
    I have no strong preferences on _ref in those names. The issue that he mentions is not a problem for me because most of my variables have a small scope, so I'm unlikely to have 2 with the same name in scope. Lessening it even more, most of my hashes are references to hashes, so it is very unlikely that there is a hash or array to conflict with.

    I also have no strong preferences on giving singular or plural names to arrays. Sometimes one looks right, sometimes the other.

    I do have a strong preference that hashes get singular names, though.

    The single most useful comment that I've seen on using hashes is, Think of a hash lookup as "of". That is, $age{'Sam'} should be read "age of Sam". Or, if you need it to be really unambiguous, $age_by_name{'Sam'} can be read "Age of whoever is named Sam."
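    A tiny sketch of the "of" reading, using made-up ages. The hash name stays singular so that each lookup reads as "age of" whoever the key names.

```perl
use strict;
use warnings;

# Hypothetical data illustrating the "read the lookup as 'of'" rule.
my %age = ( Sam => 34, Alex => 27 );

for my $name ( sort keys %age ) {
    # $age{$name} reads aloud as "age of $name".
    print "$name is $age{$name} years old\n";
}
```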

    This little recommendation fits every idiomatic use of hashes that readily comes to mind. When I was just starting Perl, it made it easy for me to spot where and when a hash would be useful, and decide what it should be called. With the benefit of experience I know of no better way to figure that out.

    However, that little linguistic principle that I treasure is ruined if you give hashes plural names. (Unless the values are array refs, in which case I would give it a plural name without even pausing to think about it.)

    Update: Changed singular to plural per private note from hv.

Re: Perl Best Practices for naming variables
by Tanktalus (Canon) on Aug 07, 2005 at 03:36 UTC

    The _ref suffix looks like a Damianised version of Hungarian Notation. Not the popular version, but the original. The purpose seems to be to help ensure that just by looking at the code, you can tell if it's correct or not. For example, if you have adhered to this naming convention, you could tell at a glance which of the following are correct, and which would result in a runtime error, should that codepath be exercised:

    $comments[0]       = 'foo';  # 1
    $comments_ref[0]   = 'foo';  # 2
    $comments->[0]     = 'foo';  # 3
    $comments_ref->[0] = 'foo';  # 4

    Obviously, then, 1 and 4 are good, while 2 and 3 will result in runtime errors. And that's without seeing any other code. This means you can look at each line in isolation and be able to intuitively understand its correctness, or lack thereof. That's goodness.

    Now, in the original Hungarian notation, these are to be prefixes. But where you put the notation is less relevant than having a consistent, well-defined notation that translates runtime errors into things that are obvious from a simple code-review.

      Haha, great etymological humor in that Wikipedia link.

      A point of clarification... this fails at compile-time:

      $comments_ref[0]   = 'foo';  # 2

      ... while this fails at runtime (assuming that $comments doesn't contain a ref, as implied by the lack of HungDamian suffix):

      $comments->[0]     = 'foo';  # 3

      I hadn't addressed the situation where a spurious dereferencing operator causes problems, but yes, now that I think about it, I've done that every once in a while. Depending on how rarely the code branch containing the bogus arrow gets accessed, that might produce an unpleasant surprise at an inopportune moment.
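      A rough sketch of that surprise, assuming strict refs is in effect and that $comments holds a plain string rather than an array reference. The $rarely_true flag is a stand-in for whatever condition guards the seldom-run branch; the block eval is only there to trap the failure for display.

```perl
use strict;
use warnings;

# Hypothetical: $comments holds a plain string, not an array reference.
my $comments    = 'no comments yet';
my $rarely_true = 1;   # stands in for a branch that seldom runs

my $error = '';
if ($rarely_true) {
    # Block eval traps the runtime failure so the script can report it.
    eval { $comments->[0] = 'foo'; 1 } or $error = $@;
}

print "Runtime failure: $error" if $error;
```

      The file compiles without complaint; the failure appears only when the branch actually executes.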

      Fortunately, thanks to Data::Alias, I can have my spinach and eat it too. I'm persuaded. I'll start using _ref.

      But those hash names are going to stay plural for now.


      Marvin Humphrey
      Rectangular Research

        $ perl -c -e ' my $c_ref; $c_ref[0] = "foo"'
        -e syntax OK

        The moral is that you should always use warnings and strict. But, even if you don't (or can't), $c_ref[0] is still obviously wrong if you're looking at code following the HungDamian naming convention.

        (Personally, I think we should come up with a better name than that - Damian may like this one too much. And his publicist may not ;-})

        PS - I'm not attempting to convert anyone to this style of naming. I don't use it myself, and I'm not sure I'm going to start, either. Someone asked why Damian would have suggested this, and I answered. Personally, if I were to start, it'd be using an 'r' prefix rather than a '_ref' suffix.

Re: Perl Best Practices for naming variables
by chester (Hermit) on Aug 06, 2005 at 22:39 UTC
    Even the distinction between $comment and $comments isn't very great, in my opinion. For reading code more than writing, that is. If using singular/plural to make the distinction like that, I'd probably make some further distinction as well, if two variable names were that similar. If all else failed maybe $a_comment (ugh, that's ugly) or $single_comment or $final_comment or $comment_about_dogs or who knows what.

    So far as _ref, I guess the deciding factor is whether it's more important to easily tell references from non-references, or arrays/hashes from scalars.

    If you see $var_ref, then first you can immediately tell "It's a reference!" and then you'll have to determine whether it's an array ref, hash ref, or scalar ref or whatever.

    On the other hand, if you're using plural/singular to make the distinction, then if you see $items you first know "It's an array!", and then you're faced with figuring out (or remembering) whether it's a real array or an array reference. $items[1] vs. $items->[1] aren't so easy to distinguish at a glance, especially if it's buried in the middle of a bunch of other line noise.

    I can easily see the argument going either way. I'd personally lean towards using _ref because I find myself mixing up hashrefs with hashes pretty often while writing code. I believe Damian mentions in the book that it's helpful if typing _ref-> as a unit becomes a habit, and it seems to work for me.

      # before Damian's book I'd have used EOREPLY
      my $reply_to_chester = <<'END_REPLY_TO_CHESTER';

      Exactly right, that's what Damian recommends. There's obviously someone besides him who's made it work.

      I suppose I've just solved the problem he's looking to address by never using identical identifiers for different purposes. What's the advantage to me for adding those extra four characters? Now I can use %option, @option, $option, and &option in the same scope and, uh, not get them mixed up? ;) On the flip side, what's the penalty for typing $items[1] when you meant $items->[1]? No more than a missing semi-colon, or other simple syntax error. Unless your program takes a long time to compile, no big deal: run, tweak, run, tweak...

      I might not find this constraint so onerous if I hadn't produced a lot of code that goes up to 78 characters.

      ... (time passes while Marvin peruses book) ...

      It looks like the answer is, for Perl 5.8.1 onwards, the Data::Alias module. The main situation where I'd need _ref is when pulling elements out of a large and complex object for manipulation in a method. Here's what I've been doing:

      sub do_stuff {
          my $self = shift;
          my $items = $self->{items};
          # Bear with me and assume I'll need $items again...
          for my $item (@$items) {
              # etc...

      With the Data::Alias module, I can do this instead:
      sub do_stuff {
          my $self = shift;
          alias my @items = @{ $self->{items} };
          for my $item (@items) {
              # etc...

      That's probably faster. It's sure cleaner. I like. And as a bonus, now when I use _ref, it will imply something useful -- like, "this is a giant array and I want to pass it by reference".
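      For anyone who can't install Data::Alias from CPAN, here is a sketch of the same idea using a different mechanism: the core refaliasing feature, which is experimental and needs Perl 5.22 or later. This is not Data::Alias and the syntax differs from the alias keyword above; the $self structure is made up for illustration.

```perl
use strict;
use warnings;
use feature 'refaliasing';
no warnings 'experimental::refaliasing';

# Hypothetical object with an array reference in its {items} slot.
my $self = { items => [ 'spinach', 'kale' ] };

# Make @items another name for the array that $self->{items} points to.
\my @items = $self->{items};

push @items, 'chard';              # modifies the object's array in place
my $count = @{ $self->{items} };   # counts the elements seen through the reference

print "$count items: @items\n";
```

      Because @items is an alias rather than a copy, changes made through it show up in $self->{items} as well.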

      Tasty spinach, yo,

      Marvin Humphrey
      Rectangular Research

Re: Perl Best Practices for naming variables
by pg (Canon) on Aug 07, 2005 at 06:11 UTC

    Besides using plural names for collections (arrays and hashes), it is also nice if you can name the "iterator" wisely:

    for my $friend (@friends) {
        # the relationship between $friend and @friends is visually clear
        ...
    }

    for my $person (keys(%phone_numbers)) {
        # $person clearly tells the meaning of the hash key
        $phone_numbers{$person}...
    }

Re: Perl Best Practices for naming variables
by borisz (Canon) on Aug 07, 2005 at 11:11 UTC
    Without reading the book.
    I refuse to append _ref to my refs, since a real program is much more readable with shorter, meaningful names.

      There sure is more than one way to name your references.

      scnr ;-), Sören

        There sure is more than one way to name your references.

        $rsReally = \"yes really";
        $raReally = [qw"yes really"];
        $rhReally = {qw"yes really"};

Re: Perl Best Practices for naming variables
by japhy (Canon) on Aug 08, 2005 at 14:37 UTC
    Hey, I like Damian, and he has some great ideas, and has produced some amazing modules and applications with Perl. But I'm not going to write my code so it looks exactly like Damian's. No offense, nothing personal, it's just that I have a style I'm comfortable with and that doesn't cause me any grief. While his book offers some very intelligent and useful best practices, you needn't cling to them as though you'll be left behind or shunned if you don't.

    Damian finds that naming his references in a self-indicative way helps him (and readers of his code, no doubt) identify them and reduce coding mistakes. Bully for him! But I don't need to. And you sound like you don't either.

    I'm not trying to belittle Damian or his Best Practices book; all I'm trying to say is that there is more than one way to do it, and Damian is explaining a couple of those ways. Absorb it, adopt what you like, recognize what you don't like, and deal with it.

    Jeff japhy Pinyan, P.L., P.M., P.O.D, X.S.: Perl, regex, and perl hacker
    How can we ever be the sold short or the cheated, we who for every service have long ago been overpaid? ~~ Meister Eckhart
Re: Perl Best Practices for naming variables
by mascip (Pilgrim) on Jun 22, 2012 at 16:25 UTC

    That's an important point, and subjective too.
    I like my code to look as much as possible like English : i like when i just read it and it says what it does. Without having to remember that some letters are just here to give technical information, so "you have to try to imagine that they are not there."

    I tried to use _ref. Then i tried using _hash and _list, but it didn't work for me : the code becomes less readable. I don't like:

    Sam is not a hash.

    Then i tried prefixes instead of suffixes, and i liked it better : i know that $h_ corresponds to %, and it doesn't get in the way so much. The beginning of a variable name feels (to me) like the right place to say its type.

    feels better.
    I even thought of using capital letters :
    $H_age_of->{Sam}
    $L_items->[2]
    I think i like these.
    It doesn't make the "forgetting an arrow mistake" as obvious as suffixes, but it's better than nothing, and it doesn't get in the way as much.

    Thank you for mentioning Data::Alias. I will give it a go. The % and @ symbols make for more readable code, and I was wondering if it was possible to keep them. Every little bit helps.

    ~ ~ ~

    Another thing i have noticed is that i don't always know what there is in a hash, or what keys i'm supposed to use with it. For example :

    my %data_from;         # File? Measurement? Experiment? etc.
    my %height_of;         # Furniture? Object? Person? etc.
    my %results_list_for;  # the keys could be almost anything

    One way to deal with this is to make names more explicit. Or to add this information at the end of the hash, like this for example :
    $data_from_measurement{$measurement->label()}
    $height_of_furniture{$old_furniture}
    $results_list_for_file{$file_path}

    But it feels quite heavy to me.
    What i like about it, though, is that it tells me which type of key is expected in the hash, and what the hash could contain, too. It does tell me a lot. I might use it in some occasions...

    Maybe a different style is better in different contexts... Spinach for thought.

    It takes a lot of effort to make something easy to read, and people often don't notice it, but it does help them a lot.
