http://www.perlmonks.org?node_id=585773

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Dear Monks

Are there any guidelines on the naming conventions to follow for variables, subroutines, modules, etc.? If so, could someone point me to some links or tutorials? (This is the only related link I found under Tutorials.)

Also, what is the best approach to declaring variables? Should variables be declared as and when they are required, or should they be declared up front and then used?

Variables declared as they are required (I notice that this is widely used):

#!/usr/bin/perl
use strict;
use warnings;

my ( $path, $file ) = @ARGV;
die "Usage: $0 search/path data.file\n"
    unless ( -d $path and -f $file );
my ( $inode, $nlinks ) = ( stat _ )[1,3];
die "$file has no hard links\n" if $nlinks == 1;
my ( $chld, $nfound, @found );
.....

Variables declared up front (is anything wrong with this approach?):

#!/usr/bin/perl
use strict;
use warnings;

my ( $path, $file, $inode, $nlinks );

( $path, $file ) = @ARGV;
die "Usage: $0 search/path data.file\n"
    unless ( -d $path and -f $file );
( $inode, $nlinks ) = ( stat _ )[1,3];
die "$file has no hard links\n" if $nlinks == 1;
.....

Re: Naming convention for variables, subroutines, modules, etc.. and variable declaration
by brian_d_foy (Abbot) on Nov 23, 2006 at 21:03 UTC

    Naming

    Damian discusses some techniques in Perl Best Practices. Larry has some advice in perlstyle.

    There are various naming conventions you can use, and don't restrict yourself to Perlville to find one that makes sense for you and what you are doing.

    • You probably don't want to completely invent your own style because good programmers will already know the basic conventions of the major styles out there. Small deviations aren't that bad, but something completely wacky will distract the code reader from the really important stuff.
    • DOS was a long time ago. You can use more than eight characters. What did vowels ever do to you, anyway? :)
    • Remember that names that make sense to you only do so because you picked them. Abbreviations, for example, might not make sense to someone who doesn't speak your language as their first language.
    • Whatever you do, be consistent. Make other people who edit the code be not only self-consistent but project-consistent too.
    • Use the same names (because hey, they're all lexicals, right?) to mean the same thing everywhere in the code, even if it's in different files that have nothing to do with each other. For instance, if you use $email to mean the email address, don't call the same idea $address or $to somewhere else.
    • Make similar ideas have similar names, such as $from_email and $to_email, or always appending _dbh to database handle names, _fh to filehandles, and so on. This isn't just about things that go together (you probably want to put those in a hash :), but about marking the same idea the same way each time, so code readers can take what they learned from your code elsewhere and use it to understand new code.
    • Spell out words for important variables ($count instead of $cnt), although very short term variables or widely used names ($i for an index, $fh for a filehandle) may break this rule.
    • When interacting with a module, I try to use the same variables from its examples, such as $dbh, $sth, and so on from DBI.
    • I tend to use all lowercase for normal variable names, initial caps for globals (package vars), and all uppercase for constants.
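Here is a minimal sketch of these conventions in Perl. The names build_header, DEFAULT_SUBJECT and $Debug, and the in-memory log, are hypothetical and were made up only to show the casing and suffix rules side by side:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# All uppercase for constants.
use constant DEFAULT_SUBJECT => '(no subject)';

# Initial capital for a package (global) variable.
our $Debug = 0;

# Lowercase lexicals. The same idea gets the same suffix everywhere
# ($from_email, $to_email), and a name means the same thing throughout.
sub build_header {
    my ( $from_email, $to_email, $subject ) = @_;
    $subject = DEFAULT_SUBJECT unless defined $subject;
    warn "Building header from $from_email to $to_email\n" if $Debug;
    return "From: $from_email\nTo: $to_email\nSubject: $subject\n";
}

# Things that belong together go in a hash; filehandles end in _fh.
my %message = (
    from_email => 'alice@example.com',
    to_email   => 'bob@example.com',
);
my $log_text = '';
open my $log_fh, '>', \$log_text or die "Cannot open in-memory log: $!";
print {$log_fh} build_header( @message{qw(from_email to_email)} );
close $log_fh;
print $log_text;
```

The point is not these particular rules. A reader who sees `$to_email` in one file should be able to guess what `$to_email` means in another.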

    Other monks will probably chime in with more advice, and I'm off to eat Thanksgiving turkey or I'd think about this more. Good luck :)

    Declarations

    I tend to define the lexical variables as late as possible and as close to their use as possible, as in your first example. This limits the effect of the variable to exactly the scope I want to use it. If I need it in some other scope, I just declare it again.

    Declaring variables upfront tends to give variables a larger scope than they really need. You might do that as a way to start introducing lexical variables into a script that only used package variables, but only while you waited for the time to convert them to your first form.
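The following is a rough sketch of the "as late and as close as possible" style. The subs count_words and shout_lines are hypothetical examples: each variable lives only in the block that needs it, and an unrelated scope simply declares its own `$line` again:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Count how many times each word occurs in a list of lines.
sub count_words {
    my @lines = @_;
    my %count_of;                                # needed by the whole sub

    for my $line (@lines) {                      # $line exists only in this loop
        for my $word ( split ' ', lc $line ) {   # $word exists only in the inner loop
            $count_of{$word}++;
        }
    }
    return \%count_of;
}

# An unrelated scope that also needs "a line" just declares its own $line.
sub shout_lines {
    my @shouted;
    for my $line (@_) {
        push @shouted, uc $line;
    }
    return @shouted;
}

# Declared as late as possible, and initialised where it is declared.
my $counts = count_words( 'the cat', 'The dog' );
print "$_: $counts->{$_}\n" for sort keys %$counts;
```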

    --
    brian d foy <brian@stonehenge.com>
    Subscribe to The Perl Review
Re: Naming convention for variables, subroutines, modules, etc.. and variable declaration
by shmem (Chancellor) on Nov 23, 2006 at 23:23 UTC
    Whatever convention you use, the first rule is: be consistent. Or is it "always decide for readability"? Both are first.

    Apart from that, there are many conventions and styles, and each comes with its reasoning.

    CamelCase vs. underline_variables, hungarian notation (I mean the original one) etc... pick what best suits you.

    For programming Perl, as far as variables are concerned, apart from perlstyle I personally agree with points 20-22 of Abigail's coding guidelines, which (as per your example) I would extend as follows:

    • declare variables early in the scope in which they are used
    • declare variables as close as possible to their first use in their scope
    • resolve conflicts between these rules by looking at the importance of those variables, and at their frequency and location of use

    Which means that e.g. if there are variables that are used only in a particular fragment of their scope, declare them at the beginning of that fragment. If they are important for what the block (scope, file) is all about, declare them up front.

    If there's a great distance (measured in screen pages) between the declaration of variables and their use, add a short comment indicating where they are first used.
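One possible reading of these rules, sketched in Perl. The sub summarize and its validation step are hypothetical. Variables central to the sub are declared up front with a "first used" comment. A variable needed only in one fragment is declared at the start of that fragment (here a bare block):

```perl
#!/usr/bin/perl
use strict;
use warnings;

sub summarize {
    my @scores = @_;

    # Central to what this sub is about, so declared up front.
    # (First used in the "averaging" fragment below.)
    my ( $total, $average ) = ( 0, 0 );

    # Fragment: validation. @bad is only needed here, so it is
    # declared at the start of this fragment.
    {
        my @bad = grep { !/^-?\d+(?:\.\d+)?$/ } @scores;
        die "Non-numeric scores: @bad\n" if @bad;
    }

    # Fragment: averaging.
    $total += $_ for @scores;
    $average = $total / @scores if @scores;

    return sprintf 'total=%g average=%g', $total, $average;
}

print summarize( 3, 4, 5 ), "\n";    # total=12 average=4
```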

    --shmem

    update: added readability to first rule

    _($_=" "x(1<<5)."?\n".q·/)Oo.  G°\        /
                                  /\_¯/(q    /
    ----------------------------  \__(m.====·.(_("always off the crowd"))."·
    ");sub _{s./.($e="'Itrs `mnsgdq Gdbj O`qkdq")=~y/"-y/#-z/;$e.e && print}
Re: Naming convention for variables, subroutines, modules, etc.. and variable declaration
by GrandFather (Saint) on Nov 23, 2006 at 20:25 UTC

    There are many naming conventions and naming styles. Basic styles are camelCase, unixalllowercase and underscore_separated. unixalllowercase seems to be used less now (andadamngoodthingtoo). camelCase has the small advantage that it uses fewer characters, and the small disadvantage that, if you are not accustomed to it, it can be slightly harder to read than underscore_separated.

    A number of conventions exist concerning the use of case. BLOCK_UPPERCASE is pretty much reserved for file handles in Perl. MixedCase is generally only used for camelCase identifiers. Sometimes an uppercase First_letter is used to denote a subroutine call. However, at the end of the day, adopt a convention and stick with it. Consistency is much more important than the particular style chosen.
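To make the comparison concrete, here is a small sketch that writes the same hypothetical helper in both styles (longestLine and longest_line are made-up names) and prints to a traditional BLOCK_UPPERCASE bareword filehandle. It uses `max` from List::Util, which ships with Perl:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use List::Util qw(max);

# camelCase: slightly shorter, but the word breaks are less visible.
sub longestLine {
    my @textLines = @_;
    return max( 0, map { length($_) } @textLines );
}

# underscore_separated: a little longer, usually easier to scan.
sub longest_line {
    my @text_lines = @_;
    return max( 0, map { length($_) } @text_lines );
}

# BLOCK_UPPERCASE names are traditional for bareword filehandles.
print STDOUT longest_line( 'abc', 'de' ), "\n";
```

Either style is fine. Mixing them in one code base is what hurts.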

    In the case of where to declare variables, however, the matter is much clearer. Declare them in as small a scope as you can, so there is less chance that they will be misused and their intended use is more evident. This is especially true if you initialise the variable where it is declared. Using strictures, of course, helps a lot with managing variables.

    Update: s/hyphen/underscore/g


    DWIM is Perl's answer to Gödel
      <nit>Your hyphens are underscores. This - is a hyphen.</nit>

      Cheers,

      JohnGG

      Declare them in as small a scope as you can so there is less chance that they will be misused and their intended use is more evident. Especially this is true if you initialise the variable where it is declared.

      The "use the tightest scoping possible" rule is very common these days (and therefore, probably what you should do), though just for the sake of argument...

      If you use subroutine scope (always declare your variables near the beginning of your subs), you can make the main body of the code a little cleaner by eliminating the embedded "my"s scattered around. You should still have the benefit of reasonably tight lexical scoping... provided that your subroutines are short and to the point. If you need tighter scoping for some reason, maybe what you really need is more and shorter subs, eh?
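A sketch of the "subroutine scope" style argued for here, using hypothetical subs parse_record and parse_all. Each sub declares everything at its top. Because each sub is only a few lines long, the scopes stay small and no `my` is scattered through the body:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# All declarations at the top of each short sub.
sub parse_record {
    my ($line) = @_;
    my ( $name, $age );

    ( $name, $age ) = split /\s*,\s*/, $line, 2;
    die "Malformed record: $line\n" unless defined $age;
    return { name => $name, age => $age };
}

sub parse_all {
    my @lines = @_;
    my @records;

    @records = map { parse_record($_) } @lines;
    return \@records;
}

my $records = parse_all( 'Ann, 30', 'Bob,41' );
print "$_->{name} is $_->{age}\n" for @$records;
```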

      I suspect that the main reason that most of us use "tight scoping" has to do with simple laziness: when we want to create a variable, we don't feel like skipping back up to a declaration section to do it. But if it improves readability a little, isn't it worth doing?

      I also might make the point that perl's rules about where you're allowed to insert a "my" are a little hard to grasp.
      At this point, you probably know that this works:

      for my $item (@items) { ... }

      Do you know if this works?
      chomp( my @lines = <$fh> );

        Improves readability by using "subroutine" scope? Far, far, FAR from it. Tightest scope possible improves readability because I don't have to go back to find out what else has played with this variable. It also means that when I figure out I don't need a variable any longer, I'm more likely to remove it - I've seen functions start with "my (<20 variables here>)" - and I notice one that looks out of place, and then can't find it being used anymore. I don't see nearly as many unused variables being declared when following tightest-scope-possible rules.

        There is a reason why C++ started to allow declarations of local variables anywhere in a block, and not just at the top as its predecessor, C, did. Because it's more readable. And maintainable. (Well, that, and delayed construction of variables on the stack only made sense - forcing them all to the top of the block would be impossible.) And now, C allows it, too.

        Mandatory declare-at-the-top semantics is going the way of the dodo, and for good reason. It was only ever there due to limitations of the compilers (allowing declarations anywhere is hard, especially in compiled languages with a stack). Now that even the compilers are being written in higher-level languages, with better automatic memory management, there's much less excuse for not granting this kind of flexibility for readability.

        As for your question - yes, I know that works. And just to convince myself that I was right, I wrote a quick program to prove it ;-) Perl is more generous with where you can put "my" than you may think. You probably didn't know you could use it for the lexical filehandle in the first place:

        #!/usr/bin/perl
        use strict;
        use warnings;

        open my $fh, '<', $0 or die "Can't open $0: $!";
        chomp(my @lines = <$fh>);
        print "[$_]\n" for @lines;
        No lines starting with "my" - but, I do end up with both a lexical scalar (filehandle) and a lexical array.
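Here are a few more places where Perl accepts `my` in the middle of an expression. The sketch reads from an in-memory filehandle (open on a scalar reference) so it runs without a real file. The sub places_for_my and its sample text are made up for illustration:

```perl
#!/usr/bin/perl
use strict;
use warnings;

sub places_for_my {
    my $text = "alpha\nbeta\n\ngamma\n";
    open my $fh, '<', \$text or die "Cannot open in-memory file: $!";   # my in open()

    my @kept;
    while ( defined( my $line = <$fh> ) ) {          # my in a while condition
        chomp $line;
        next unless length $line;
        if ( ( my $len = length $line ) > 4 ) {      # my inside an if condition
            push @kept, "$line($len)";
        }
    }
    close $fh;
    return join ' ', @kept;
}

print places_for_my(), "\n";    # alpha(5) gamma(5)
```

In each case the variable is scoped to the enclosing block or statement, which is what keeps it tight.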

      However, at the end of the day, adopt a convention and stick with it. Consistency is much more important than the particular style chosen.

      If consistency within an application is important, so, too, is consistency of convention within a language. Therefore, there should be a single convention in use within Perl, for much the same reasons. The problem is that people can't agree on which convention to use, and there's always one more way to do it...

        Absolutely wrong for a language known for TIMTOWTDI

        -Lee
        "To be civilized is to deny one's nature."
Re: Naming convention for variables, subroutines, modules, etc.. and variable declaration
by rhesa (Vicar) on Nov 23, 2006 at 20:41 UTC
    In addition to perlstyle, pick up a copy of Perl Best Practices. It has solid advice on naming variables, functions etc., as well as variable scoping and a host of other good ideas.

    IMHO, variables should be declared where they are used, and in as small a scope as possible. I would recommend following your first example.

Re: Naming convention for variables, subroutines, modules, etc.. and variable declaration
by swampyankee (Parson) on Nov 24, 2006 at 02:26 UTC

    Consistency is quite important.

    My preference for naming just about anything is to make the names meaningful, e.g., use names like $sales_tax_rate for (duh!) sales tax rate, %population_by_city for a hash of city populations indexed by city, etc. There are no points for originality, nor are there any for cleverness.

    My preference for declarations is to make them at the beginning of their scoping unit. That is, if a variable ($sales_tax_rate) is used throughout a sub, I prefer to declare it near the top of the sub, even if it's not going to be used until near the end. Loop indices (foreach my $i (0 .. 100) {...}) are declared as needed. (Note that I consider $i a perfectly sensible name for a loop index; I find the suggestion to use names like $loop_index a bit too much.)
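A minimal sketch of that preference. The sub total_with_tax and the 0.25 rate are invented for the example. The sub-wide variable is declared near the top even though it is only used at the end, while the loop index is declared where the loop needs it:

```perl
#!/usr/bin/perl
use strict;
use warnings;

sub total_with_tax {
    my @prices = @_;

    # Used across the whole sub, so declared near the top even though
    # it is not needed until the end. (0.25 is a made-up rate.)
    my $sales_tax_rate = 0.25;
    my $subtotal       = 0;

    # The loop index is declared where it is needed; $i is fine here.
    foreach my $i ( 0 .. $#prices ) {
        $subtotal += $prices[$i];
    }

    return $subtotal * ( 1 + $sales_tax_rate );
}

print total_with_tax( 10, 30 ), "\n";    # 50
```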

    Above all, though, consistency, clarity, and simplicity. If it takes more than about 20 lines to explain a naming convention, it's probably too complex.

    Naming conventions I dislike are sTuDlyCaps, runonmultiwordnameswithnovisualbreaks, and Hungarian Notation.

    emc

    At that time [1909] the chief engineer was almost always the chief test pilot as well. That had the fortunate result of eliminating poor engineering early in aviation.

    —Igor Sikorsky, reported in AOPA Pilot magazine February 2003.
Re: Naming convention for variables, subroutines, modules, etc.. and variable declaration
by petdance (Parson) on Nov 24, 2006 at 04:06 UTC
    Buy a copy of Code Complete, 2nd edition, by Steve McConnell. Read it cover to cover. It will help you in your programming career now and forever.

    xoxo,
    Andy

Re: Naming convention for variables, subroutines, modules, etc.. and variable declaration
by jbert (Priest) on Nov 24, 2006 at 10:20 UTC
    My pet theory is that coding errors and complexity are generally proportional to "total variable scope" of the code in question.

    Most of the good advice on coding (avoid global vars, use short methods, avoid objects with too many methods, etc) all acts to reduce the scope of variables (note that object member variables have scope across all methods).

    Also, adding flags and the like to control loops increases the overall "Sum of variable scopes" in the code (by adding a var).

    So...my general rule is to declare near/at the point of use. And if that's a long way from the top of the method, then the method is too big.

    Basically, the more variables you have in scope at any one piece of code, the more combinations of possible states there are, and the harder it is to reason about the code. If a variable isn't in scope, you don't have to expend any precious mental juice wondering if it is related to the bug you are chasing.
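To illustrate the point about flags, here is a sketch of the same hypothetical check (has_negative) written twice. The first version keeps a `$found` flag in scope for the whole sub. The second returns from inside the loop, so that extra variable and its states never exist:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# With a flag: $found is in scope for the whole sub, and every line
# after its declaration has to consider whether it is 0 or 1.
sub has_negative_with_flag {
    my @numbers = @_;
    my $found   = 0;
    for my $n (@numbers) {
        if ( $n < 0 ) {
            $found = 1;
            last;
        }
    }
    return $found;
}

# Without a flag: returning from inside the loop removes that
# variable (and its possible states) entirely.
sub has_negative {
    for my $n (@_) {
        return 1 if $n < 0;
    }
    return 0;
}
```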

    Perhaps paradoxically, I very often introduce naming vars (e.g. my $badger = foo($some->{lookup})) for reasons of legibility: it removes the need for a comment saying that the thing I am dealing with is a badger. (It's also driven by the fact that if something fails, I have the values handily in vars which interpolate nicely into an error string without any messing around with string concatenation.) But these are often in very short scopes (a single loop or if() branch), a few lines.

    I can't work out if this contradicts my rule or if just adding another name for something which is already there (the expression) isn't as harmful (it doesn't multiply the number of states if it never changes). Any thoughts are appreciated.
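A small sketch of such a short-lived naming variable. The sub report_weight and the %zoo data are hypothetical. `$badger` is never reassigned, lives only inside a tiny sub, and drops straight into the strings it is used in:

```perl
#!/usr/bin/perl
use strict;
use warnings;

sub report_weight {
    my ( $zoo, $name ) = @_;

    # Naming variable: says what the value is, lives only in this
    # short sub, is never reassigned, and interpolates straight into
    # strings without any concatenation.
    my $badger = $zoo->{$name}
        or die "No animal called '$name' in the zoo\n";

    return "$name weighs $badger->{weight_kg} kg";
}

my %zoo = ( brock => { species => 'badger', weight_kg => 12 } );
print report_weight( \%zoo, 'brock' ), "\n";
```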

      I concur almost exactly.

      Whilst not completely decrying the practice of using a temporary "naming var" for clarity, I find they can often be avoided by other good naming choices.

      In your example

      foo( $badgers->{lookup} );
      ## or
      foo( $some->{lookup_badger} );
      ## or
      dealwithBadger( $some->{lookup} );

      depending upon the circumstances.


      Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
      Lingua non convalesco, consenesco et abolesco. -- Rule 1 has a caveat! -- Who broke the cabal?
      "Science is about questioning the status quo. Questioning authority".
      In the absence of evidence, opinion is indistinguishable from prejudice.
        my %favorite_critter_of = (
            andy  => 'badger',
            larry => 'camel',
            vito  => 'cat',
        );
        put_on_cover( $favorite_critter_of{ andy } );

        Cheers.

        Fair comment. It was a fairly bad example on my part. Certainly your underlying point that good naming is always important is something I heartily agree with.

        Thinking about this more, if the expression is used more than once, I nearly always put in a naming var. I guess I then consider the code used in the expression to be repeated code, and effectively factor it out into one place and name it. On a small scale, this is pretty much the same as factoring out common code into a subroutine.

        And since I'm often error checking/logging/throwing exceptions in production code, many/most/all such expressions would get a second usage in the error handling and hence get a name.

        Thinking about my 'naming vars' versus 'variable scope' worry a bit more, I suspect that the reason this doesn't count as "more scope" to me is that I'm not using the variable as a variable. It's an additional name...it doesn't need the full privileges and rights of a variable. I guess I think of it as a read-only alias.

        I guess I could use alias (isn't this recommended in PBP?). But I'm not sure really how perlish that is and whether it raises the bar of code comprehension too high. Maybe I just need to modernise :-)

Re: Naming convention for variables, subroutines, modules, etc.. and variable declaration
by alpha (Scribe) on Nov 24, 2006 at 07:56 UTC
    The best naming convention is the one that suits YOU well :). And btw, IMHO, doing my ( $path, $file, $inode, $nlinks ); is not the best choice, because when lots of vars pop up in a list like this, it looks like crap.