PerlMonks  

coding rules

by punkish (Priest)
on Jun 09, 2005 at 04:18 UTC ( [id://464950]=perlmeditation )

Update: Several rules now modified to reflect the feedback from wise monks (credited, hopefully, correctly, below the respective rule).
Larry's philosophy that different things should look different is one of the most beautiful and powerful strengths of Perl. We are immediately guided by the $, @, and % as to the nature of the variable (cr: TimToady).

However, for my purpose, it can go even further. With that in mind, I came up with the following 10 rules while coding a reasonably complicated project.

Coding strategies are personal by nature, but much can be learned from those who are wiser. Hence, I put forth my rules to get comments from monks more learned than I --

# 0. Just-in-time variable declaration and initialization.
(cr: merlyn)
# 1. Variables in consistent case (all lower or all upper)
# 2. Each dictionary word in the variable name separated
#    by underscore, unless accepted as one word by usage as
#    in 'username' or 'logout'
# 3. Package variables in all caps, used for things like
#    config values
use vars qw( $FOO $BAR $BAZ $FOO_BAR );
# 4. Lexicals in all lower case
my ($foo, $bar, $baz, @foo_bar);
# 4. Global lexicals in all lower case, and
#    prefixed with some text to mark them as such.
#    For example...
my ($glob_foo, $glob_bar, $glob_baz, @glob_foo_bar);
# 5. Variables local to a subroutine prefixed by some 
#    consistent prefix such as 'this_' or 'local_'. This way 
#    there never will be any confusion as to whether a 
#    given variable sprang into life in the subroutine
#    or if it exists in the outside world and might possibly 
#    get clobbered locally.
sub foo {
	my ($this_bar) = @_;
	for (@$this_bar) {
		print "yay" if ($_ eq $FOO);
	}
	return $this_bar;
}
# 5. Local lexicals in all lowercase
sub foo {
	my ($bar) = @_;
	for (@$bar) {
		print "yay" if ($_ eq $FOO);
	}
	return $bar;
}
#4 and #5 above flipped (cr: Forsaken)
# 5.5. Use prefixes to scope other logically different
#      vars. For example, 'cgi_' to prefix vars holding
#      params delivered via the cgi
my $cgi_foo = $cgi->param('foo');
my $cgi_bar = $cgi->param('bar');
# 6. Refs prefixed with the appropriate identifier. I wish
#    Perl would extend the "different things look different"
#    to enable easy identification of the kind of data
#    structure a ref refers to.
my $aref_bar = foo(\@foo_bar);
# 7. Rules 1-5 apply to rule 6 as well... so, an array ref
#    inside a sub would be $this_aref_bar, etc.
# 8. Subroutines named in camelCase with the first letter 
#    lower case
sub normalizeThisMonkeyBoy {
}
# 8. modified: I like camelCase, however, the real purpose
#    here is to visually distinguish a sub from a var. Choose
#    your method
# 9. Subroutines not exported from a package prefixed with
#    an underscore
package Foo;
@EXPORT = qw(
	startProgram
	watchOut
	goHome
);
sub startProgram { _wakeUp(); }
sub watchOut { _keepEyeOpen(); }
sub goHome { _nightyNight(); }
sub _wakeUp {}
sub _keepEyeOpen {}
sub _nightyNight {}
# 9.5. Never export anything by default. @EXPORT_OK
(cr: TilRMan)
# 10. Always pass named vars to subs
doThis('with' => $that, 'and' => $something_else);
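A minimal sketch of what rule 10 could look like on the receiving side. The sub name describeMsg, its keys, and the default folder are made up for illustration and are not from the original project.

```perl
use strict;
use warnings;

# Hypothetical sub, named in the thread's camelCase style.
# It takes a flat list of key/value pairs and reads them as a hash.
sub describeMsg {
    my (%args) = @_;
    my $folder = defined $args{folder} ? $args{folder} : 'INBOX';  # default if omitted
    return "message $args{msgnum} in $folder";
}

# Argument order does not matter because each value is labelled.
my $with_folder    = describeMsg('folder' => 'Seen', 'msgnum' => 7);
my $without_folder = describeMsg('msgnum' => 3);
print "$with_folder\n";     # message 7 in Seen
print "$without_folder\n";  # message 3 in INBOX
```

The trade-off is a little more typing at each call site in exchange for calls that still read correctly when a parameter is added or made optional.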
--

when small people start casting long shadows, it is time to go to bed

Janitored by holli - moved to Meditations

Replies are listed 'Best First'.
Re: coding rules
by merlyn (Sage) on Jun 09, 2005 at 13:29 UTC
    I'd add:
    Introduce and initialize each variable in the smallest scope possible.
    In other words, I find code that starts out with dozens of "my" declarations to generally spell trouble for maintenance or debugging. Instead, variables should be introduced right where they are needed, initialized with the correct value for that step of the coding. If necessary, refactor the code into subroutines so that the lifetime and visibility of the variables can be reduced even further.

    Take a look at my columns (especially the later ones) for examples of "just-in-time declaration and initialization".

    Programming with globals is so 80's. {grin}

    -- Randal L. Schwartz, Perl hacker
    Be sure to read my standard disclaimer if this is a reply.

      I respectfully disagree: if you introduce variables "when they're needed", then it's harder to find out all the variables that are in use in a given function, and what they do. In addition, you need to worry about scoping mistakes, variable masking, and other subtleties that just don't come up when everything is within a single scope.

      The full list of variables used in a function is a decent metric for code complexity; the more state information the function needs, the more complicated it is. When you have too many variables in a given function, you should probably refactor it: it's easier to do this when you realize up front how much state your function has embedded in it.

      If you document the purpose of all your variables up front as well, then the coder has some idea of how much information he'll have to juggle to understand the function. If you keep surprising him with a new working set of data, it's more confusing. And, then you run into issues with variables being accidentally declared in the wrong scope, or masking the wrong variable, or some stupid side effect using my and assignment at the same time. Why risk that kind of annoyance when you don't need to?

      What's more, having everything commented up front has a slight psychological benefit: it reminds the maintainer that when he adds a new variable, he should comment it, just like all the others. When the declarations are scattered all over the place, that visual reminder is lost.

      It's also harder to scan for unused variables: you have to run through the function, find all the declarations, figure out all the scopes, figure out all the shadowing, and decide if the variable is still in use, and if so, are there any scopes in which the variable is not in use.

      Contrast this with a simpler code layout: where all variables are at the start of the code, and only used within a single scope (the function). Run through the list of variables at the start: grep through the rest of the lines of the function for the variable's name. If it's not in use, just delete the variable.

      Perhaps it's just because I've just been burned by several thousands of lines of bad code written in this style (multi-thousand line loops, inconsistent indentation, unused variables, multiply shadowed variables, scoping errors in production code, etc.), but I sometimes wonder why anyone really likes just in time declarations. What's the appeal of withholding information? If a section of code is complex enough to introduce a variable into a new scope, it's usually complex enough to deserve its own name and its own function. Why not just always put it there in the first place? Better to have a bunch of small, overly simple functions that you can prove correct than one overly-complex function that you can't, or so I've always felt.

      I normally don't argue with saints on PerlMonks, especially Randal, but I'm curious about the justification for this one. To me, as a casting director, I'd rather see the cast of characters "up front" from reading the programme, rather than wading through the play every time a new character (variable) comes up. Could you please clarify in more detail what the strengths of your method are, and how you avoid the drawbacks I've outlined? I'm genuinely curious as to whether I'm missing something...

      --
      Ytrew Q. Uiop

        Howdy!

        If you have long functions and long blocks, lots of other things go askew. If you refactor aggressively into shorter functions/methods/subroutines/whatever so that you can see the entire scope at once, just-in-time declarations are unremarkable. If you can't take the entire scope in at a glance, you do have to work harder to keep track of the matter.

        Why do I like just-in-time declaration?

        Consider

        foreach my $foo (stuff) { do stuff with $foo }
        Under most circumstances, $foo is meaningless outside the loop, so it makes sense to limit its scope to just the loop. Similarly, a variable used only within a block is best declared within that block so that it doesn't leak (for whatever useful sense of "leak" applies).

        The real appeal is an application of the concept of "least privilege" (usually invoked in the context of security) to how large the scope of a variable needs to be. Less is more.
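A small sketch of the "least privilege" idea, using made-up sample data. The string eval is only there to show, without breaking the script, that the loop variables really are gone once the loop ends.

```perl
use strict;
use warnings;

my @words = qw(alpha beta gamma);   # made-up sample data

my $total_length = 0;
foreach my $word (@words) {
    # $len is declared where it is first needed and dies with each iteration
    my $len = length $word;
    $total_length += $len;
}

# Outside the loop neither $word nor $len exists; under strict,
# mentioning either one here is a compile-time error.
my $leaked = eval q{ use strict; my $copy = $len; 1 };
my $leak_error = $@;

print "total length: $total_length\n";                       # total length: 14
print "using \$len outside the loop fails\n" unless $leaked;
```

Only $total_length, the one value that has to outlive the loop, is declared outside it.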

        I was taking a stick to some code I inherited. The programmer was learning Perl on the fly and wrote a lot of C code in perl, without using strict or warnings. He did use "my" as he went along. I made it strict and warnings clean, but it was quaint, especially dealing with 1300 line routines. I applied "my" liberally. From time to time, I would get the complaint that a given variable was already lexical, but it did eventually yield the field to me...

        I try to write short routines with variables' scopes as limited as I can make them.

        yours,
        Michael
Re: coding rules
by QM (Parson) on Jun 09, 2005 at 04:27 UTC
    I applaud you for coming up with your own rules. Of course, the reason you posted them was so we could chime in, right?

    I could pretty much go along with your list, except for camelCase. I hate camelCase. There's no reason subroutine names can't follow similar rules as the variable names, since generally variables have their sigils and subs have theirs (including none).

    But to each his own :)

    -QM
    --
    Quantum Mechanics: The dreams stuff is made of

      Of course, the reason you posted them was so we could chime in, right?
      indeed. I want to get feedback and learn from others. So chime away.

      Re. using camelCase for subs, it is for the very reason stated above -- "different things should look different." That way subRoutines() look different $from_variables.

      --

      when small people start casting long shadows, it is time to go to bed
        Howdy!

        camelCase is tolerable when there are only two words. threeWordCamelcase is pushing it. havingReallyLongAndDescriptiveSubNamesThatRunHalfwayAcrossThePage is absolutely hideous, and the basis for my deep hatred for that part of the Sun Java Coding Standards *spit*.

        This was driven home recently with great force when I had cause to pore over some Java source that had method names that were three inches long with many words mashed together. It was painful to try to parse the words apart to make sense of it.

        It is, generally, far better to apply the "separate dictionary words with _" constraint to all symbol names. Most human written languages (and all that use the roman alphabet, I think) rely on white space to mark the spaces between words. Mashing the words together into a long word may be very German in its application, but it destroys the normal visual markers we rely on to parse the phrase, making it much harder to read. Using underscores connects the words with a non-whitespace character, but has a visual impact of nearly zero. The contour of the tops of the characters still has zero-height area, just as it would with spaces (or close enough as to add no appreciable load to the cognitive process).

        I also take exception to rules 5 and 6.

        Prefixing like that will tend to obscure the substantial part of the variable name. The sigils, being single, non-alphabetic characters, are easy to cope with. However, consider the mental processing in reading "$this_aref_foo" to discern the name of the variable.

        You see "this" and have to remember that this simply marks it as local to some narrower scope (but which scope? hmmm...). OK. Ignore the "this". Next you come to "aref". OK, we have an array ref, but we still don't know what it is about. Finally, we come to "foo". At last! A name conveying some sort of meaning!

        Recall "Hungarian notation", by which means one prefixes the "real" name with a series of characters that encode the data type. Nasty nasty nasty. Rules 5 and 6 go down that path, whence lies much danger and peril and nasty sticky bits that go ecky ecky ptoing niiiiiwha.

        yours,
        Michael
        Re. using camelCase for subs, it is for the very reason stated above -- "different things should look different." That way subRoutines() look different $from_variables.
        But they already look different! Tell me what each of these are:
        foobar()
        $foobar
        @foobar
        %foobar
        To distinguish them more, I would probably have the subs named with some kind of action, like process_rows(\@rows) (as opposed to rows(\@rows), which is begging for trouble).

        But then I really hate camelCase. I'd prefer camel_case, Camel_Case, and CamelCase before camelCase.
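As a rough illustration of the point about sigils, the four foobar forms can coexist in one file without clashing. The values are invented for the example.

```perl
use strict;
use warnings;

# One name, four different things: Perl keeps them in separate slots.
sub foobar { return 'sub' }
my $foobar = 'scalar';
my @foobar = ('array', 'of', 'three');
my %foobar = (kind => 'hash');

my $summary = join ', ',
    foobar(),            # call the sub
    $foobar,             # the scalar
    scalar(@foobar),     # number of elements in the array
    $foobar{kind};       # one value from the hash

print "$summary\n";      # sub, scalar, 3, hash
```

Whether reusing one name this way is readable is a separate question; the sketch only shows that the language itself never confuses them.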

        -QM
        --
        Quantum Mechanics: The dreams stuff is made of

Re: coding rules
by monkey_boy (Priest) on Jun 09, 2005 at 10:38 UTC
    sub normalizeThisMonkeyBoy { }

    what did i ever do to you? ;-)


    This is not a Signature...
Re: coding rules
by ikegami (Patriarch) on Jun 09, 2005 at 05:30 UTC

    The first issue I'd like to raise is with item #3, "Package variables in all caps". All-caps usually denotes constants. While package variables are usually constants, they are not always so. To use all-caps for something that isn't a constant will confuse your code's readers. I have yet to find a method of identifying non-constant globals with which I'm happy.

    By the way, do you mean "global variables" when you say "package variables"? It's possible to have global variables that are lexicals rather than package variables.

    I also have a comment about "aref_". I find the sigil+pluralization already disambiguates references from non-references. For example, consider $pearl vs $pearls vs @pearls. The first is an object, the second is an array ref, and the third is an array. True, they look rather similar. But then again, the last two are quite similar, and anyone who reads English can immediately recognize the difference between the first two without even thinking. Granted, there are times when I may wish to disambiguate a variable's content, but I find it a step backwards to require prefixes such as "aref_".
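A short sketch of the sigil-plus-pluralization convention, assuming made-up pearl data and treating $pearl as a single element rather than a full object:

```perl
use strict;
use warnings;

my @pearls = ('white', 'black', 'pink');  # an array (made-up data)
my $pearls = \@pearls;                    # plural scalar: a reference to the array
my $pearl  = $pearls[1];                  # singular scalar: one element

my $count = @$pearls;                     # dereference in scalar context gives the count
my $first = $pearls->[0];                 # arrow syntax reaches through the reference

print "$pearl, $first, $count\n";         # black, white, 3
```

Note that $pearls[1] indexes the array @pearls, while $pearls->[0] goes through the scalar $pearls; the convention leans on the reader spotting that difference.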

    Finally, I also agree with QM's comment and Forsaken's comment to the last word.

      While package variables are usually constants, they are not always so. To use all-caps for something that isn't a constant will confuse your code's readers. I have yet to find a method of identifying non-constant globals with which I'm happy.

      Capitalizing globals that may be changed, like $Stuff, is something I find nice.

      ihb

      See perltoc if you don't know which perldoc to read!

Re: coding rules
by Forsaken (Friar) on Jun 09, 2005 at 05:12 UTC
    Doesn't rule #3 make rule #5 redundant? If package variables can already be identified by their being in all caps, pretty much anything else you run across is therefore by definition local to that subroutine, isn't it? Besides, I can't help but feel that after a while all the this_ and local_ are going to drive you absolutely nuts. Variables being local to subroutines should be the rule, not the exception, so in my opinion perhaps it would be better if the variables that aren't local get their own little prefix :-)


    Remember rule one...
Re: coding rules
by tlm (Prior) on Jun 09, 2005 at 13:43 UTC

    I like (and in fact follow) your rules for the most part. Like others, I detest camelCase, so I never use it when defining my own identifiers.

    I like the rule that says that the larger the scope of a variable, the more descriptive its name should be. So, I may use $Input_Filename for a file-scoped lexical, but a mere $f in something like:

    for my $f ( @filenames ) {
        open my $in, $f or die $!;
        print uc while <$in>;
        close $in;
    }
    Hence I'm not crazy about the idea of prepending "this_" or "local_" to function-scoped lexicals. Since these happen to be, by far, the more numerous of my variables, I prefer to distinguish the few remaining variables in my code.

    There are three kinds of "broad scoped" variables that I try to distinguish typographically: constants (which, actually, are not variables at all, neither semantically nor implementation-wise), file-scoped lexicals, and package (aka global) variables. For constants I use all-caps; for file-scoped lexicals I capitalize the first letter of each underscore-separated word; for package variables, I use the fully qualified name, in lower case. Hence:

    use constant DEBUG => 0;
    my $File_Scoped_Lexical = 1;
    $main::globals_suck_bigtime = 2;
    Yes, it is a pain to fully qualify package variables, but that's the point: their cumbersome nomenclature indicates that they should be used as little as possible.

    the lowliest monk

      Yes, it is a pain to fully qualify package variables, but that's the point: their cumbersome nomenclature indicates that they should be used as little as possible.

      ... and once you change the package name you'll almost definitely create a bug. This might not apply to you as you're mostly a script author, but as a module author this just isn't good advice.

      Why should package variables be more cumbersome and less used than file-scoped lexicals?

      Btw, why do you feel a need to typographically distinguish file-scoped lexicals from package variables? During development I sometimes go from file-scoped lexical to package variable and back to file-scoped lexical again, and in my module I often don't care what nature the variable is since inside the file I usually have just one package. (If I have two packages I usually put them in different lexical scopes, and put any shared variable at file scope. Keeping track of those very few variables isn't hard, especially since they're the first thing you see when you open the file.)

      I'm interested in what made you choose this style with regards to package variables.

      ihb

      See perltoc if you don't know which perldoc to read!

        I'm interested in what made you choose this style with regards to package variables.

        The same principle that motivates the entire thread: to make different things look different. Full qualification seems to me like the natural way to make package variables stand out as such.

        Package variables are the worst, because their scope transcends a single file.

        I'm less worried about package name changes (my editor can do search and replace fine) than about typos (e.g. assigning to $Typo::foo). If I were to drop this coding practice, this would be the reason. But, anyway, I already refer to all package variables, even those from modules I did not write, using fully qualified names, because I think it makes the code clearer. So this particular practice of mine is an extension of a more general one.
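The typo risk can be sketched as follows. The variable and package names are hypothetical, and string evals are used so that the failing case does not stop the script.

```perl
use strict;
use warnings;

# Unqualified and undeclared: strict refuses to compile it.
my $unqualified_ok = eval q{ $no_such_variable = 1; 1 };
my $strict_error   = $@;

# Fully qualified but misspelled (hypothetical package name 'Typo'):
# strict has no complaint, so the assignment quietly creates a new variable.
# The 'once' warning category is switched off only to keep the output clean.
my $qualified_ok = eval q{ no warnings 'once'; $Typo::foo = 1; 1 };

print "unqualified typo rejected\n" unless $unqualified_ok;
print "qualified typo accepted\n"   if $qualified_ok;
```

In other words, fully qualifying a name gives up the check that strict performs on unqualified variables, which is the price of this convention.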

        the lowliest monk

combined comment on comments
by punkish (Priest) on Jun 09, 2005 at 15:34 UTC
    So much constructive feedback, and thankfully I don't agree with all of it ;-).

    Here is a combined response --

    meritsOfCamelCase: Coding is personal in nature. One coder's wonderfulSolution is another coder's stuff_of_nightmares. I happen to like camelCase from the time that I first started learning how to program using Hypertalk. The real point though is not casing, but distinguishing vars from subs. QM mentioned in 464953 that &subs already have their own sigils so there is no confusion. Additionally, suffixing sub calls with parentheses() is also good practice and provides visual feedback. I find, however, that in a large-ish mess of code, camelCase vs. all_lower_case helps even more, sigils notwithstanding.

    while (1) {
        my $msg_cnt = $imap->message_count($FOLDER_INBOX);
        if ($msg_cnt) {
            my $dbh = connectToDb();
            $imap->select($FOLDER_INBOX);
            for my $msg_num (1..$msg_cnt) {
                my $this_msg;
                eval {
                    $this_msg = $imap->body_string($msg_num) or die;
                };
                my $dt = getCurrentDateTime();
                if ($@) {
                    my $log = "$dt: Error: $@\n";
                    print "$log" if ($DEBUGGING or $CHATTY);
                    print LOG "$log" if (not $DEBUGGING);
                }
                else {
                    if (msgIsEndOfDayAudit('msg' => $this_msg)) {
                        forwardMsg(..);
                        eval {
                            moveMsgTo('msgnum' => $msg_num, 'folder' => $FOLDER_SEEN);
                        };
                    }
                    else {
                        my $info = extractRecFromMsg('msg' => $this_msg);
                        .. more code ..
                    }
        goToSleep($SLEEPTIME);
    }

    That said, modify my rule #8 to "optional: subroutines named in some way to distinguish them visually from vars."

    Forsaken's advice is simple, and actually makes a lot of sense to me -- since most of the vars are likely to be local to a sub, why not just name them normally without any of my this_ or local_ nonsense, but instead, the global, lexical vars might be better named with some identifying prefix. That way, within a sub I will know immediately what is a "global lexical" vs. "local to the sub."

    So, flip rules #4 and #5 around.

    However, #3 doesn't make #5 redundant. Package vars are in ALL_CAPS, but everything else can still be lexical as well as local... hence the need for distinguishing the package vars, the (package) lexicals, and the sub locals from each other.

    ikegami says that sigil+pluralization already disambiguates references from non-references. Fair enough, but different types of references don't get disambiguated. Hence, looking at $pearls, it is not clear if it is a ref to an array or a sub or a hash. Hence, the need for 'aref_', 'href_', or 'sref_' (or whatever). Additionally, there could be legitimately pluralized scalars which could lead to further confusion (for example, if I wanted to hold the number of pearls in $pearls). I must add, as an additional rule, that I pluralize the arrays and hashes, and singularize the elements. Hence, @pearls versus $pearl = $pearls[3].
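For comparison, Perl can report the kind of a reference at run time through the built-in ref function, which is a different tool from a naming prefix that is visible while reading. A minimal sketch with invented variable names:

```perl
use strict;
use warnings;

my $pearls_list  = [ 'white', 'black' ];        # array ref
my $pearls_index = { white => 1, black => 2 };  # hash ref
my $pearls_maker = sub { return 'pearl' };      # code ref
my $pearls_count = 2;                           # plain number, not a ref

# ref() returns 'ARRAY', 'HASH', 'CODE', ... for references
# and an empty string for anything that is not a reference.
my @kinds = map { ref($_) || 'not a ref' }
    $pearls_list, $pearls_index, $pearls_maker, $pearls_count;

print join(', ', @kinds), "\n";   # ARRAY, HASH, CODE, not a ref
```

This does not replace a prefix such as 'aref_' for someone skimming the source, but it can back the convention up with a check inside a sub that expects a particular kind of reference.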

    I would add merlyn's advice to my list of rules -- "jit declaration and init" as much as possible.

    A question to the architects of the language (TimToady, if you read this). When, as merlyn says (and I agree), programming with globals is so 80's, why does Perl not have all vars as local in scope unless declared global explicitly? Is it because Perl was invented in the 80's {grin}? I hate to bring up the example of PHP (a language that I don't particularly care for), but it makes everything in a sub local to it unless an outside var is brought in as a global explicitly. Perhaps, the dot notation of Perl6 might solve most of the above-mentioned, scoping-related problems.

    Janitorial note: I posted this on SoPW because I was seeking perl wisdom, not really meditating. I hope newcomers to PerlMonks, and to Perl itself, visit the meditations enough to read all the good stuff there (I should do more, but I end up spending most of my time on SoPW).

    Many thanks everyone.

    update: The way I look at it, I almost wish Perl would allow me to make my own sigils... in effect, that is what I am trying to accomplish with all my prefixing nonsense above. Being able to make my own sigils in addition to the company-provided $, @, and % would be very cool.

    update2: rule 10. Always pass named variables to subs (as in the code example above).

    --

    when small people start casting long shadows, it is time to go to bed
      Coding is personal in nature. One coder's wonderfulSolution is another coder's stuff_of_nightmares.
      Of course this is true. It's a matter of taste - like many things in life (blond or brunette, pepsi or cola ...). Problems come when there are several people working on the same project.

      You can like whatever you want - but in such a case, everybody needs to play by the rules.
Re: coding rules
by TilRMan (Friar) on Jun 10, 2005 at 04:07 UTC

    The first eight "rules" are style, and so naturally I like or dislike them to varying degrees. But underneath number nine lurks a more important broken rule:

    @EXPORT = qw(

    #9a. Don't export anything by default. In other words, use @EXPORT_OK instead of @EXPORT.
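A sketch of rule 9a under some assumptions: the package Foo is written inline here (normally it would live in Foo.pm), the sub bodies are placeholders, and the explicit import call stands in for a use statement.

```perl
use strict;
use warnings;

# Hypothetical module, written inline so the sketch is self-contained.
BEGIN {
    package Foo;
    use Exporter 'import';
    our @EXPORT_OK = qw( startProgram goHome );   # offered, never forced on the caller
    sub startProgram { return 'started' }
    sub goHome       { return 'home' }
    sub _wakeUp      { return 'awake' }           # internal helper, not exportable
}

package main;

# Stands in for "use Foo qw(startProgram);" when Foo lives in its own file.
BEGIN { Foo->import('startProgram') }

my $status   = startProgram();
my $has_home = defined &main::goHome ? 1 : 0;     # goHome was not requested

print "$status, goHome imported: $has_home\n";    # started, goHome imported: 0
```

With @EXPORT_OK the caller's namespace only gains the names it asked for, so adding a new exportable sub to Foo later cannot silently collide with something the caller already defines.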

Re: coding rules
by spurperl (Priest) on Jun 10, 2005 at 08:10 UTC
    # 5. Variables local to a subroutine prefixed by some 
    #    consistent prefix such as 'this_' or 'local_'. This way 
    #    there never will be any confusion as to whether a 
    #    given variable sprang into life in the subroutine
    #    or if it exists in the outside world and might possibly 
    #    get clobbered locally.
    

    For this reason, the outside world variables are usually marked. Besides, global variables are bad design, object variables are prefixed with $self in Perl, and constants are usually ALL_CAPS.

    IMHO the variables inside subs should not be prefixed by anything - it's the globals that should.

Re: coding rules
by arc_of_descent (Hermit) on Jun 10, 2005 at 07:55 UTC

    I don't personally agree with rule #5 - Prefixing local variables.

    One way to not do this is to keep your subroutines short enough so that they require at most only around 4 to 5 variables. If a subroutine requires more variables, then maybe you need to break it down.

Re: coding rules
by mstone (Deacon) on Jun 13, 2005 at 04:21 UTC

    'Nother one to consider: Make the readability of names proportional to their scope.

    You don't need long names for short-lived carrier variables:

    my $d = $object->{'some'}->{'deeply'}->{'nested'}->{'item'};
    _foo ($d->{'bar'}, $d->{'baz'});
    $object->validate_range_params ($d->{'bar'}, $d->{'quux'});
    # ... and $d will never be used again after this point

    but it does make sense to use long, descriptive names for things that will be used far away from where they're defined.

    The same applies to functions. It's okay to use short names for internal functions (the ones you'd start with an underscore), because you won't see them being used unless you're close enough that you can find the definition and read the associated comments. For exported functions, even though people do hate typing the long names, it's worthwhile to add a few characters that will save people from having to RTFS in order to learn what the thing does.

    Used consistently, the readability of a name can cue you in to how far you'll have to look to find the original definition.
