PerlMonks  

Re: coding rules

by merlyn (Sage)
on Jun 09, 2005 at 13:29 UTC


in reply to coding rules

I'd add:
Introduce and initialize each variable in the smallest scope possible.
In other words, I find code that starts out with dozens of "my" declarations to generally spell trouble for maintenance or debugging. Instead, variables should be introduced right where they are needed, initialized with the correct value for that step of the coding. If necessary, refactor the code into subroutines so that the lifetime and visibility of the variables can be reduced even further.
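
A minimal sketch of what "smallest scope possible" can look like in practice. The sub name `word_counts` and its input are hypothetical, made up for illustration rather than taken from the columns mentioned in this post:

```perl
use strict;
use warnings;

# Hypothetical helper: count how often each word appears in a list of lines.
sub word_counts {
    my @lines = @_;

    my %count;                              # used across the whole sub, so declared at sub level
    for my $line (@lines) {                 # $line exists only inside this loop
        my @words = split ' ', lc($line);   # @words exists only for one iteration
        $count{$_}++ for @words;
    }
    return \%count;
}

my $counts = word_counts("the cat", "The dog");
print "$_: $counts->{$_}\n" for sort keys %$counts;
```

Each variable is introduced on the line where it first gets a meaningful value, and nothing outlives the block that uses it.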

Take a look at my columns (especially the later ones) for examples of "just-in-time declaration and initialization".

Programming with globals is so 80's. {grin}

-- Randal L. Schwartz, Perl hacker
Be sure to read my standard disclaimer if this is a reply.

Replies are listed 'Best First'.
Re^2: coding rules
by Anonymous Monk on Jun 10, 2005 at 16:57 UTC
    I respectfully disagree: if you introduce variables "when they're needed", then it's harder to find out all the variables that are in use in a given function, and what they do. In addition, you need to worry about scoping mistakes, variable masking, and other subtleties that just don't come up when everything is within a single scope.

    The full list of variables used in a function is a decent metric for code complexity; the more state information the function needs, the more complicated it is. When you have too many variables in a given function, you should probably refactor it: it's easier to do this when you realize up front how much state your function has embedded in it.

    If you document the purpose of all your variables up front as well, then the coder has some idea of how much information he'll have to juggle to understand the function. If you keep surprising him with a new working set of data, it's more confusing. And, then you run into issues with variables being accidentally declared in the wrong scope, or masking the wrong variable, or some stupid side effect using my and assignment at the same time. Why risk that kind of annoyance when you don't need to?
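
The masking problem described above can be sketched as follows. The sub names are hypothetical; the point is only that a stray `my` inside a block creates a second variable without any complaint from `strict` or `warnings`, because the two declarations live in different scopes:

```perl
use strict;
use warnings;

# Hypothetical example of accidental masking.
sub total_with_bug {
    my @prices = @_;
    my $total = 0;
    for my $price (@prices) {
        # The stray "my" creates a NEW $total for this iteration only.
        # The right-hand side still reads the outer $total, because the new
        # variable is not visible until the statement after its declaration.
        my $total = $total + $price;
    }
    return $total;    # the outer $total was never changed
}

# Same loop without the extra "my": the outer $total is updated.
sub total_fixed {
    my @prices = @_;
    my $total = 0;
    for my $price (@prices) {
        $total = $total + $price;
    }
    return $total;
}

print total_with_bug(1, 2, 3), "\n";    # prints 0
print total_fixed(1, 2, 3), "\n";       # prints 6
```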

    What's more, having everything commented up front has a slight psychological benefit: it reminds the maintainer that when he adds a new variable, he should comment it, just like all the others. When the declarations are scattered all over the place, that visual reminder is lost.

    It's also harder to scan for unused variables: you have to run through the function, find all the declarations, figure out all the scopes, figure out all the shadowing, and decide if the variable is still in use, and if so, are there any scopes in which the variable is not in use.

    Contrast this with a simpler code layout: where all variables are at the start of the code, and only used within a single scope (the function). Run through the list of variables at the start: grep through the rest of the lines of the function for the variable's name. If it's not in use, just delete the variable.

    Perhaps it's just because I've just been burned by several thousands of lines of bad code written in this style (multi-thousand line loops, inconsistent indentation, unused variables, multiply shadowed variables, scoping errors in production code, etc.), but I sometimes wonder why anyone really likes just-in-time declarations. What's the appeal of withholding information? If a section of code is complex enough to introduce a variable into a new scope, it's usually complex enough to deserve its own name and its own function. Why not just always put it there in the first place? Better to have a bunch of small, overly simple functions that you can prove correct than one overly-complex function that you can't, or so I've always felt.

    I normally don't argue with saints on PerlMonks, especially Randal, but I'm curious about the justification for this one. To me, as a casting director, I'd rather see the cast of characters "up front" from reading the programme, rather than wading through the play every time a new character (variable) comes up. Could you please clarify in more detail what the strengths of your method are, and how you avoid the drawbacks I've outlined? I'm genuinely curious as to whether I'm missing something...

    --
    Ytrew Q. Uiop

      Howdy!

      If you have long functions and long blocks, lots of other things go askew. If you refactor aggressively into shorter functions/methods/subroutines/whatever so that you can see the entire scope at once, just-in-time declarations are unremarkable. If you can't take the entire scope in at a glance, you do have to work harder to keep track of the matter.

      Why do I like just-in-time declaration?

      Consider

      foreach my $foo (stuff) { do stuff with $foo }
      Under most circumstances, $foo is meaningless outside the loop, so it makes sense to limit its scope to just the loop. Similarly, a variable used only within a block is best declared within that block so that it doesn't leak (for whatever useful sense of "leak" applies).
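
A runnable sketch of the same idea, with hypothetical sub names. It shows a loop variable that exists only inside its `foreach`, and a bare block that keeps a temporary from leaking into the rest of the sub:

```perl
use strict;
use warnings;

# Hypothetical helper: return the first word of at least $min_length characters.
sub first_long_word {
    my ($min_length, @words) = @_;
    foreach my $word (@words) {
        return $word if length($word) >= $min_length;
    }
    # $word no longer exists here. Mentioning it would be a compile-time
    # error under "use strict", not a silent read of stale data.
    return;
}

# Hypothetical helper: a bare block limits @parts to the lines that need it.
sub make_label {
    my ($name) = @_;
    my $label;
    {
        my @parts = split /\s+/, lc($name);    # temporary, gone after the block
        $label = join '-', @parts;
    }
    return $label;
}

print first_long_word(4, qw(a bb cccc dd)), "\n";    # prints cccc
print make_label("Just In Time"), "\n";              # prints just-in-time
```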

      The real appeal is an application of the concept of "least privilege" (usually invoked in the context of security) to how large the scope of a variable needs to be. Less is more.

      I was taking a stick to some code I inherited. The programmer was learning Perl on the fly and wrote a lot of C code in Perl, without using strict or warnings. He did use "my" as he went along. I made it strict and warnings clean, but it was quaint, especially dealing with 1300-line routines. I applied "my" liberally. From time to time, I would get the complaint that a given variable was already lexical, but it did eventually yield the field to me...

      I try to write short routines with variables' scopes as limited as I can make them.

      yours,
      Michael
        If you have long functions and long blocks, lots of other things go askew. If you refactor aggressively into shorter functions/methods/subroutines/whatever so that you can see the entire scope at once, just-in-time declarations are unremarkable. If you can't take the entire scope in at a glance, you do have to work harder to keep track of the matter.

        That's quite true. During my refactoring efforts, I found that when I knew how many variables were in a function, I knew roughly how complex the function would be. A function with 1,000 lines but only 3 scalar values is usually simpler to understand than a 50-line function operating on a maze of 30 different interdependent data structures.

        Under most circumstances, $foo is meaningless outside the loop, so it makes sense to limit its scope to just the loop. Similarly, a variable used only within a block is best declared within that block so that it doesn't leak (for whatever useful sense of "leak" applies).

        Hmm... I guess I'm not sure in what context you'd want to use an unnamed block for a section of meaningful code.

        If a section of code does something complex enough to wrap in an unnamed block, then, to me, that block needs documenting. Since all functions get names and documentation about purpose, usage, side effects, and so forth, a function seems to me like the natural place to put such a code block. I'm not sure how I'd invent a nice documentation standard to apply to such unnamed code blocks; nor, really, why I'd want one. Functions are essentially just blocks of code with names and special scoping rules. :-)

        The real appeal is an application of the concept of "least privilege" (usually invoked in the context of security) to how large the scope of a variable needs to be. Less is more.

        I guess I tend to apply the "less is more" philosophy to code correctness first, and variable scoping second. I've spent way too many wasted hours trying to fix bugs that only existed because someone got tricky with scope, and shadowed something they didn't intend, or wrapped code in a conditional or a loop, and thereby unknowingly negated the entire point of the code they'd so painstakingly written. If they hadn't tried to be so clever, I wouldn't have been cleaning up so many of their mistakes. Like you said, less is more. But maybe that's just the bitterness talking. ;-)

        I try to write short routines with variables' scopes as limited as I can make them.

        I try to write short routines with variables scoped at file level: that way I know there can't be any scoping or shadowing errors. If the code is short, then I'm not risking exposing too much: and if the code can be broken out into scoping blocks, I hide it behind a function call interface and keep it even more removed from the calling function. Are there any hidden drawbacks to this approach that I'm missing, though, save perhaps ideological ones? I'm less concerned with conceptual "least privilege", and more with ensuring that variables are used correctly in practice. If the functions are always short, this should be easy in any case, right?

        Or am I still missing something significant?

        Ytrew
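
One possible answer to the question above, offered as a sketch under assumptions rather than as anyone's actual code: when short routines share a file-level variable, a call from one routine into another can change that variable behind the caller's back. The sub names and the shared counter `$i` are hypothetical:

```perl
use strict;
use warnings;

my $i;    # hypothetical file-level lexical, shared by every sub below

# Counts the spaces in a string, using the shared $i as its index.
sub count_spaces {
    my ($text) = @_;
    my $spaces = 0;
    for ($i = 0; $i < length($text); $i++) {
        $spaces++ if substr($text, $i, 1) eq ' ';
    }
    return $spaces;
}

# Intended to visit every line, but the call to count_spaces() also moves $i.
sub lines_examined {
    my @lines = @_;
    my $examined = 0;
    for ($i = 0; $i < @lines; $i++) {
        count_spaces($lines[$i]);    # leaves $i at the length of that line
        $examined++;
    }
    return $examined;
}

print lines_examined("a b c", "x", "y"), "\n";    # prints 1, not 3
```

Both routines are short and each looks correct when read alone; the interference only shows up when they are combined. Declaring the index inside each loop, as in `for (my $i = 0; ...)`, removes the coupling.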
