Re: coding rules
by merlyn (Sage) on Jun 09, 2005 at 13:29 UTC
|
I'd add:
Introduce and initialize each variable in the smallest scope possible.
In other words, I find code that starts out with dozens of "my" declarations to generally spell trouble for maintenance or debugging. Instead, variables should be introduced right where they are needed, initialized with the correct value for that step of the coding. If necessary, refactor the code into subroutines so that the lifetime and visibility of the variables can be reduced even further.
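A minimal sketch of the idea, assuming a made-up task (summing the numeric lines in a list). The names sum_numbers, $total, and $number are illustrative only and do not come from the columns mentioned below:

```perl
use strict;
use warnings;

# Hypothetical helper: every scratch variable lives only inside the sub.
sub sum_numbers {
    my @lines = @_;
    my $total = 0;                      # introduced where it is first needed
    for my $line (@lines) {             # $line exists only inside the loop
        next unless $line =~ /^\s*(-?\d+)\s*$/;
        my $number = $1;                # declared and initialized in one step
        $total += $number;
    }
    return $total;
}

my $sum = sum_numbers('10', ' 20 ', 'not a number', '-5');
print "sum = $sum\n";
```

Nothing outside sum_numbers can see or clobber $total, $line, or $number, which is the point of keeping the scope small.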
Take a look at my columns (especially the later ones) for examples of "just-in-time declaration and initialization".
Programming with globals is so 80's. {grin}
| [reply] |
|
I respectfully disagree: if you introduce variables "when they're needed", then it's harder to find out all the variables that are in use in a given function, and what they do. In addition, you need to worry about scoping mistakes, variable masking, and other subtleties that just don't come up when everything is within a single scope.
The full list of variables used in a function is a decent metric for code complexity; the more state information the function needs, the more complicated it is. When you have too many variables in a given function, you should probably refactor it: it's easier to do this when you realize up front how much state your function has embedded in it.
If you document the purpose of all your variables up front as well, then the coder has some idea of how much information he'll have to juggle to understand the function. If you keep surprising him with a new working set of data, it's more confusing. And, then you run into issues with variables being accidentally declared in the wrong scope, or masking the wrong variable, or some stupid side effect using my and assignment at the same time. Why risk that kind of annoyance when you don't need to?
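To make the "my and assignment at the same time" hazard concrete, here is a small sketch. The subs and data are hypothetical and only illustrate one way the masking mistake can look:

```perl
use strict;
use warnings;

# Hypothetical example: count lines longer than ten characters.
sub count_long_lines_buggy {
    my @lines = @_;
    my $count = 0;
    for my $line (@lines) {
        if (length($line) > 10) {
            my $count = $count + 1;     # BUG: declares a new, inner $count
        }
    }
    return $count;                      # the outer $count was never changed
}

sub count_long_lines {
    my @lines = @_;
    my $count = 0;
    for my $line (@lines) {
        $count++ if length($line) > 10; # updates the one and only $count
    }
    return $count;
}

my @lines = ('short', 'a considerably longer line');
print "buggy: ", count_long_lines_buggy(@lines), "\n";
print "fixed: ", count_long_lines(@lines), "\n";
```

Because the inner declaration sits in a nested scope, strict and warnings stay silent about it, and the buggy version quietly reports zero.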
What's more, having everything commented up front has a slight psychological benefit: it reminds the maintainer that when he adds a new variable, he should comment it, just like all the others. When the declarations are scattered all over the place, that visual reminder is lost.
It's also harder to scan for unused variables: you have to run through the function, find all the declarations, figure out all the scopes, figure out all the shadowing, and decide if the variable is still in use, and if so, are there any scopes in which the variable is not in use.
Contrast this with a simpler code layout, where all variables are at the start of the code and only used within a single scope (the function). Run through the list of variables at the start: grep through the rest of the lines of the function for the variable's name. If it's not in use, just delete the variable.
Perhaps it's just because I've been burned by several thousand lines of bad code written in this style (multi-thousand line loops, inconsistent indentation, unused variables, multiply shadowed variables, scoping errors in production code, etc.), but I sometimes wonder why anyone really likes just-in-time declarations. What's the appeal of withholding information? If a section of code is complex enough to introduce a variable into a new scope, it's usually complex enough to deserve its own name and its own function. Why not just always put it there in the first place? Better to have a bunch of small, overly simple functions that you can prove correct than one overly-complex function that you can't, or so I've always felt.
I normally don't argue with saints on PerlMonks, especially Randal, but I'm curious about the justification for this one. To me, as a casting director, I'd rather see the cast of characters "up front" from reading the programme, rather than wading through the play every time a new character (variable) comes up. Could you please clarify in more detail what the strengths of your method are, and how you avoid the drawbacks I've outlined? I'm genuinely curious as to whether I'm missing something...
--
Ytrew Q. Uiop
| [reply] |
|
Howdy!
If you have long functions and long blocks, lots of other things go askew.
If you refactor aggressively into shorter functions/methods/subroutines/whatever
so that you can see the entire scope at once, just-in-time declarations are
unremarkable. If you can't take the entire scope in at a glance, you do
have to work harder to keep track of the matter.
Why do I like just-in-time declaration?
Consider
foreach my $foo (@stuff)
{
    print "$foo\n";    # stand-in for "do stuff with $foo"
}
Under most circumstances, $foo is meaningless outside the loop, so it
makes sense to limit its scope to just the loop. Similarly, a variable
used only within a block is best declared within that block so that it
doesn't leak (for whatever useful sense of "leak" applies).
The real appeal is an application of the concept of "least privilege"
(usually invoked in the context of security) to how large the scope
of a variable needs to be. Less is more.
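A small sketch of that "least privilege" effect, using made-up variable names. It assumes nothing beyond core Perl:

```perl
use strict;
use warnings;

my $foo = 'outer value';
my @seen;

for my $foo (qw(a b c)) {           # this $foo is a separate, loop-scoped variable
    push @seen, $foo;
}

{
    my $scratch = join ',', @seen;  # visible only inside this bare block
    print "seen: $scratch\n";
}

print "after the loop, \$foo is still '$foo'\n";
```

The loop variable and $scratch cannot leak into the surrounding code, so the outer $foo comes through untouched.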
I was taking a stick to some code I inherited. The programmer was learning
Perl on the fly and wrote a lot of C code in perl, without using strict
or warnings. He did use "my" as he went along. I made it strict and
warnings clean, but it was quaint, especially dealing with 1300 line
routines. I applied "my" liberally. From time to time, I would get the
complaint that a given variable was already lexical, but it did
eventually yield the field to me...
I try to write short routines with variables' scopes as limited as I
can make them.
| [reply] [d/l] |
|
Re: coding rules
by QM (Parson) on Jun 09, 2005 at 04:27 UTC
|
I applaud you for coming up with your own rules. Of course, the reason you posted them was so we could chime in, right?
I could pretty much go along with your list, except for camelCase. I hate camelCase. There's no reason subroutine names can't follow similar rules as the variable names, since generally variables have their sigils and subs have theirs (including none).
But to each his own :)
-QM
--
Quantum Mechanics: The dreams stuff is made of
| [reply] |
|
Howdy!
camelCase is tolerable when there are only two words.
threeWordCamelcase is pushing it. havingReallyLongAndDescriptiveSubNamesThatRunHalfwayAcrossThePage
is absolutely hideous, and the basis for my deep hatred for that
part of the Sun Java Coding Standards *spit*.
This was driven home recently with great force when I had cause
to pore over some Java source that had method names that were
three inches long with many words mashed together. It was
painful to try to parse the words apart to make sense of it.
It is, generally, far better to apply the "separate dictionary
words with _" constraint to all symbol names. Most human
written languages (and all that use the roman alphabet, I think)
rely on white space to mark the spaces between words. Mashing
the words together into a long word may be very German in its
application, but it destroys the normal visual markers we
rely on to parse the phrase, making it much harder to read.
Using underscores connects the words with a non-whitespace
character, but has a visual impact of nearly zero. The contour
of the tops of the characters still has zero-height area, just
as it would with spaces (or close enough as to add no
appreciable load to the cognitive process).
I also take exception to rules 5 and 6.
Prefixing like that will tend to obscure the substantial
part of the variable name. The sigils, being single, non-alphabetic characters, are easy to cope with. However,
consider the mental processing in reading "$this_aref_foo"
to discern the name of the variable.
You see "this" and have to remember that this simply marks
it as local to some narrower scope (but which scope? hmmm...).
OK. Ignore the "this". Next you come to "aref". OK, we have
an array ref, but we still don't know what it is about. Finally,
we come to "foo". At last! A name conveying some sort of
meaning!
Recall "Hungarian notation", by which means one prefixes the
"real" name with a series of characters that encode the data
type. Nasty nasty nasty. Rules 5 and 6 go down that path, whence
lies much danger and peril and nasty sticky bits that go ecky
ecky ptoing niiiiiwha.
| [reply] |
|
Re. using camelCase for subs, it is for the very reason stated above -- "different things should look different." That way subRoutines() look different $from_variables.
But they already look different! Tell me what each of these are:
foobar()
$foobar
@foobar
%foobar
To distinguish them more, I would probably have the subs named with some kind of action, like process_rows(\@rows) (as opposed to rows(\@rows), which is begging for trouble).
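A quick sketch showing that the four really are separate things to perl, plus an action-style sub name. The data and the body of process_rows are invented for illustration:

```perl
use strict;
use warnings;

# Four unrelated things sharing one (deliberately unhelpful) name.
my $foobar = 'a scalar';
my @foobar = (1, 2, 3);
my %foobar = (key => 'value');
sub foobar { return 'a sub' }

print "scalar: $foobar\n";
print "array:  @foobar\n";
print "hash:   $foobar{key}\n";
print "sub:    ", foobar(), "\n";

# An action-style name makes the call site read as a verb.
sub process_rows {
    my ($rows) = @_;                # array reference
    return scalar @$rows;           # here "processing" just counts the rows
}

my @rows = ([1, 'a'], [2, 'b']);
print "processed ", process_rows(\@rows), " rows\n";
```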
But then I really hate camelCase. I'd prefer camel_case, Camel_Case, and CamelCase before camelCase.
-QM
--
Quantum Mechanics: The dreams stuff is made of
| [reply] [d/l] [select] |
Re: coding rules
by monkey_boy (Priest) on Jun 09, 2005 at 10:38 UTC
|
| [reply] [d/l] |
Re: coding rules
by ikegami (Patriarch) on Jun 09, 2005 at 05:30 UTC
|
The first issue I'd like to raise is with item #3, "Package variables in all caps". All-caps usually denotes constants. While package variables are usually constants, they are not always so. To use all-caps for something that isn't a constant will confuse your code's readers. I have yet to find a method of identifying non-constant globals with which I'm happy.
By the way, do you mean "global variables" when you say "package variables"? It's possible to have global variables that are lexicals rather than package variables.
I also have a comment about "aref_". I find the sigil+pluralization already disambiguates references from non-references. For example, consider $pearl vs $pearls vs @pearls. The first is an object, the second is an array ref, and the third is an array. True, they look rather similar. But then again, the last two are quite similar, and anyone who reads English can immediately recognize the difference between the first two without even thinking. Granted, there are times when I may wish to disambiguate a variable's content, but I find it a step backwards to require prefixes such as "aref_".
Finally, I also agree with QM's comment and Forsaken's comment to the last word.
| [reply] |
Re: coding rules
by Forsaken (Friar) on Jun 09, 2005 at 05:12 UTC
|
Doesn't rule #3 make rule #5 redundant? If package variables can already be identified by their being in all caps, pretty much anything else you run across is therefore by definition local to that subroutine, isn't it? Besides, I can't help but feel that after a while all the this_ and local_ are going to drive you absolutely nuts. Variables being local to subroutines should be the rule, not the exception, so in my opinion perhaps it would be better if the variables that aren't local get their own little prefix :-)
| [reply] |
Re: coding rules
by tlm (Prior) on Jun 09, 2005 at 13:43 UTC
|
I like (and in fact follow) your rules for the most part. Like others, I detest camelCase, so I never use it when defining my own identifiers.
I like the rule that says that the larger the scope of a variable, the more descriptive its name should be. So, I may use $Input_Filename for a file-scoped lexical, but a mere $f in something like:
for my $f ( @filenames ) {
open my $in, '<', $f or die "Can't open $f: $!";
print uc while <$in>;
close $in;
}
Hence I'm not crazy about the idea of prepending "this_" or "local_" to function-scoped lexicals. Since these happen to be, by far, the more numerous of my variables, I prefer to distinguish the few remaining variables in my code.
There are three kinds of "broad scoped" variables that I try to distinguish typographically: constants (which, actually, are not variables at all, neither semantically nor implementation-wise), file-scoped lexicals, and package (aka global) variables. For constants I use all-caps; for file-scoped lexicals I capitalize the first letter of each underscore-separated word; for package variables, I use the fully qualified name, in lower case. Hence:
use constant DEBUG => 0;
my $File_Scoped_Lexical = 1;
$main::globals_suck_bigtime = 2;
Yes, it is a pain to fully qualify package variables, but that's the point: their cumbersome nomenclature indicates that they should be used as little as possible.
| [reply] [d/l] [select] |
|
Yes, it is a pain to fully qualify package variables, but that's the point: their cumbersome nomenclature indicates that they should be used as little as possible.
... and once you change the package name you'll almost definitely create a bug. This might not apply to you as you're mostly a script author, but as a module author this just isn't good advice.
Why should package variables be more cumbersome and less used than file-scoped lexicals?
Btw, why do you feel a need to typographically distinguish file-scoped lexicals from package variables? During development I sometimes go from file-scoped lexical to package variable and back to file-scoped lexical again, and in my modules I often don't care what nature the variable is, since I usually have just one package inside the file. (If I have two packages I usually put them in different lexical scopes, and put any shared variable at file scope. Keeping track of those very few variables isn't hard, especially since they're the first thing you see when you open the file.)
I'm interested in what made you choose this style with regards to package variables.
ihb
See perltoc if you don't know which perldoc to read!
| [reply] |
|
I'm interested in what made you choose this style with regards to package variables.
The same principle that motivates the entire thread: to make different things look different. Full qualification seems to me like the natural way to make package variables stand out as such.
Package variables are the worst, because their scope transcends a single file.
I'm less worried about package name changes (my editor can do search and replace fine) than about typos (e.g. assigning to $Typo::foo). If I were to drop this coding practice, this would be the reason. But, anyway, I already refer to all package variables, even those from modules I did not write, using fully qualified names, because I think it makes the code clearer. So this particular practice of mine is an extension of a more general one.
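The typo risk can be sketched like this. The package name My::Config and the variable $verbose are hypothetical:

```perl
use strict;
use warnings;

package My::Config;
our $verbose = 1;

package main;

# Typo in the package name: strict does not object to a fully qualified name,
# so this silently creates a brand-new variable instead of failing to compile.
$My::Confg::verbose = 0;

print "verbose is still $My::Config::verbose\n";
```

With warnings enabled, perl will typically report something like "used only once: possible typo" for the misspelled name, but the program still compiles and runs.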
| [reply] |
|
combined comment on comments
by punkish (Priest) on Jun 09, 2005 at 15:34 UTC
|
So much constructive feedback, and thankfully I don't agree with all of it ;-).
Here is a combined response --
meritsOfCamelCase: Coding is personal in nature. One coder's wonderfulSolution is another coder's stuff_of_nightmares. I happen to like camelCase from the time that I first started learning how to program using Hypertalk. The real point though is not casing, but distinguishing vars from subs. QM mentioned in 464953 that &subs already have their own sigils so there is no confusion. Additionally, suffixing sub calls with parentheses() is also good practice and provides visual feedback. I find, however, that in a large-ish mess of code, camelCase vs. all_lower_case helps even more, sigils notwithstanding.
while (1) {
my $msg_cnt = $imap->message_count($FOLDER_INBOX);
if ($msg_cnt) {
my $dbh = connectToDb();
$imap->select($FOLDER_INBOX);
for my $msg_num (1..$msg_cnt) {
my $this_msg;
eval {
$this_msg = $imap->body_string($msg_num) or die;
};
my $dt = getCurrentDateTime();
if ($@) {
my $log = "$dt: Error: $@\n";
print "$log" if ($DEBUGGING or $CHATTY);
print LOG "$log" if (not $DEBUGGING);
} else {
if (msgIsEndOfDayAudit('msg' => $this_msg)) {
forwardMsg(..);
eval {
moveMsgTo('msgnum' => $msg_num, 'folder' => $FOLDER_SEEN);
};
} else {
my $info = extractRecFromMsg('msg' => $this_msg);
..
more code
..
}
goToSleep($SLEEPTIME);
}
That said, modify my rule #8 to "optional: subroutines named in some way to distinguish them visually from vars."
Forsaken's advice is simple, and actually makes a lot of sense to me -- since most of the vars are likely to be local to a sub, why not just name them normally without any of my this_ or local_ nonsense, but instead, the global, lexical vars might be better named with some identifying prefix. That way, within a sub I will know immediately what is a "global lexical" vs. "local to the sub."
So, flip rules #4 and #5 around.
However, #3 doesn't make #5 redundant. Package vars are in ALL_CAPS, but everything else can still be lexical as well as local... hence the need for distinguishing the package vars, the (package) lexicals, and the sub locals from each other.
ikegami says that sigil+pluralization already disambiguates references from non-references. Fair enough, but different types of references don't get disambiguated. Hence, looking at $pearls, it is not clear if it is a ref to an array or a sub or a hash. Hence, the need for 'aref_', 'href_', or 'sref_' (or whatever). Additionally, there could be legitimately pluralized scalars which could lead to further confusion (for example, if I wanted to hold the number of pearls in $pearls). I must add, as an additional rule, that I pluralize the arrays and hashes, and singularize the elements. Hence, @pearls versus $pearl = $pearls[3].
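When the name alone doesn't say what kind of reference a scalar holds, the built-in ref function can at least report it at run time. A minimal sketch with made-up data; it is an aside, not a replacement for either naming scheme:

```perl
use strict;
use warnings;

my %examples = (
    array_ref => [ 'white', 'black' ],
    hash_ref  => { white => 3, black => 1 },
    sub_ref   => sub { return 'pearl' },
    count     => 4,                         # plain scalar, not a reference
);

for my $name (sort keys %examples) {
    my $kind = ref($examples{$name}) || 'not a reference';
    print "$name: $kind\n";
}
```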
I would add merlyn's advice to my list of rules -- "jit declaration and init" as much as possible.
A question to the architects of the language (TimToady, if you read this). When, as merlyn says (and I agree), Programming with globals is so 80's, why does Perl not have all vars as local in scope unless declared global explicitly? Is it because Perl was invented in the 80's {grin}? I hate to bring up the example of PHP (a language that I don't particularly care for), but it makes everything in a sub local to it unless an outside var is brought in as a global explicitly. Perhaps the dot notation of Perl6 might solve most of the above-mentioned scoping-related problems.
Janitorial note: I posted this on SoPW because I was seeking perl wisdom, not really meditating. I hope newcomers to Perlmonks and, to Perl itself, visit the meditations enough to read all the good stuff there (I should do more, but I end up spending most of my time on SoPW).
Many thanks everyone.
update: The way I look at it, I almost wish Perl would allow me to make my own sigils... in effect, that is what I am trying to accomplish with all my prefixing nonsense above. Being able to make my own sigils in addition to the company-provided $, @, and % would be very cool.
update2: rule 10. Always pass named variables to subs (as in the code example above).
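A sketch of what the receiving end of rule 10 might look like. describeMove is a hypothetical stand-in for a sub like moveMsgTo in the code example above, whose real body is not shown in this thread:

```perl
use strict;
use warnings;

# Hypothetical stand-in for a sub that takes named arguments.
sub describeMove {
    my %args = @_;                          # named parameters arrive as a flat list
    for my $required (qw(msgnum folder)) {
        die "missing argument '$required'\n" unless defined $args{$required};
    }
    return "message $args{msgnum} -> $args{folder}";
}

print describeMove('msgnum' => 7, 'folder' => 'Seen'), "\n";
print describeMove('folder' => 'Seen', 'msgnum' => 8), "\n";   # order is irrelevant
```

The call site documents itself, and the argument order stops mattering.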
--
when small people start casting long shadows, it is time to go to bed
| [reply] [d/l] [select] |
|
Coding is personal in nature. One coder's wonderfulSolution is another coder's stuff_of_nightmares.
Of course this is true. It's a matter of taste - like many things in life (blond or brunette, pepsi or cola ...). Problems come when there are several people working on the same project.
You can like whatever you want - but in such a case, everybody needs to play by the rules.
| [reply] |
Re: coding rules
by TilRMan (Friar) on Jun 10, 2005 at 04:07 UTC
|
The first eight "rules" are style, and so naturally I like or dislike them to varying degrees. But underneath number nine lurks a more important broken rule:
@EXPORT = qw(
#9a. Don't export anything by default. In other words, use @EXPORT_OK instead of @EXPORT.
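A minimal sketch of 9a. The module My::Util and its trim function are hypothetical, and the package is inlined into one file only so the example runs on its own:

```perl
use strict;
use warnings;

package My::Util;                   # hypothetical module, inlined for the demo
use Exporter 'import';
our @EXPORT_OK = qw(trim);          # nothing is exported unless asked for

sub trim {
    my ($text) = @_;
    $text =~ s/^\s+|\s+$//g;
    return $text;
}

package main;

# With the module in its own file this would be:  use My::Util qw(trim);
My::Util->import('trim');
print '[', trim("  hello  "), "]\n";
```

A caller who does not ask for trim gets nothing added to its namespace, so a new function in the module can never collide with the caller's own names.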
| [reply] [d/l] [select] |
Re: coding rules
by spurperl (Priest) on Jun 10, 2005 at 08:10 UTC
|
# 5. Variables local to a subroutine prefixed by some
# consistent prefix such as 'this_' or 'local_'. This way
# there never will be any confusion as to whether a
# given variable sprang into life in the subroutine
# or if it exists in the outside world and might possibly
# get clobbered locally.
For this reason, the outside world variables are usually marked. Besides, global variables are bad design, object variables are prefixed with $self in Perl, and constants are usually ALL_CAPS.
IMHO the variables inside subs should not be prefixed by anything - it's the globals that should.
| [reply] |
Re: coding rules
by arc_of_descent (Hermit) on Jun 10, 2005 at 07:55 UTC
|
I don't personally agree with rule #5 - Prefixing local variables.
One way to not do this is to keep your subroutines short enough so that they require at most only around 4 to 5 variables. If a subroutine requires more variables, then maybe you need to break it down.
| [reply] |
Re: coding rules
by mstone (Deacon) on Jun 13, 2005 at 04:21 UTC
|
'Nother one to consider: Make the readability of names proportional to their scope.
You don't need long names for short-lived carrier variables:
my $d = $object->{'some'}->{'deeply'}->{'nested'}->{'item'};
_foo ($d->{'bar'}, $d->{'baz'});
$object->validate_range_params ($d->{'bar'}, $d->{'quux'});
# $d will never be used again after this point
but it does make sense to use long, descriptive names for things that will be used far away from where they're defined.
The same applies to functions.
It's okay to use short names for internal functions (the ones you'd start with an underscore), because you won't see them being used unless you're close enough that you can find the definition and read the associated comments.
For exported functions, even though people do hate typing the long names, it's worthwhile to add a few characters that will save people from having to RTFS in order to learn what the thing does.
Used consistently, the readability of a name can cue you in to how far you'll have to look to find the original definition.
| [reply] [d/l] |