Last week, hakkr posted some coding guidelines which I found to
be too restrictive, and not addressing enough aspects.
Therefore, I've made some guidelines as well. These are my
personal guidelines, I'm not enforcing them on anyone else.
- Warnings SHOULD be turned on.
Turning on warnings helps you finding problems in your code. But it's
only useful if you understand the messages generated. You should also
know when to disable warnings - they are warnings after all, pointing
out potential problems, but not always bugs.
- Larger programs SHOULD use strictness.
The three forms of strictness can help you to prevent making certain
mistakes by restricting what you can do. But you should know when it
is appropriate to turn off a particular strictness, and regain your
freedom.
- The return values of system calls SHOULD be checked.
NFS servers will be down, permissions will change, file will
disappear, disk will fill up, resources will be used up.
System calls can fail for a number of reasons, and failure is
not uncommon. Programs should never assume a system call will
succeed - they should check for success and deal with failures.
The rare case where you don't care whether the call succeeded
should have a comment saying so.
All system calls should be checked, including, but not limited
to, close, seek, flock,
fork and exec.
- Programs running on behalf of someone else MUST use tainting;
Untaining SHOULD be done by checking for allowed formats.
Daemons listening to sockets (including, but not limited to CGI programs)
and suid and sgid programs are potential security holes. Tainting can
help securing your programs by tainting data coming from untrusted
sources. But it's only useful if you untaint carefully: check for
accepted formats.
- Programs MUST deal with signals appropriately.
Signals can be sent to the program. There are default actions - but
they are not always appropriate. If not, signal handlers need to be
installed. Care should be taken since not everything is reentrant.
Both pre-5.8.0 and post-5.8.0 have their own issues.
- Programs MUST deal with early termination appropriately.
END blocks and __DIE__ handlers should be used
if the program needs to clean up after itself, even if the program terminates
unexpectedly - for instance due to a signal, an explicite die
or a fatal error.
- Programs MUST have an exit value of 0 when running succesfully,
and a non-0 exit value when there's a failure.
Why break a good UNIX tradition? Different failures should have different
exit values.
- Daemons SHOULD never write to STDOUT or STDERR but SHOULD use
the syslog service to log messages. They should use an
appropriate facility and appropriate priorities when logging
messages.
Daemons run with no controlling terminal, and usually its standard
output and standard error disappear. The syslog service is a standard
UNIX utility especially geared towards daemons with a logging need.
It allows the system administration to determine what is logged, and
where, without the need to modify the (running) program.
- Programs SHOULD use Getopt::Long to parse options. Programs
MUST follow the POSIX standard for option parsing.
Getopt::Long supports historical style arguments (single dash, single
letter, with bundling), POSIX style, and GNU extensions.
Programs should accept reasonable synonymes for option names.
- Interactive programs MUST print a usage message when called
with wrong, incorrect or incomplete options or arguments.
Users should know how to call the program.
- Programs SHOULD support the --help and
--version options.
--help should print a usage message and exit,
while--version should the version number of the program.
- Code SHOULD have an exhaustive regression test suite.
Regression tests help catch breakage of code. The regression tests
should 'touch' all the code - that is, every piece of code should be
executed when running the regression suite. All border should be checked.
More tests is usually better than less test. Behaviour on invalid inputs
needs to be tested as well.
- Code SHOULD be in source control.
And a code source control tool will take care of keeping track of a history
or changes log, version numbers and who made the most recent change(s).
- All database modifying statements MUST be wrapped inside a
transaction.
Your data is likely to be more important than the runtime or codesize
of your program. Data integrety should be retained at all costs.
- Subroutines in standalone modules SHOULD perform argument checking
and MUST NOT assume valid arguments are passed.
Perl doesn't compile check the types of or even the number of arguments.
You will have to do that yourself.
- Objects SHOULD NOT use data inheritance unless it is appropriate.
This means that "normal" objects, where the attributes are stored inside
anonymous hashes or arrays should not be used. Non-OO programs benefit
from namespaces and strictness, why shouldn't objects? Use objects based
on keying scalars, like fly-weight objects, or inside-out objects.
You wouldn't use public attributes in Java all over the place either,
would you?
- Comment SHOULD be brief and to the point.
If you need lots of comments to explain your code, you may consider
rewriting it. Subroutines that have a whole blob of comments describing
arguments are return values are suspect. But do document invariants, pre-
and postconditions, (mathematical) relationships, theorems, observations
and other relevant things the code assumes. Variables with a broad scope
might warrant comments too.
- POD SHOULD not be interleaved with the code, and is not an alternative
for comments.
Comments and POD have two different purposes. Comments are there for
the programmer. The person who has to maintain the code. POD is there
to create user documentation from. For the person using the code.
POD should not be interleaved with the code because this makes it harder
to find the code.
- Comments, POD and variable names MUST use English.
English is the current Lingua Franca.
- Variables SHOULD have an as limited scope as is appropriate.
"No global variables", but better. Just disallowing global variables
means you can still have a loop variant with a file-wide scope. Limiting
the scope of variables means that loop variants are only known in the
body of the loop, temporary variables only in the current block, etc.
But sometimes it's useful for a variable to be global, or have a
file-wide scope.
- Variables with a small scope SHOULD have short names, variables
with a broad scope SHOULD have descriptive names.
$array_index_counter is silly;
for (my $i = 0; $i < @array; $i ++) { .. } is perfect.
But a variable that's used all over the place needs a descriptive
name.
- Constants (or variables intended to be constant) SHOULD have names
in all capitals, (with underscores separating words), so SHOULD
IO handles. Package and class names SHOULD use title case, while other
variables (including subroutines) SHOULD use lower case, words separated
by underscores.
This seems to be quite common in the Perl world.
- Custom delimiters SHOULD be tall and skinny.
/, !, | and the four sets of
braces are acceptable, #, @ and *
are not. Thick delimiters take too much attention. An exception is made
for: q $Revision: 1.1.1.1$, because RCS and CVS scan for
the dollars.
- Operators SHOULD be separated from their operands by whitespace,
with a few exceptions.
Whitespace increases readability. The exceptions are:
- Unary +, -, \,
~ and !.
- No whitespace between a comma and its left operand.
Note that there is whitespace between ++ and --
and their operands, and between -> and its operands.
- There SHOULD be whitespace between an indentifier and its indices.
There SHOULD be whitespace between successive indices.
Taking an index is an operation as well, so there should be whitespace.
Obviously, we cannot apply this rule in interpolative contexts.
- There SHOULD be whitespace between a subroutine name and its parameters,
even if the parameters are surrounded by parens.
Again, readability.
- There SHOULD NOT be whitespace after an opening parenthesis, or
before a closing parenthesis. There SHOULD NOT be whitespace after
an opening indexing bracket or brace, or before a closing indexing
bracket or brace.
That is: $array [$key], $hash {$key} and
sub ($arg).
- The opening brace of a block SHOULD be on the same line as the keyword
and the closing brace SHOULD align with the keyword, but short blocks
are allowed to be on one line.
This is K&R style bracing, except that we require it for subroutines as well.
We do allow map {$_ * $_} @args to be on one line though.
- No cuddled elses or elsifs. But the while of a
do { } while construct should be on the same line
as the closing brace.
It just looks better that way! ;-)
- Indents SHOULD be 4 spaces wide. Indents MUST NOT contain tabs.
4 spaces seems to be an often used compromise between the need to make
indents stand out, and not getting cornered. Tabs are evil.
- Lines MUST NOT exceed 80 characters.
There is just no excuse for that. More than 80 characters means it
will wrap in too many situations, leading to hard to read code.
- Align code vertically.
This makes code look more pleasing, and it brings attention to the
fact similar things are happening on close by lines. Example:
my $var = 18;
my $long_var = "Some text";
This is just a first draft. I've probably forgotten some rules.
Abigail
Re: My coding guidelines
by mojotoad (Monsignor) on Nov 25, 2002 at 18:12 UTC
|
Excellent list. It's not often you see style guidelines along with some informative prose as to the why of the guidelines.
I was surprised by this part (24):
Note that there is whitespace between ++ and -- and their operands, and between -> and its operands.
What's the consensus on this one? I've always liked snug ++/-- and dereferencing:
++$wolverines;
$rabbits--;
$wombat->move();
$turtle = $menagerie->[4];
As opposed to:
++ $wolverines;
$rabbits --;
$wombat -> move();
$turtle = $menagerie -> [4];
For that matter, I've always made unary operators snug so that you can immediately find it's operand. This might be a holdout from C, but I do actually find the snug form more aesthetically pleasing.
On a somewhat visually related note, the arrow form of the left-associative operator works well with whitespace, however:
%myhash = ( lizard => 'gecko' );
Whereas the equivalent comma operator is snug with the left operand.
Thanks again for the list,
Matt
| [reply] [d/l] [select] |
|
Don't say always too fast. Are you sure you always
snug unary operators? Including not?
Abigail
| [reply] [d/l] |
|
Since I never use 'not' in my code ... yes. ;)
I take your point, however -- there are indeed exceptions. I think "english language" unary operators might be a whole class of exceptions since they parse (to us) as English rather than code.
By the way, why is '->' a binary operator? From perldoc:
"->" is an infix dereference operator, just as it is in C and C++. If the right side is either a ..., {...}, or a (...) subscript, then the left side must be either a hard or symbolic reference to an array, a hash, or a subroutine respectively. (Or technically speaking, a location capable of holding a hard reference, if it's an array or hash reference being used for assignment.) See perlreftut and perlref.
Otherwise, the right side is a method name or a simple scalar variable containing either the method name or a subroutine reference, and the left side must be either an object (a blessed reference) or a class name (that is, a package name). See perlobj.
Seems like dereferencing is dereferencing is dereferencing, no RHS required. Is there an association going on as well?
Matt
| [reply] |
|
|
|
Re: My coding guidelines
by fruiture (Curate) on Nov 25, 2002 at 18:51 UTC
|
I think it's no great use making guidelines in general. You'll have to agree on guidelines in one project and it's good to have concrete ideas. The actual decisions will depend on the group.
Some of the things you mention are in my eyes means of sane programming (system calls, exit code). I don't think you can work seriously with someone who has to be told these things. Also, I'd never say "SHOULD". Say "MUST", so first of all there is no doubt: "Use strict! No exceptions. That's it." Someone who _really_ knows when to disable strictness, will do it. Someone who doesn't will struggle for a solution and ask for help and then someone else may tell him "This is the exception, you may disable strict here."
So there are some general rules i'd make (and in general i agree with you):
- use strict! Always.
- use warnings! Always.
- Doing CGI? use CGI.pm! Always.
- Parsing commandline options? use Getopt::Long and Configure() it POSIXLY_CORRECT! Always.
- check system calls! Always.
- Use comments to comment, use POD to document. Don't mix these two.
- Avoid Globals!
- Globals are all variables with a scope that is larger (in the sourcecode) than an average screen.
- Name Globals wisely. Follow perlstyle rules
- Use spaces to make things clearer
These are the rules I consider "general". Other things are matter of personal taste but should be equal inside a project:
- open the curly on same line as keyword. Indent next line.
- closing curly on single line, indented as deep as keyword of opening curly.
- curly-rule exceptions are special short blocks (map,grep,sort)
- No space between function name and opening parens.
- No space between array identifier and opening square bracket.
- No space between hash identifier and opeing curly
Now finally the Tab-issue: I really think that one should use tabs, but only for indentation. Don't use tabs for alignement.
if( $a > $b ){
T---return
T---T---cond1 ? val1 :
T---T---cond2 ? val2 :
T---T--- default
T---;
}
This way your alignement won't break in someone else's editor using a different tab-width, but logical indentation is still present. I am really convinced that this is a solution.
--
http://fruiture.de | [reply] [d/l] |
|
| [reply] |
|
I object to those "always" you are using in your list. You wouldn't be
able to create useful modules like Exporter and Memoize, if you always
had to use strict. Same with warnings. Perl generates warnings if it
things the programmer makes an error. But sometimes, Perl is wrong.
The right thing to do is turn warnings off, instead of having to go
through hoops to satisfy Perl.
As for point 3, I couldn't agree less. CGI.pm is a monster,
doing far more than just CGI (come on, producing HTML shouldn't be a
part of a module calling itself CGI). In the rare cases I do any CGI
programming, I tend to avoid it like the plague.
As for you tab issue, it's a good example why tabs should be avoided.
Your example indicates a tabwidth of 4. Any idea how that's going to
look like on a terminal with a default tabstop of 8? Sure, the left
indents will all be lined up, but can you garantee none of the lines
will exceed 80 characters? Any idea what's going to happen if one
of the tabs is replaced by 4 spaces, and someone using tabstops of 8
is going to view it? It won't align correctly anymore.
Tabs are evil. The savings in disk space and I/O might have been
significant 30 years ago, but they aren't nowadays. With spaces,
the indentation will always look the way the author intended it
to be (given fixed width fonts). With tabs, it will only look right
if all authors and viewers used the same tabstops.
Abigail
| [reply] [d/l] |
|
No, this is a misunderstanding: The rule says "you must", because it doesn't want to leave any doubts to somebody who doesn't know enough to break it with sense. I know that if i want to mess around in some symbol tables (e.g. Export or an Accessor Method Creator...) I will turn off strict 'refs' as well, and I want anybody else to do that then. But someone who isn't sure, must not break the rule.
Same with CGI.pm. There's a "must", because you should not question CGI.pm until you know better.
Tabs: This is not about disk space, but it's no use disuccing this. Tabs aren't evil. :)
To clarify: whom are you making these rules for? Do you need to tell yourself "use strict, you(=I) know when I may disable it!"? Such guidelines are usefull in two ways: a) for people who learn Perl, for them I wrote the "MUST". To say "SHOULD" tells them there are exceptions, but there are no exceptions they need to know about yet. b) For a project, to keep it all consistent and easier to make many people work on the same Code. There the "MUST" fits because an exception should be rare and known, because it is an exception. And: I wouldn't work with people who need to be told to use "strict" and when to disable it.
--
http://fruiture.de
| [reply] |
|
|
|
|
|
|
Consering strict, I use the following rule:
Always use strict. If you can't do something with strict, think about if you really need to do it without strict, and if you really need to, disable the part of strict locally and explain why this has to be done, e.g.
if ( ... ) {
# disabling strict refs temporary because of blabla.....
no strict 'refs';
*{ $AUTOLOAD } = $codeReference;
use strict 'refs';
}
++ Abigail-II
Best regards,
perl -e "s>>*F>e=>y)\*martinF)stronat)=>print,print v8.8.8.32.11.32" | [reply] [d/l] |
|
That's exactly right.
use strict always
does not mean you shouldn't no strict when appropriate :)
| [reply] |
Re: My coding guidelines
by Courage (Parson) on Nov 25, 2002 at 19:36 UTC
|
I am not agreed with item (25):
There SHOULD be whitespace between an indentifier and its indices. There SHOULD be whitespace between successive indices.
I see everywhere $array[$index] or $hash[$index] without any whitespaces and accustomed to this. IMHO Most perl core and CPAN modules do not satisfy that rule.
I see no readability problems with such a very standard situation of array or hash indicies.
I do agree on all other items though.
Courage, the Cowardly Dog | [reply] [d/l] [select] |
|
I know many people will disagree with rule 25. But they are my guidelines,
and that's the way I code. Not just in Perl, but any language that I can
remember programming in allows whitespace between an indentifier and its
indexing operation. Including Python. (Unfortunally, Perl 6 will break
decades of tradition).
The reason is that the eye needs resting points. Whitespace makes it
easier to divide a line of text into chunks and read it. As well as in
natural languages as in code. $foo{bar}[17]{baz} is one big blob, and
it's hard to divide it into its 4 chunks, specially when the subscripts
are a bit more complex. We don't have the tendency to chain words together
in English (unlike in for instance German). Why should we with code?
Abigail
| [reply] [d/l] |
|
| [reply] [d/l] [select] |
Re: My coding guidelines
by grantm (Parson) on Nov 25, 2002 at 22:49 UTC
|
10. Interactive programs MUST print a usage message when called with wrong, incorrect or incomplete options or arguments.
I find the Pod::Usage module to be excellent for this purpose and always combine it with Getopt::Long.
Number 24 seems weird to me also.
| [reply] |
|
I wonder what is so strange about rule 24. The perlstyle manual page
says: space around most operators. Which is what rule 24 is
saying, but it's listing the exceptions. The reason I put space between
++ and -- and its operands is that those
operators are "large" (two characters).
As for ->, it's a binary operator. I put whitespace
around other binary operators, so why not ->? If you
don't, you can get long blobs of text, without a resting place
for the eyes. Perl should look like English, not German. ;-).
Abigail
| [reply] [d/l] [select] |
|
| [reply] |
|
Re: My coding guidelines
by derby (Abbot) on Nov 25, 2002 at 18:53 UTC
|
Abigail-II, great list. I agree with most of them
(rule #24 is just a bit odd for me). Also, I understand rule #16
but don't understand the reasoning (maybe that means I
don't understand the rule). Any chance of talking you
into expounding on that some more?
-derby | [reply] |
|
#16 is about the fact that if you reference a non-existance hash key...nothing happens. Better to use the options detailed in the OP so that "mis"-references are trapped.
| [reply] |
Re: My coding guidelines
by cLive ;-) (Prior) on Nov 25, 2002 at 18:56 UTC
|
Re point 15:
"Perl doesn't compile check the types of or even the number of arguments"
Am I missing something, or isn't that what prototypes are for?
.02
cLive ;-) | [reply] |
|
| [reply] |
|
You are missing something, here are two examples:
# prototypes only check arg format
# i.e. (scalar/array/hash/code/glob and number of args)
package Sample;
sub stringify () {
return ($_[0]->{data}) x $_[1]
}
package main;
sub print_string($) {
print $_[0]->stringify(1)
}
my $obj = bless ( {data=>"foo\n"}, 'Sample');
print_string($obj); # fine
print_string(5); # uh oh... basic numbers don't have a stringify me
+thod,
# yet the arg passed the prototype since 5 is a sc
+alar.
And also:
# methods don't check prototypes... at all.
sub print_string_bad($) {
print $_[0]->stringify() # Will pass the prototype, yet break since $_
+[1] will be
# undef in stringify
}
print_string_bad($obj);
Hence the reason for the general distaste for prototypes among the perl community. For most cases, they're pointless. They're only there so that you can call your own subs like perl-builtins. | [reply] [d/l] [select] |
|
You can't use prototypes to have the compiler check whether
an argument is an integer, or an object of a certain type.
Prototypes are less useful than you might think at first,
and they are sometimes bloody nasty. The following doesn't
do what you want:
sub foo ($$) {....}
my @bar = (1, 2);
foo @bar;
&foo isn't called with two arguments - even
while @bar has two elements. Remove the prototype,
and it will work as expected.
Abigail | [reply] [d/l] [select] |
Re: My coding guidelines
by DamnDirtyApe (Curate) on Nov 25, 2002 at 23:41 UTC
|
I know it's been kicked around the monastery before, but I'd like to know Abigail-II's take on receiving arguments in subs. Do you tend to more towards $x = shift; $y = shift; ..., or ($x, $y) = @_;, or some combination of the two, depending on circumstances?
_______________
DamnDirtyApe
Those who know that they are profound strive for clarity. Those who
would like to seem profound to the crowd strive for obscurity.
--Friedrich Nietzsche
| [reply] [d/l] [select] |
|
It depends. Sometimes I use shift, sometimes
list assignment. And often I prefer named parameters - sometimes
in combination with shifts.
Abigail
| [reply] [d/l] |
|
DamnDirtyApe - take a peek at this node of mine for some interesting insight on receiving arguments in subs.
--
vek
--
| [reply] |
|
| [reply] [d/l] |
|
|
Re: My coding guidelines
by Anonymous Monk on Nov 25, 2002 at 19:37 UTC
|
18. POD SHOULD not be interleaved with the code, and is not an alternative for comments.
Comments and POD have two different purposes. Comments are there for the programmer. The person who has to maintain the code. POD is there to create user documentation from. For the person using the code. POD should not be interleaved with the code because this makes it harder to find the code.
There would be no need for this if Perl had multi-line comments. And please don't tell me about the Acme module. Or the infamous here-doc technique. This must change! This is so sad!
http://dev.perl.org/rfc/5.html | [reply] |
|
| [reply] |
item 25 and Perl 5 DWIMary (Re: My coding guidelines)
by John M. Dlugosz (Monsignor) on Nov 26, 2002 at 23:29 UTC
|
I read in Apocalypse 4,
But I've come to the conclusion that I'd rather screw around (a little) with the "insignificant whitespace" rule than to require an extra unnatural delimiter. If we observe current practice, we note that 99% of the time, when people write a hash subscript they do so without any whitespace before it. And 99% of the time, when they write a block, they do put some whitespace in front of it. So we'll just dwim it using the whitespace.
... Therefore, we will make the rule that a left curly that has whitespace in front of it will never be interpreted as a subscript in Perl 6. (If you think this is totally bizarre thing to do, consider that this new approach is actually consistent with how Perl 5 already parses variables within interpolated strings.)
| [reply] |
|
I'm well aware of that. It's also the main reason I'm not
interested in perl6 at all.
Abigail
| [reply] |
Re: My coding guidelines
by belg4mit (Prior) on Dec 01, 2002 at 06:14 UTC
|
| [reply] |
Re: My coding guidelines
by sauoq (Abbot) on Dec 01, 2002 at 05:05 UTC
|
Excellent. ++Abigail once for the whole thing and once each for 30, 31, and 32.
Of course, I don't follow all of those guidelines but most of the ones I don't are cosmetic.
- I don't use whitespace between indices.
- I cuddle my elses.
- I don't use whitespace around the ->, ++ and -- operators.
- I prefer either foo($bar) or foo( $bar ) to foo ($bar)
- I use Getopt::Std and eschew long options. I do provide -h and -V for usage and version banners though.
This is just a first draft. I've probably forgotten some rules.
How about your guidelines for breaking long lines?
-sauoq
"My two cents aren't worth a dime.";
| [reply] [d/l] [select] |
Re: My coding guidelines
by hardburn (Abbot) on Nov 26, 2002 at 15:09 UTC
|
POD SHOULD not be interleaved with the code, and is not an alternative for comments.
Does this include POD put before subroutines? Such as:
=item foo
foo $arg1
Does foo
=cut
sub foo
{
...
}
=item bar
bar $foo
Does bar
=cut
sub bar
{
...
}
Variables with a small scope SHOULD have short names, variables with a broad scope SHOULD have descriptive names.
Heh, there was a guy in my high school C++ class who declared all his counter variables as globals, and he used a different var for every loop. The entire bloody alphabet was at the top of every code file he wrote.
Constants (or variables intended to be constant) . . .
Is there any execuse for not doing "use constant ..."?
No cuddled elses or elsifs . . .
Cuddled?
Indents SHOULD be 4 spaces wide. Indents MUST NOT contain tabs.
Where's my flame thrower? :) | [reply] [d/l] |
|
Does this include POD put before subroutines?
Yes it does. That was exactly what I was aiming it. Whenever I have
to go through a file that alternates POD, subroutine, POD, subroutine,
I find it very hard to find my way around the code.
I also think that it's rare you want the documented function to appear
in the same order in the manual page as that you happen to have them
in the file.
Is there any execuse for not doing "use constant ..."?
Most certainly. It's easier to interpolate $CONSTANT
than CONSTANT. And $CONSTANT will not fall
victim to autoquoting (left of a fat arrow, or as a hash index), while
CONSTANT might if you aren't careful.
Cuddled?
"Cuddled elses" is a term from perlstyle. It means:
if (condition) {
+
code
} else { # This else is cuddled.
more code
}
This is K&R style. With uncuddled elses, you align the else
with the if and the closing braces. There's a lot to say
for K&R style though, it strengthens the fact that else
doesn't start a new construct, but is part of the if,
and it will save a line.
Abigail
| [reply] [d/l] [select] |
Re: My coding guidelines
by petral (Curate) on Nov 26, 2002 at 19:03 UTC
|
Does anyone else do this?
      Prefer '<'  to  '>'.
      This makes numbers read from low to high, which somehow accords with the way we think:
          0 <= $val < 100   instead of:   100 > $val >= 0
  p | [reply] [d/l] [select] |
|
|