With respect to this node and perlstyle

I am composing some house rules/guidlines for our CGI programmers. Whilst some may be specific to our shop the idea is to promote good maintainable programming for a team environment. Hopefully this list will set the standards for our code reviews without cramping style to much. I'd appreciate any comments or input from anybody else who has written similar guidlines. I think some security focused rules may make good additions

1. use strict and warnings
2. use HTML::Template not HERE documents
3. no deliberate obfuscation
4. all files must be documented with POD (Author, Date created, synopsis, description)
5. OO always to be used where possible
6. Pass by reference
7. No hardcoding of values especially file paths, script names etc
8. Config files or config scripts to be used
9. Do not reinvent wheel always check CPAN
10. minimise coupling
11. maximise cohesion
12. maximise reuse
13. Always indent logic
14. Any reused code block above 5 lines must become a sub routine
15. Always retrieve database rows into hash reference not arrays
16. Use CGI CARP
17. Assign CGI parameters to hashes
18. Version history at bottom of file
19. All subroutines to be commented
20. Meaningful Subroutine and Variable names to be used
21. Constructors must only create objects
22. Program flow control to be places in 'main' method
23. No Global Variables
24. use my not local
25. All Perl filehandles should be in all uppercase
26. Always use CGI.pm
27. Use verbs for subroutines, use nouns for variables, be descriptive and consistent.
28. Do not use $_
29. Do no use goto
30. Initialise variables
31. Error handling done using Begin blocks at CGI level. All dies are logged
32. Where code returns a result it should do so or die with a meaningful error and $!
END

Comment on Perl Programming guidelines/rules
Re: Perl Programming guidlines/rules
by mirod (Canon) on Nov 21, 2002 at 11:45 UTC

    This looks like a good start.

    I would add a couple of rules about variables:

    • Variables should be commented unless clearly self-descriptive,
    • Use a consistent variable naming style (I like lower_case_underscore_separated but some odd people prefer allInOneWordWithDanglingCaps,CONSTANT and Package_Variable can also be recommended),

    And (of course!) I would argue with a couple of you rules :

    14. Any reused code block above 5 lines must become a sub routine

    5 lines can be a _LOT_ of code! The risk with that kind of rule is that some people might read it as "it's fine to cut'n paste up to 4 lines of code" or even "don't bother writing a sub that's less than 5 lines"

    28. Do not use $_

    $_ should not be abused but it has its uses and can make some blocks of code much clearer.

        It's not very Perlish. The underscore style is generally preferred.
      If you find yourself commenting your variable names then you probably didn't spend enough time thinking of a good name for it.

      Remember too that there's a difference between parameter names and variable names. In a dynamic language, a parameter name should probably suggest its type, variables should suggest their role. Kent Beck is really good on this in 'Smalltalk Best Practice Patterns'.

        Simple scalars may often have descriptive names, but I just wrote this:

        my $standard={}; # the structure that contains all the standard info # a hasref to # title => "standard title" # dir => standard directory (where the standard +page and graphics are) # des => designation (802.11a-2000) # norm_des => normalized designation (802_11_a_2000) # def => hasref { term => "html definition" }

        I found this kind of comment to be immensely useful several month later when I need to fix a bug (rare ;--) or to add features to the code (often!).

        And yes, the data represents a view of a standard, hence its name, but there are many infos that can be stored about a standard, it helps to know exactly which one this data structure is providing.

        BTW, while I agree that this structure could be an object, I don't think that the specific program I am working on requires OO. And in any case it is nice to have all the docs about the variable right where I declare it.

•Re: Perl Programming guidlines/rules
by merlyn (Sage) on Nov 21, 2002 at 11:48 UTC
    I've been researching this problem in upgrading our code review course and code review services recently.

    The biggest problem with your list is that you have very little that can be objectively tested. A checklist item is useful if everyone could agree that you are following it, but otherwise it simply becomes a way to butt heads in a code review session.

    On one of our slides, I bring up the example "you should generally 'use strict'" as a sample bad rule. It's bad because it can't be tested. Is this one of the times that you avoid use strict, or not?

    I also have problems with some of your specific rules. For example, "Do not use $_". That's ludicrous. How will you do a grep or map? Or are you proposing coding everything in babybaby Perl?

    A reasonable standard for presumption of the skill level of your maintenance programmer is:

    • Has read Learning Perl
    • Has read all the applicable tutorial series in the core docs
    • Knows how to type "perldoc Module::Name"
    Anything beyond that should be documented if used, giving a literature reference if needed. Yes, this even goes for the Schwartzian Transform, since that's not in a tutorial doc or Learning Perl, only a FAQ item.

    But the biggest point I hope you're seeing is that you should write native Perl when you are writing Perl. Don't constrain Perl to just the parts that are C-like or Shell-like. You miss the point of programming in Perl.

    -- Randal L. Schwartz, Perl hacker
    Be sure to read my standard disclaimer if this is a reply.

      :) I thought the $_ thing would be a bit controversial.

      Sometimes it makes code harder to read so I say never use it unless it's a perlish function like map where it is obvious what it refers to.

        I agree. I only use $_ in places like map where it is required.
        That's stupid. Sometimes regexes make code harder to read. Same comments, OO, subroutines or arrays make code harder to read. But only PHBs use such arguments to outlaw them via a coding standard.

        Abigail

Re: Perl Programming guidlines/rules
by Callum (Chaplain) on Nov 21, 2002 at 12:08 UTC
    In addition to the previous comments, a couple of thoughts:

    1. taint anything you can't be sure about.
    2. consistency, especially if multiple people are developing/maintaining the code -- eg 13. Always indent logic, ensure people use spaces or tabs *not* a mixture.

Re: Perl Programming guidlines/rules
by Abigail-II (Bishop) on Nov 21, 2002 at 13:05 UTC
    1. Use strict and use warnings

    I'd be very wary about a 32 point coding standard that puts two totally unrelated things in the first item. It's like saying "use apples and red paint". Which strict are you referring to? We have three, making 8 combinations. Furthermore, there are situations the obvious and clearest way to do something is prevented by some form of strictness. Or that it will issue a warning. Coding standards that mandate the use of strictness and/or warnings should also allow you to turn them off when appropriate.

    2. use HTML::Template not HERE documents

    Huh? Since when is HTML::Template always an alternative for HERE documents? I typically use HERE documents to write SQL statements - is HTML::Template a logical alternative for that?

    5. OO always to be used where possible

    I can understand that rule. But it doesn't make sense as a rule when Perl is the used language. Java, Python, Ruby, ok. But Perl? Where OO is slapped on to the language, and in a bad way?

    6. Pass by reference

    This is the default by Perl, and kind of hard to bypass. Or do you mean that you can't allow literals as subroutine arguments?

    9. Do not reinvent wheel always check CPAN

    But CPAN contains a lot of bad code. Often, reinventing the wheel is faster than checking the quality of a CPAN module. Besides, who are you going to blame if the CPAN module contains a fatal bug?

    13. Always indent logic

    I do not know what you mean by this.

    14. Any reused code block above 5 lines must become a sub routine

    Bad rule. Reuse is often good, but sometimes copy-and-paste is preferred. Reuse code if you expect the code always to be the same. But use copy-and-paste if you want the flexibility that the copied code later will do something else.

    15. Always retrieve database rows into hash reference not arrays

    Why?

    17. Assign CGI parameters to hashes

    Why?

    18. Version history at bottom of file

    That's all you have to say about version history? No standardization on a version control mechanism?

    21. Constructors must only create objects

    But no rules on object initializators?

    24. use my not local

    Really? I can't local punctuation variables, aggregate elements, tied variables, or file handles?

    27. Use verbs for subroutines, use nouns for variables, be descriptive and consistent.

    That's very surprising! You wanted OO, didn't you? That means you will have lots and lots of subroutines that will return something, and not "do" something. How am I supposed to name a method that returns the creation date? I'd prefer "creation_date", a noun. But what verb ought to be used?

    28. Do not use $_

    Not use $_?????? Are you talking about the same Perl language as I do?

    29. Do no use goto

    Really? Does that mean I can't use AUTOLOAD that makes an accessor on the fly and then uses magic goto to go to it? Can I use a CPAN module that uses goto?

    30. Initialise variables

    That means that autovivifying file handles are forbidden? That I cannot do my @array; push @array => 3;, but that I have to do my @array = (); push @array => 3;?

    31. Error handling done using Begin blocks at CGI level. All dies are logged

    That doesn't make any sense to me. If I know at compile that an error will happen, wouldn't it be better to fix my code? And how am I supposed to handle errors that occur at run time? Eval code with a begin block that deals with the error?

    32. Where code returns a result it should do so or die with a meaningful error and $!

    I've no idea what you mean by this.

    To me, this list looks bad. Not much thought has been put in it, and important guidelines are missing. Too many buzzwords.

    What about the language used for comments and variable names? Can that be in Chinese combined with Greek? How should objects be implemented? This is a vital guideline in OO-Perl, otherwise inheritance will fail. Is there a require bracing style? Indent size? Variable naming convention? How do you pass parameters to function? Positional parameters? Named parameters? What are you API/object interface guidelines?

    Abigail

      While I agree with some of your points I don't believe hakkr was refering to code that is within other modules. (Like the Autoload example you gave) and I believe he means that HTML::Template should be used for HTML documents instead of adding tons of lines to a CGI program for the HTML. As one who has spent a lot of time using both, it makes lots of sense, separating HTML from the core CGI program make the code easier to understand and allows for designers that don't know a lick of Perl to play with the template files and add their prettyness to them.

      Even though I agree I think that you are nitpicking at his idea rather than offering objective criticism. You try to refute some of his points without actually understanding them. Such as 32. Where code returns a result it should do so or die with a meaningful error and $!
      I think that clearly states that rather than:

      open(FILE,"$filename") or die $!;
      He's saying that his programmers should put:
      open(FILE,"$filename") or die "The file $filename couldn't be opened: +$!";

      This make tons of sense! In a large program its helpful to see "foo.txt couldn't be opened: no such file or directory" rather than "no such file or directory", especially if you are opening more than one file.

      My major complaint with your comment is your feeling about CPAN, he didn't say to never reinvent wheels, he said check CPAN first. Which is a very good idea. Frankly I'm not sure what CPAN you use, bad code usually gets marked as such and there are places to put test results if the code is failing. I use a lot of CPAN modules and normally the only reason one didn't work was because I didn't install it correctly. CPAN saves me tons of time.

      I'm sorry but I had to mod you down for this comment. The tone is more "I don't do it this way, neither should you" which to me goes against the idea of Perl that you can do it any way you want to. I don't blame people for wanting programmers to follow a standard.

      Chris

      Lobster Aliens Are attacking the world!
Re: Perl Programming guidlines/rules
by Anonymous Monk on Nov 21, 2002 at 14:15 UTC
    It would be nice if a Perl/CGI book geared toward a team working enviroment covered what you are trying to do here.

    I smell a ORA book title . . .

    E-commerce with Perl and Apache

    (sorry couldn't resist)

Re: Perl Programming guidlines/rules
by perrin (Chancellor) on Nov 21, 2002 at 14:28 UTC
    15. Always retrieve database rows into hash reference not arrays

    A small tip: using bind_cols() is much faster, and the result is just as readable.

      We've had terrible problems with bugs caused by developers using select * with bind_cols. Frequently the problem can not be reproduced on our development system. Table changes affect the order and/or number of columns, so the hard-coded variable list in bind_cols is a bug waiting to happen.

      Selecting explicit columns fixes the problem, but then the column list is redundant with the bound variable list -- bad (short, weird abbreviations, etc.) variable names can be used because it's "obvious" what the variable holds.

      I like fetchrow_hashref because it's safe to use with select *, self-documents use of database columns in the code, and does not have annoying redundancy between variable names and column names. Hakkr was probably thinking of these advantages.

        Sorry, but I'm flabbergasted that anyone would ever use "select *" in a real program! That is the bug waiting to happen, not bind_cols.

        I think typing out the list of lexical variables is worth the extra speed. There is some data on the speed difference in Tim Bunce's talk.

        Of course, if speed makes no difference in your application then use whatever you like.

        Well, select * is harmful anyway.

        If you've got a perl program, and I add an extra column, or many columns, to the table you're using, then, even though these new columns are completely unnecessary in your program, it still retrieves the values, taking database, network and memory to do so (especially if, for example, I've added a graphic LOB).

        You've got to name the keys of the hash to do anything useful with them, so you might as well name them in the select statement. And that way, if somebody adds extra columns, there are no side-effects. If the columns are renamed, you've got to bring your select statement into sync, but you've got to change your code anyway, to update the keys.

        (This is unless you're writing something which merely dumps the entire table out for a human to look at, without really caring what the column values are.)

        --
        Tommy
        Too stupid to live.
        Too stubborn to die.

        I've worked at a company who produced a large banking application (about 700 binaries and 200 dynamic libraries). We didn't have much coding standards, but among the few we had was "you shall never use 'select *'". Beside the problem mentioned that out of order applying of patches can produce different order of columns on tables, the other disadvantage of using 'select *' is that it's a performance hit. It means that for each time the query is compiled, the database has to do lookups and figure out what the columns will be.

        'select *' is just there for interactive usuage - applications should never use them.

        Abigail

        Redundant variable names are no excuse not to bind. Nor does SELECT * self-document at all, you have to go look at the following code to see which columns are really being asked for. And this being Perl, which has hashes and excellent text munging, there's no reason you need to be redundant either.

        I shall paraphrase some code from chromatic's DBI is ok article on Perl.com:

        sub bind_hash { my $table = shift; my %results; my $sql = do { local $" = ", "; "SELECT @_ FROM $table"; }; my $sth = $dbh->prepare($sql); $sth->execute(); $sth->bind_columns(map { \$results{$_} } @_); return (\%results, sub { $sth->fetch() }); } # ... later: my ($res, $fetch) = bind_hash users => qw(name email); while ($fetch->()) { print "$res->{name} >$res->{email}>\n"; }
        It's not optimal yet, but as you see, a little inventiveness goes a long way.

        Makeshifts last the longest.

Re: Perl Programming guidlines/rules
by particle (Vicar) on Nov 21, 2002 at 15:04 UTC

    i think this list cramps style too much, to a point of hindering development. while i agree that following a good set of coding standards leads to easier maintenance, i don't think this is the best starting point.

    maybe something like:

    • style
      • follow perlstyle, except as noted below
      • do not shift subroutine arguments, use code like my( $self, $arg1, @args ) = @_; instead
      • name SCALAR, ARRAY, and HASH variables in the singular, not plural
      • use taint mode
      • ...
    • substance
      • untaint all passed parameters
      • log all errors/warnings, using #your log module here#
      • use CGI to process CGI parameters
      • use DBI to access database
      • use HTML::Template for HTML templates, not HEREDOCs
      • ...
    • tests
      • write tests for each subroutine
      • test modules for pod, using Test::Pod
      • test for documentation coverage, using Pod::Coverage
      • test modules for version information, using Test::Version
      • verify all tests pass before merging code and tests with development stream
      • ...
    • docs
      • document each module with pod, sections include: your sections here
      • enter Change Request #(s) in comment upon checkin to SCM repository
      • all code must be in SCM repository before release to ...
      • ...

    but i could be completely off base. it's been a while since i've developed these standards (or followed them in a large team.)

    ~Particle *accelerates*

      do not shift subroutine arguments, use code like my( $self, $arg1, @args ) = @_; instead

      May I ask why?

        It's harder to screw up when adding variables. I've done my $this = shift then tried to change it to my ($this,$x) = shift if you start with my ($this) = @_; then my ($this,$my,$something,$asdfjkksaldf) = @_; will be less error prone
      There's two cases I shift:
      sub some_method { my $self = shift; my ($various, $other, $parameters) = @_; }
      and something like
      sub process_list { my ($some, $positional, $params) = splice @_, 0, 3; for(@_) { # ... } }

      The first case is somewhat arbitrary and maybe a bit superstitious; I just like to consistently always shift off the $self when I'm writing OO code - something about that one parameter's significance makes it feel right to me.

      The second case is plain and solid reasoning: I hate to name variables when I don't need to. Temporaries should always hold computed, not copied values, in my opinion. (You also have to work on @_ in case you wanted to modify the passed values - which of course is to be used seldomly and carefully and orthogonal to shifting.)

      The second case is an exception of course though not exactly a red herring either.

      So in general, I would agree; my (@variables) = @_; is preferred. There are good reasons not to, occasionally, however.

      Makeshifts last the longest.

Re: Perl Programming guidlines/rules
by Anonymous Monk on Nov 21, 2002 at 15:42 UTC
Re: Perl Programming guidlines/rules
by dingus (Friar) on Nov 21, 2002 at 15:58 UTC
    I think you have too many rules. And I think you are being too specific. Rules should generalise.

    What I think you should specify is a CGI template with appropriate CGI::CARP etc. etc. and state that all CGIs should be based on template. You might want to have the template divided up in to phases (as I mentioned in Re: Checkbox Query Revisited) as this really helps debugging. here is a sample (you would need clearer breaks and more sample/template code for real)

    #!/usr/bin/perl # use strict; use CGI ':standard'; use CGI::Carp qw(fatalsToBrowser set_message); use Data::Dumper; # other modules # error handler BEGIN { sub handle_errors { my $msg = shift; print "<h1>Script Error</h1> running".$ENV{'SCRIPT_NAME'}."<p><pre +>"; $msg =~ s/ (of )*\Q$0//g; print "$msg</pre><A HREF='/'>Email me this page</a>"; } set_message(\&handle_errors); } # main program starts here my $debug = 1; my $cgi = new CGI; #put authentication here if required my $phase = identify_phase() # $phase = 0 if just printing form otherwise 1 (for complex transactio +ns may return 2, 3 etc.) print_form() unless ($phase); # this routine probably does not return my $validref = validate_params() print $cgi->header, $cgi->pre(Dumper($validref)) if ($debug); # list example Dumper($validref) output to show what is OK unless ($validref{error}) { execute($phase, $validref ); } else { report_param_errors( $validref ); } exit; # subroutines # phase 0 if no posted paramters etc etc sub identify_phase() { } sub print_form() { } sub validate_params() { } sub report_param_errors( ) { my ($parmref) = @_; } sub execute() { my ($phase, $parmref) = @_; }
    Sticking to this kind of template makes it necessary to state what parameters you expect where and that you make it easy for others to follow the structure of the program and to get meaningful debug output at various points.

    Rules about style (e.g. FILEHANDLES uppercase) or object orientation or what have you can just be examples in the template.

    Dingus


    Enter any 47-digit prime number to continue.

      At my company, we were on the verge of hiring a particular programmer. I wasn't too fond of the guy, but his code looked okay. My boss thought he was great and since I was willing to sign off on his code, he was probably going to get offered the job.

      He was still bugging me, though. There were a few things about his code that just didn't belong there and I was struggling to figure out how someone who clearly knew Perl had such obvious pieces of cruft in his code. Some of it was a useless "ref $proto or $proto" line in a constructor. Some of it was an AUTOLOAD function that allowed the programmer to create accessors and mutators from object properties that didn't always exist. It was confusing.

      Then we got lucky. Perhaps he was afraid he wouldn't get the job, but he followed up with another code sample. It was also good code, but it had the exact same problems as the first code sample he sent. That's when I realized what was going on. I knew some people at his previous company and made some calls. I was right. This guy was using code templates and just blindly followed them. The templates provided a generic constructor and AUTOLOAD. Rather than thinking about what he was doing, he was just blindly following orders.

      No offense, but your code template would be horribly inadequate for a template for me. What about warnings? What about taint checking? What if I'm running under mod_perl and need to use the OO CGI interface (you mostly use that, but you have ":standard" in your import list).

      There may be acceptable code templates out there and I'm not opposed to them, per se, but they should be used with thought and caution.

      Cheers,
      Ovid

      New address of my CGI Course.
      Silence is Evil (feel free to copy and distribute widely - note copyright text)

        That sounds similar to what I call a "wizard's apprentice" in Microsoft Visual C++. I've interviewed people who can't write a class, but claim to code C++.

        No offense, but your code template would be horribly inadequate for a template for me. What about warnings? What about taint checking? What if I'm running under mod_perl and need to use the OO CGI interface (you mostly use that, but you have ":standard" in your import list). My template was meant as an example of what you could do - it was known to be incomplete and/or wrong. I hacked it in about 2 minutes from the one I do use to make it begin to correspond to the style that the original poster wanted.

        Just to make it clear: ANYONE WHO USES IT AS IS STUPID BECAUSE ITS INCOMPLETE AND PROBABLY BROKEN

        As you noticed it has some OO inconsitencies. This is because I don't use mod_perl, tend not to OO my CGI scripts, and I failed to completely convert to OO.

        There may be acceptable code templates out there and I'm not opposed to them, per se, but they should be used with thought and caution.

        100% agreement - see more comments below

        This guy was using code templates and just blindly followed them. The templates provided a generic constructor and AUTOLOAD. Rather than thinking about what he was doing, he was just blindly following orders.(My emphasis)

        Templates are there to get you started. Slavish devotion to a template, like slavish devotion to rules or anything else is always a problem. Neither templates nor rules will make a barely competant programmer into a genius, but a template can help the next guy figure out what dumb error Mr lowgrade bozo just committed.

        A template like the one I suggested makes it relatively easy to generate test cases and to break down the action into separately testable chunks. As someone who has spent much time debugging other people's crud, this kind of inbuilt modularity and testability I wish to encourage, hence my template.

        Dingus


        Enter any 47-digit prime number to continue.
Re: Perl Programming guidlines/rules
by hardburn (Abbot) on Nov 21, 2002 at 16:00 UTC

    5. OO always to be used where possible

    For CGI scripts, I don't see the point. Most CGI scripts are going to have around 100-200 lines of real code (not including the stuff that is just outputing HTML). While using OO-based modules is a good idea, it seems like overkill for the script itself.

    17. Assign CGI parameters to hashes
    ...
    23. No Global Variables

    OK, let's give the mandatory "Global Variables are Bad" (tm). Fine.

    In CGI scripts, I prefer to put options passed to the script in global vars. Every function can get that information anyway, we just make it easier this way. Each of my scripts has a set_params() function that is the only function allowed to change those global vars. Example:

    use CGI qw(:standard); # Other modules my ($param1, $param2, $param3); sub set_params { $param1 = param("param1") || 0; $param2 = param("param2") || 0; $param3 = param("param3") || 0; } set_params(); # Whatever other code you want

    (I rarely use the fancy features of CGI.pm, so I usually don't bother with the OO interface, though I probably should).

    The problem with putting them into a hash is that you take away many of the benifits you get from "use strict 'vars'".

    7. No hardcoding of values especially file paths, script names etc
    8. Config files or config scripts to be used

    How do you get the config file without hardcoding the path to it?

    Update: For config files, I suggest being very careful with absolute paths. At my job, we recently moved to a new server, and we had a lot of problems with scripts that were using absolute paths that were no longer available on the new server. Additionally, our test server had some paths to the server's directory that didn't exist on the production server, which broke even more stuff.

      Note that your approach to widely scoped lexical variables (those are not globals, by the way) could be a real problem in a persistent environment like FastCGI, PerlEx, or mod_perl. Those are closures, so if any of them didn't get set on a particular request they would retain their previous values.

        . . . those are not globals, by the way . . .

        OK, fine, in Perl, there technically aren't any global vars. There is just vars that have a really wide scope. However, for practical matters, widely scoped lexicals are global vars.

Re: Perl Programming guidlines/rules
by arrow (Friar) on Nov 21, 2002 at 16:14 UTC
    I think hakkr's list is great, but not perfect. I would agree with most of his points, one notable execption (in my opinion) is

    2. use HTML::Template not HERE documents

    HERE documents are very helpful, and one doesn't have to learn how to use something new (i.e. HTML::Template). I've used it before, and it's harder to use it than to write multiple print statements (my opinion). Otherwise, this is a great idea.
      Thanks, arrow This is mainly to help our designers from mangling our cgi scripts. I agree they are very useful but not when all your applications HTML is stored in them and you have to trawl through code to make minor html changes. Basically trying to separate code and presentation helps to get the designers to take care of html changes
Re: Perl Programming guidelines/rules
by Anonymous Monk on Nov 21, 2002 at 17:48 UTC
    Does anyone know of a Perl script that can see if pre-defined style guidelines are followed in other Perl scripts?

    Maybe a script that will parse other scripts and then email a message or send output to the command line with what it finds?

    (I hope i'm not giving away a great idea here!)

      I've not encountered a script that assessed style, but there is perltidy that will reformat according to a configurable set of rules. I wouldn't be too hard to wrap that with another script that tidied the input file then did a diff between the input and the output and made some assessment of the changes.

      Though its probably simpler to insist that all code is maintained in the code repository in some standardized style (defined in terms of a Pertidy configuration for example).

      That way, people who prefer, or are accustomed to a different style to the 'house style', can set their editor to use perltidy to reformat the house style to their own preferences when it loads a script and back to the house style when it saves. That way, nobody finds themselves slowed down and driven crazy because they have to keep manually re-editing the stuff they naturally type in one way, but that the house style requires in another.

      This won't take care of everything on the OP's 32 point list, but then, as the thread shows, many of those points are controversial to say the least.

      Personally, I find some of them bewildering. Not using $_ when appropriate is a little like forcing me to say: Mister George William? Bush Esquire, currently of sixteen hundred Pennsylvania Avenue, Washington, District of Columbia, United States of America. each time instead of "The US president". It maybe (or may not in this case, I'm not sure of all the details:) be more accurate, but boy, is is ever long-winded.

      In the same way that anyone in the US has a pretty good idea what you mean when you say "The President", (though you may have to say "The US president" outside of the US), so it is with $_. Anyone with more than a passing familarity with Perl, will, in most instances recognise the meaning and value of a given use of $_ in a given context. Of course it can be abused, but so can a long variable name.

      The typical notion of obfuscation is to reduce all the variable names to single chars. Much more effective is to use meduim length, descriptive names that are apparently meaningful in the given context, but just not used for the purpose they apparently serve.

      • A variable called $loop_counter that actually serves as a constant for determining loop termination.
      • A package global array called @temp that is actually used to pass data between independant calls to the methods, but who's content varies depending on which method was called last. (Real life example!)
      • Subs called max() and min() that have the test conditions inadvertantly reversed. (One of my own:()
      • A sub called ValidateInput() that starts out checking that input is numeric but ends up with a mass of logic for reading the input from various sources and building data structures from it. Another real situation I've encountered. (I wasn't the author this time.)

      In the end, no amount of automation is going to detect, never mind rectify stuff like this.


      Okay you lot, get your wings on the left, halos on the right. It's one size fits all, and "No!", you can't have a different color.
      Pick up your cloud down the end and "Yes" if you get allocated a grey one they are a bit damp under foot, but someone has to get them.
      Get used to the wings fast cos its an 8 hour day...unless the Govenor calls for a cyclone or hurricane, in which case 16 hour shifts are mandatory.
      Just be grateful that you arrived just as the tornado season finished. Them buggers are real work.

Re: Perl Programming guidelines/rules
by bnanaboy (Beadle) on Nov 21, 2002 at 18:00 UTC
    Something that I've found helpful in working on group CGI projects is including in the POD all database tables and files read from, written to, created, etc. That way if something is getting changed and you don't know what's doing it, it's easier to track down, or if a change is made to the name and/or location of something, it's easier to find all the scripts that need to be updated.
Re: Perl Programming guidelines/rules
by deadkarma (Monk) on Nov 21, 2002 at 19:34 UTC
    From my personal experience, perl programmers are usually a diverse/creative bunch.

    If you wanted a lot of control and rules over your programmers, maybe you should have hired some VB programmers instead.

    In my opinion, Visual Basic: There's Only One Way To Do It.

Re: Perl Programming guidelines/rules
by logan (Curate) on Nov 21, 2002 at 19:57 UTC
    1) Open each subroutine with an description of what it does and how:
    #------------------------------------------- # method: getExitCode() # Author: Logan logan@eng.somecompany.com # Usage: $ahm->getExitCode(); # return : int # Gets the exit code from a process and returns it #------------------------------------------- sub getExitCode { my $ap = shift; return($ap->{procExitCode}); }

    2) Open each package and script with a high-level description of the work flow and a variable list.

    -Logan "What do I want? I'm an American. I want more."

      1) Open each subroutine with an description of what it does and how:

      If you're going to do this, don't cargo cult from C and do it in comments. Use POD, so that the API documentation can be extracted into web pages.

      Again, it's all a matter of taste, but

      • # method: getExitCode()

        This just replicates the sub getExitCode { line.

      • # Author: Logan logan@eng.somecompany.com

        Authorship is often mixed even within a script and becomes redundant as soon as the author moves on. Email id's change.

        In a corporate environment, contact information should be in terms of "ownership" (in the 'who is responsible for this code') rather than who actually coded it. In any case, this information is better located in source-control database where it can easily be viewed and aggregated with other such info rather than in the source where it becomes out of date.

      • # Usage: $ahm->getExitCode();

        I understand that this is just a simple example, but I see no benefit in this line at all. It gives me no information not readily available two or three lines below.

        Even if the method takes parameters, these cannot easily be added to this line, and would need to be documented seperately.

      • # return : int

        This might make some sense in a C program, though even then, the function prototype serves much better and is more likely to be correct and up-to-date. What's an "int" in Perl terms? 16-bits? 32-bits?

      • # Gets the exit code from a process and returns it

        If the sub/method name is well chosen, as this appears to be, I see this as redundant.

      • #------------------------------------------- x 2

        Ugg! A personal hate, not a bad as

        ############################################# or

        /*********************************************

        But not much better. My preference is for whitespace to black, but again its all personal (or corporate edict).

        Can you imagine reading your favorite novell if every paragraph started with

        #------------------------------------------- # method: setSceneForSurpriseEnding() # Author: WilliamS@YeTheatre.Olde # Usage: read->setTheSceneForSurpriseEnding() # return : SceneSet # Sets the scene for the surprise ending and returns it #-------------------------------------------

        I hope you won't take my levity too seriously, it's just one nobodys opinion:)

      When I first set about trying to understand a piece of code, I generally ignore the comments and have even written several macros to remove them. Too many times I have been coersed into believing them when they only reflected the orginators intentions and not his actions. This can happen easily enough from reading the variable names as I said elsewhere today, but generally the authors actions are more clearly indicated by the code than by comments which all too frequently become out of date and therefore worse than just redundant, they can be misleading.

      Of course, a comment warning of an unusually conveluted algorithm or indicating where (for example) additional punctuation is critical to the correct interpretation of the code are necessary and helpful. I also find it helpful if the coder indicates why s/he did certain unusual or unituative things in a particular way. Beyond that, I generally feel that most comments in code are redundant and serve only to obscure the forest with trees.

      An untypical, and probably unpopular opinion, but one I have arrived at through bitter experience.



      Okay you lot, get your wings on the left, halos on the right. It's one size fits all, and "No!", you can't have a different color.
      Pick up your cloud down the end and "Yes" if you get allocated a grey one they are a bit damp under foot, but someone has to get them.
      Get used to the wings fast cos its an 8 hour day...unless the Govenor calls for a cyclone or hurricane, in which case 16 hour shifts are mandatory.
      Just be grateful that you arrived just as the tornado season finished. Them buggers are real work.

        1) * # method: getExitCode() This just replicates the sub getExitCode { line.

        Yes, it does. That's because I used the simplest possible example for the sake of brevity in this writeup.

        2) Email. I work for a small company that's had a lot of management turnover. I want credit for my code. I also want to make it easy for a new hire to find me if they have questions. I still remember the afternoon I spent wandering around trying to find out who wrote an obscure chunk of code.

        3) # return : int And most of my co-workers use C. Int mostly serves to contrast from returning a string, array, hash, hash ref, or whatever else I may use.

        I recently inherited a major chunk of code from a departed engineer. It was his discipline in using this commenting format that made it possible for me to figure out what he was doing. After years of inheriting spaghetti code, I appreciate a little discipline. Don't get me started on the "Perl Coder" who didn't understand arrays and refused to use them.

        -Logan
        "What do I want? I'm an American. I want more."

        When the code and the comments disagree, both are probably wrong.
        -- Norman Schryer
        :)

        Makeshifts last the longest.

Re: Perl Programming guidelines/rules
by dws (Chancellor) on Nov 21, 2002 at 20:09 UTC
    I am composing some house rules/guidlines for our CGI programmers.

    No list of "Thou shalt" and "Thou shalt not" rules is complete without this rule:

    N+1: None of the above rules are an excuse to anything stupid. Use your common sense.

Guidelines for the non-Gurus
by Wally Hartshorn (Friar) on Nov 21, 2002 at 20:43 UTC

    Randal and Abigail sort of slammed your list as being too simplistic (which I'm sure it is for them and most of the people they work with), but I think it sounds fairly reasonable. If you're in a programming environment where you have Perl gurus in every cubicle who've been doing this stuff for years, sure, your list is too restrictive.

    However, if you're in a programming environment where most of your Perl programmers are fairly new to the language (which I suspect is the case for many of us), I think your list is a good starting point to prevent at least some of the worst programming offenses.

    Yes, it would be nice if we all worked with highly experienced people who eat, sleep, and breathe Perl every day, but we generally aren't that lucky. I think that in many shops, Perl programming is something that is done in addition to our "real" work. Having some basic "idiot lights" on the dashboard to help us avoid some of the most common problems can be helpful.

    By the time you're surrounded by Perl gurus, you can come up with a different list (if you even need one at that point).

    Wally Hartshorn

Prior art reference, and comment
by Anonymous Monk on Nov 21, 2002 at 23:18 UTC
    Someone should mention Skud's article "In Defense of Coding Standards: How to Create Coding Standards that Work" on perl.com. That's a good starting point for any Perl coding standards I'll ever need.

    Use OO Always might be appropriate in a CGI shop (I sure regretted not doing so once!), but would be serious overkill in some of the scripting glue elsewhere in production systems where Perl is a better Sh(ell). I don't need to write a Backup.pm and then say
    perl -MBackup -e '$backup=Backup->new(); $backup->doit()', I can just write a backup.pl and run it. And for the better-grep-and-awk stuff, it's way overkill.

    -- Bill n1vux

      How many systems are you backing up Bill the Anonymous Monk ? Are you going to deploy the script to every corner of the network, hack the configuration each time? What happens when the script is revised ? Magically diff the code updates from the variable hacks and then what. The OO call to Backup.pm could easily be a single config file. Update the module often as you like , all the config stays where it should be.

      Backup->new( "/etc/sys-backup.cfg" );

      If I've typed it twice already, how many more times today before I make it a method? $self->grumpy_reply(\$parent) : )

        What, you really think OO is the only way to archieve this? Don't be silly.
        use Backup; Backup::doit config => "/etc/sys-backup.cfg";

        works as well as an OO-wrapper.

        I've written backup programs in Perl as well. It did use a configuration file, and the functionality was changed without the need to change the interface. And guess what? It wasn't OO!

        Abigail

Re: Perl Programming guidelines/rules
by Juerd (Abbot) on Nov 22, 2002 at 05:36 UTC
    Some suggestions:

    3. no deliberate obfuscation

    s/deliberate//

    5. OO always to be used where possible

    s/possible/appropriate/

    6. Pass by reference

    .= ' when appropriate'

    9. Do not reinvent wheel always check CPAN

    .= ' but do reinvent elliptic wheels'

    19. All subroutines to be commented

    s/commented/documented/

    23. No Global Variables

    s/Glo/Needless Glo/

    24. use my not local

    Repeats 23

    28. Do not use $_

    WTF? Why not? s/foo/bar/ for @xyzzy;

    30. Initialise variables

    Repeats 1

    32. Where code returns a result it should do so or die with a meaningful error and $!

    Always die? Even with errors that aren't fatal?

    - Yes, I reinvent wheels.
    - Spam: Visit eurotraQ.
    

Re: Perl Programming guidelines/rules
by mt2k (Hermit) on Nov 22, 2002 at 08:16 UTC
    Okay, I'll say good job on the list, [so|but] I'll make a few comments:

    use strict and warnings

    Glad that's option #1!
    use HTML::Template not HERE documents
    I disagree... there are times when content really is generated on the fly, and you have no clue where certain things are going to go. Hard to explain, but I think HTML::Template is more of a per-user preference.
    no deliberate obfuscation
    "Though shall not deliberately murder" :)
    As stated previously... s/deliberately//
    all files must be documented with POD (Author, Date created, synopsis, description)
    Absolutely... that way you know who to blame when something goes wrong :)
    OO always to be used where possible
    I disagree with this one. Sometimes making code "OO-compliant" is just extra work that just isn't necessary. This is definitely a case-to-case (script-to-script) decision.
    Pass by reference
    This is another one that really depends on the case. If the data structures being passed to the function may need to be modified, then absolutely... but if the variables passed are just being read, why create a bigger hassle? The example below shows a case where passing by reference just wastes an extra dollar sign and backslash:
    sub name_is { my $name = shift; return "My name is " . $$name; } my $name = "mt2k"; print name_is \$name;
    No hardcoding of values especially file paths, script names etc
    Excellent one! If you start throwing file paths all over the place, you're bound to run into major trouble if the script changes location or environment. At least throw these in a configuration script or in a configuration subroutine at the top of the script.
    Config files or config scripts to be used
    All for them! Read the last point to understand why.
    Do not reinvent wheel always check CPAN
    For probably 90% of the cases you'll come across, I would agree with you. As long as you trust someone else's code to be stable and not have any unexpected "features". Chances are, someone's already coded the function you need... why spend hours debugging one you tried to put together yourself :)
    minimise coupling, maximise cohesion, maximise reuse
    yes, yes and yes.
    Always indent logic
    Please do! It increases readablity, which means much easier maintenance... plus anybody else who has to look at your code will keep their sanity
    Any reused code block above 5 lines must become a sub routine
    Absolutely... though I wouldn't restrict it to 5 lines... if you use 3 lines 10 times in a script... that should most definitely become a subroutine.
    Always retrieve database rows into hash reference not arrays
    Nope. If you know that you're only getting one result back (ie: a primary key), using fetchrow_arrayref() can be quick.
    Use CGI CARP
    As long as you're not talking about fatalsToBrowser(), then yes. Remember, fatalsToBrowser() might display something to your users you might not want them to see. For production/debugging purposes yes, for code running to online users, no.
    Assign CGI parameters to hashes
    Definitely! This makes things extremely easy to access, though I must admit I don't know what would happen if you were to have a <select MULTIPLE>. Must look that up sometime.
    Version history at bottom of file
    If there have been changes worth documenting, then yes.
    All subroutines to be commented
    Whether as POD or comments, as long as there is at least something to explain what a subroutine does. If it's just a script, I believe comments (that should include a sample use) are fine. If it's a module... there had better be POD :). As a matter of fact, I'll show how I comment my subs:
    # get_params # Retrieve all CGI parameters and return # a hash ref containing the data # ie: my $input = get_params; # print $input->{'name'}; sub get_params { my $q = new CGI; my %cgi_data; $cgi_data{$_} = $q->param($_) for ($q->param()); return \%cgi_data; }
    Meaningful Subroutine and Variable names to be used
    /me shudders at the thought of a script that uses $a, $b, $c, $d type variables. (especially a script that uses $a and $b, whilst using sort as well!)
    Constructors must only create objects
    Just learning about these at the moment actually! But I do understand the meaning behind this one. A constructor constructs an object... it doesn't fill the object with data :) See, I can learn!
    Program flow control to be placed in 'main' method
    If by this you mean that program flow is a set of subroutines being called in a row, without a bunch of clutter code in between, then I strongly agree. Subroutines are there to organise the code. If they are only used to make it messy, it doesn't help.
    No Global Variables
    100% approval there.
    use my not local
    my() is much better - save yourself the trouble!
    All Perl filehandles should be in all uppercase
    Yep, I agree completely. I can't really argue why, besides the fact that readability and maintainability are increased. I've just always used uppercase file handles
    Always use CGI.pm
    For CGI scripts that need to handle data, CGI.pm is a must, if only for reading CGI parameters. Don't do it by hand. The first time I needed to read in data, I processed STDIN and $ENV{'QUERY_STRING'}... horrible! Avoid it like the plague!
    Use verbs for subroutines, use nouns for variables, be descriptive and consistent
    Agreed. I can't imagine a sub named without a verb...
    Do not use $_
    There are times when not using $_ decreases readability and might even make things run slower. Take map(), grep(), and quick uses of for{...} for example(s?).
    Do no use goto
    There is absolutely no reason or excuse for using goto(). You should be using subroutines or a loop to handle these. Want to use goto()? Go back and use QBasic! As a matter of fact, is Perl 6 still going to support goto? It should be removed for the sanity of us all :)
    Initialise variables
    To expand upon this one, I'd like to add that variables should be declared at the appropriate place in the script. If you need a widely scoped lexical, place it near the top of the script. If you need a variable for only a subroutine, place it at the beginning of that subroutine. What I am saying is group all variable declarations of the same scope together.
    Error handling done using Begin blocks at CGI level. All dies are logged
    Shame on me... I've never used BEGIN{} for this purpose. The 2nd thing I need to look up regarding this list :)
    Where code returns a result it should do so or die with a meaningful error and $!
    A definite must. What if you unsuccessfully flock() a file... then you open the file for writing, even though the file has not really been locked... expect trouble if you ignore this tip. In my opinion, this should have been item #2 on this list.
      There is absolutely no reason or excuse for using goto()

      I think you need to take another look a goto - particularly the description of "goto &NAME". That can be very useful.

      --
      <http://www.dave.org.uk>

      "The first rule of Perl club is you do not talk about Perl club."
      -- Chip Salzenberg

      What if you unsuccessfully flock() a file... then you open the file for writing, even though the file has not really been locked... expect trouble if you ignore this tip.
      Are you serious? No, you can't be. You cannot even do what you describe. You can only flock filehandles - and you only get a filehandle after opening a file. You don't first flock, then open. No, you first open, then flock.
      There is absolutely no reason or excuse for using goto().
      Really? You'll find people like Donald Knuth and Dennis Ritchie disagreeing with you. While goto ought to be used with caution, there's no reason to flat out forbid it. Not to mention that there's no alternative for goto &sub in Perl - unless you code similar functionality using XS.
      I can't imagine a sub named without a verb.
      Is new a verb? What about commonly used functions from CGI, like param, header or keywords?
      my() is much better (than local)
      But my isn't a drop-in replacement for local. Certain things that you can localize, you can't my.
      For CGI scripts that need to handle data, CGI.pm is a must, if only for reading CGI parameters. Don't do it by hand. The first time I needed to read in data, I processed STDIN and $ENV{'QUERY_STRING'}... horrible! Avoid it like the plague!
      Hmmm, for programs that only need reading in CGI parameters, I think CGI.pm is overkill, don't you? It's a dinosaur (with a bad name), and not exactly a good example of modular code writing. As for processing STDIN and $ENV {QUERY_STRING}, what do you think CGI.pm does? Get the parameters out of thin air?
      Yep, I agree completely. I can't really argue why, besides the fact that readability and maintainability are increased. I've just always used uppercase file handles.
      I, on the other hand, have hardly used filehandles for the last years (except for the default ones provided by Perl). We have been able to use references to filehandles for filehandle operations ever since 5.000, and since 5.6, open will autovivify one if given an undefined value as first argument.
      /me shudders at the thought of a script that uses $a, $b, $c, $d type variables.
      Ah, so you prefer:
      for (my $array_index_counter = 0; $array_index_counter < @array_of_some_sorts; $array_index_counter ++) { ... }
      The Pascal and Java way. I rather keep the tradition of Fortran, Algol and C, and use $i. It's ok for variables of a limited scope to have short names.
      Whether as POD or comments, as long as there is at least something to explain what a subroutine does.
      Keep your comments brief and short. If most of your subroutines need a lot of comments, it's a sign something is wrong. A good name goes a long way in describing what a subroutine does. And so do named parameters. POD and comments aren't interchangeable. The main purpose of POD is to generate manual pages - it will describe the interface. Comments describe the internals.
      if you use 3 lines 10 times in a script... that should most definitely become a subroutine.
      open my $fh => $file or die "Whatever: $!"; flock $fh => LOCK_SH or die "Whatever: $!"; local $_; while (<$fh>) { ....; } close $fh or die "Whatever: $!";
      That's five, often repeated, lines. You want to factor that out to a subroutine? How often do you think that happens in practise? What about a DBI prepare/execute/fetch combo wrapped in an eval? That's more than three lines as well. Factor it out?

      Abigail

Re: Perl Programming guidelines/rules
by davorg (Chancellor) on Nov 22, 2002 at 11:09 UTC

    Well, I'd hate to work somewhere that was so perscriptive about how I wrote my code, but here are some specific points:

    2. use HTML::Template not HERE documents

    HTML::Template and here docs don't really do the same thing (except in a few narrow areas) so that doesn't really make sense. And besides, the Template Toolkit is better :)

    5. OO always to be used where possible

    That's just silly. It's almost always possible to use OO. It's not always appropriate tho'.

    6. Pass by reference

    I don't think that means what you think it means. I think you mean "pass references to datastructures into subroutines".

    15. Always retrieve database rows into hash reference not arrays

    A more usefule rule would be "always explicitly name selected columns in an SQL statement" (aka "don't use SELECT *").

    16. Use CGI CARP

    What? Even if I'm not writing a CGI program?

    17. Assign CGI parameters to hashes

    Why?

    22. Program flow control to be places in 'main' method

    I really don't understand the point of that.

    23. No Global Variables

    What do you mean by "global variables"? If you mean file-scoped lexicals then they should be avoided, but can often be useful. If you mean package variables then they are often the only to achieve certain things (like sharing variables between packages).

    24. use my not local

    But there are times when only local will work. I think you meant "prefer lexical variables over package variables".

    26. Always use CGI.pm

    Again, that seems a bit silly if you're not writing a CGI program.

    28. Do not use $_

    That rule alone would be sufficient grounds for me to resign on the spot!

    29. Do no use goto

    At the very least you should make and exception for "goto &sub".

    30. Initialise variables

    Why? Perl initialises all variable with sensible values.

    31. Error handling done using Begin blocks at CGI level. All dies are logged

    Another rule that seems unnecessarily CGI-biased.

    --
    <http://www.dave.org.uk>

    "The first rule of Perl club is you do not talk about Perl club."
    -- Chip Salzenberg

      Thanks everybody for a great response it seems I still have a lot to do.

      I'll definitly be adding Code Templates looking at perltidy and refining the list with your comments. Adding some justification for each statement will also be a worthwhile exercise. So sometime soon you may expect the final perlmonk audited list:)

      I think Wally hit it on the head by pointing out what I should have said at the start that this is intended as a list for Novices or bad programmers, new people who join our team and usually have no knowledge of Perl. Perl Gurus are a verifiable rarity and we have so far only managed to hire a couple of people with any real Perl knowledge. A Language migration course is what we really need to help quickly convert them from their native language

      I'm thinking many ways means many guidelines

Re: Perl Programming guidelines/rules
by miki (Initiate) on Nov 22, 2002 at 12:23 UTC
    18.Did you consider using CVS?
    28. I do not agree. If you leave out $_ you prohibit a vital perl feature.
    30.If warnings are turned on you are warned about uninitialized values, aren't you?
    Finally, there is perltidy, which formats your code according to your rules.

    update (broquaint): title change (was $_)

      If warnings are turned on you are warned about uninitialized values, aren't you?
      That's one of the points of use warnings. If you are using uninitialized variables all over the place, then perhaps a review of the code is necessary. Below I have thrown together an example of two scripts that do the exact same thing... except that one produces warnings and one doesn't :)

      #!/usr/bin/perl # The 'bad' version - produces warnings use strict; use warnings; my $name; # perhaps some code to get user's name # { ... } print 'My name is ' . $name;
      #!/usr/bin/perl # The 'good' version - produces no warnings use strict; use warnings; my $name; # perhaps some code to get user's name # { ... } # The key line. Set a default value, even if blank. $name ||= ""; print 'My name is ' . $name;
Re: Perl Programming guidelines/rules
by stephen (Priest) on Nov 27, 2002 at 18:43 UTC

    Instead of a style guide, might I recommend code reviews? Every week or two, have a different member of the team send out copies of the code they're working on or just completed, along with explanation of what it's supposed to be for. Schedule a meeting for that person to walk through the code, explaining it, and have your developers question what they find unclear or dangerous.

    Even if you decide to go ahead with a style guide after starting the code reviews, the style guide will be more effective. First of all, since code reviews are a two-way street, you'll be able to see what your coders do and don't understand and what needs explanation (cf. Abigail's why's). Since reviews are a two-way street, your CGI programmers will have had an opportunity to contribute to the process, so they'll be more likely to go along with it. Finally, code reviews give you an opportunity to enforce your style guide.

    Personally, though, I find style guides are usually a Procrustean solution. When I've tried writing them in the past, it came from spending so much time programming the computer that I was trying to program the people around me. I understand where most of these rules are coming from, but I've disobeyed most of them in code that I still consider good. Most style guides are enthusiastically propounded, then studiously ignored.

    stephen