http://www.perlmonks.org?node_id=744932

I've been asked to prepare some guidelines on coding standards and code reviews in general, across a number of languages used at work, not just Perl. After doing a bit of basic research, I've cobbled together the notes below. To gain some feedback, I'm posting an early draft here.

Note that language-specific coding standards are not covered here; these will be covered in separate coding standards documents, one for each language.

Most of the coding guidelines below were not invented by me, but derived from hopefully well-respected sources; see the Coding Standards References section below for details.

Update: see also Why Create Coding Standards and Perform Code Reviews?

General Guidelines

These are some general attributes you should strive for in your code. Remember that code maintainability is paramount.

  1. Correctness, simplicity and clarity come first. Avoid unnecessary cleverness.
  2. If you must rely on cleverness, encapsulate and comment it.
  3. Robustness, Efficiency, Maintainability.
  4. Scalability, Abstraction, Encapsulation.
  5. Uniformity in the right dimension, creativity in dimensions that matter.
  6. DRY (Don't repeat yourself). Duplication exists in data as well as code.
  7. Prefer to find errors at compile time rather than run time.
  8. Establish a rational error handling policy and follow it strictly.
  9. Throw exceptions instead of returning special values or setting flags.
  10. Know when and how to code for concurrency.
  11. Use a single-step automated build system.
  12. Practice releasing regularly.
  13. Plan to evolve the code over time.
  14. Invest in code reviews.
  15. Consider the code from the perspective of: usability, simplicity, declarativeness, expressiveness, regularity, learnability, extensibility, customizability, testability, supportability, portability, efficiency, scalability, maintainability, interoperability, robustness, type safety, thread-safety/reentrancy, exception-safety, security. Update: a few more: correctness, reliability, reusability, productivity/timeliness, documentation/discoverability. Resolve any conflicts between perspectives based on requirements.
  16. Agree upon a coherent layout style and automate it.
  17. When in doubt, or when the choice is arbitrary, follow the common standard practice or idiom.
  18. Don't optimize prematurely. Benchmark before you optimize. Comment why you are optimizing. (See On Code Optimization).
  19. Don't pessimize prematurely.
  20. Don't reinvent the wheel. If there is a library method that implements the functionality you need, use it.
  21. Be const correct.
  22. Adopt a policy of zero tolerance for warnings and errors. This principally means compiling cleanly at high warning levels, but is broader than that: it also covers clean runs of tools such as checked STL implementations, static code analysers (e.g. Perl::Critic, FxCop), dynamic code analysers (e.g. valgrind, Purify) and unit tests (e.g. Test::More, NUnit), and it means not shipping code that emits spurious warnings/errors in customer log files. Third party files may be exempt from this policy.
  23. Use a revision control system.
  24. Write the test cases before the code. When refactoring old code (with no unit tests), write unit tests before you refactor.
  25. Add new test cases before you start debugging.
  26. Always code as if the guy who ends up maintaining your code will be a violent psychopath who knows where you live.
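
To illustrate guideline 9 (throw exceptions instead of returning special values), here is a minimal sketch in Python. The names ConfigError, parse_port_flagged and parse_port are hypothetical, invented purely for this example.

```python
class ConfigError(Exception):
    """Hypothetical error type for a missing or malformed setting."""


def parse_port_flagged(text):
    # Discouraged style: -1 is a special value the caller must remember to check.
    try:
        port = int(text)
    except ValueError:
        return -1
    return port if 0 < port < 65536 else -1


def parse_port(text):
    # Preferred style: a failure cannot be silently ignored by the caller.
    try:
        port = int(text)
    except ValueError as exc:
        raise ConfigError(f"port must be an integer, got {text!r}") from exc
    if not 0 < port < 65536:
        raise ConfigError(f"port out of range: {port}")
    return port
```

With the first version, a caller who forgets to test for -1 carries on with a bogus port; with the second, the mistake surfaces at the point of failure.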

Design and Architecture

  1. Know Architectural patterns. Choose wisely. Multitier architecture is common.
  2. Coupling and Cohesion. Systems should be designed as a set of cohesive modules as loosely coupled as is reasonably feasible.
  3. Testability. Systems should be designed so that components can be easily tested in isolation.
  4. Information hiding: Minimize the exposure of implementation details; provide stable interfaces to protect the remainder of the program from the details of the implementation (which are likely to change). Don't just provide full access to the data used in the implementation. Minimize the use of global data. Avoid Action at a distance.
  5. Interfaces matter. Once an interface becomes widely used, changing it becomes practically impossible (just about anything else can be fixed in a later release).
  6. Design the module's interface first.
  7. Design interfaces that are: consistent; easy to use correctly; hard to use incorrectly; easy to read, maintain and extend; clearly documented; appropriate to your audience. Be sufficient, not complete; it is easier to add a new feature than to remove a mis-feature.
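
As a small sketch of the information hiding point (item 4), consider the hypothetical VisitCounter class below. Callers use only three methods and never touch the dictionary behind them, so the storage could later be swapped for a database without changing any calling code.

```python
class VisitCounter:
    """Counts visits per page; callers never see how the counts are stored."""

    def __init__(self):
        self._counts = {}  # implementation detail, free to change later

    def record(self, page):
        self._counts[page] = self._counts.get(page, 0) + 1

    def count(self, page):
        return self._counts.get(page, 0)

    def top(self, n):
        # Hand back a fresh list of tuples, never the internal dict itself.
        ranked = sorted(self._counts.items(), key=lambda kv: (-kv[1], kv[0]))
        return ranked[:n]
```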

OO Design

  1. Favour Composition Over Inheritance.
  2. SOLID: Single-responsibility (a class should have only one responsibility), Open-closed principle (open for extension, closed for modification), Liskov substitution principle (derived classes must be substitutable for their base classes), Interface segregation principle (clients should not be dependent on interfaces they do not use), Dependency inversion principle (program to an interface, not an implementation).
  3. GRASP: General Responsibility Assignment Software Patterns (or Principles). Guidelines for assigning responsibility to classes and objects in object-oriented design. The different patterns and principles used in GRASP are controller, creator, indirection, information expert, high cohesion, low coupling, polymorphism, protected variations, and pure fabrication.
  4. Know Software Design Patterns. Choose wisely.
  5. Dependency Injection. Don't hard-code your dependencies. Especially useful to improve testability.
  6. Define Class invariants.
  7. Immutability. Prevent inappropriate access by making your objects immutable: provide the data to their constructors, then disallow any modifications of this information thereafter. Note that Java strings are immutable. Immutability can also be useful for thread safety and in functional programming (see Pure function).
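
Dependency injection (item 5) and immutability (item 7) can be sketched together. This is only an illustration in Python; Invoice and InvoiceSender are made-up names, and the injected "send" callable stands in for whatever transport a real system would use.

```python
from dataclasses import dataclass, FrozenInstanceError
from typing import Callable


@dataclass(frozen=True)
class Invoice:
    # Immutable value object: fields are set once, in the constructor.
    customer: str
    amount_cents: int


class InvoiceSender:
    def __init__(self, send: Callable[[str, str], None]):
        # The transport is injected rather than hard-coded, so a test can pass a fake.
        self._send = send

    def deliver(self, invoice: Invoice) -> None:
        body = f"Amount due: {invoice.amount_cents / 100:.2f}"
        self._send(invoice.customer, body)
```

A unit test can pass a function that appends to a list instead of sending mail, and any attempt to assign to invoice.amount_cents after construction raises FrozenInstanceError.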

API Design Checklist

  1. Make it easy to use correctly, hard to use incorrectly. To illustrate, Scott Meyers gives a cute example of a Date class with constructor Date(int month, int day, int year), making it easy to use incorrectly (i.e. to get the parameter order wrong). Another illustration is the PBP guideline: "Use a hash of named arguments for any subroutine that has more than three parameters", which works well because humans are better at remembering names than orderings.
  2. "Play test" your API from different perspectives: newbie user, expert user, maintenance programmer, support analyst, tester. In the early stages, imagine the perfect interface without worrying about implementation constraints. Design iteratively.
  3. Names matter. Choose names that are explanatory, consistent, regular.
  4. Have guiding principles and clear requirements.
  5. Define sound conceptual models and domain abstractions.
  6. Consider whether your domain functionality is best delivered as a library (e.g. DBI), an application framework (e.g. Catalyst), or a DSL (e.g. YACC (a translator) or make (an interpreter)).
  7. Hide implementation details. Reflect the user mental model, not the implementation model. Information hiding: Minimize the exposure of implementation details; provide stable interfaces to protect the remainder of the program from the details of the implementation (which are likely to change). Don't just provide full access to the data used in the implementation.
  8. Make easy things easy, hard things possible. Huffmanize.
  9. For public APIs, be sufficient, not complete. It is much easier to add a new feature than to remove a mis-feature. When in doubt, leave it out.
  10. When in doubt, or when the choice is arbitrary, follow the common standard practice or idiom.
  11. Consider the API from the perspectives of: usability, simplicity, declarativeness, expressiveness, regularity, learnability, extensibility, customizability, testability, supportability, portability, efficiency, scalability, maintainability, interoperability, robustness, type safety, thread-safety/reentrancy, exception-safety, security. Resolve any conflicts between perspectives based on requirements.
  12. Error handling. Document all errors in the user's dialect. Prefer throwing exceptions to returning special values. Prefer to find errors at compile time rather than run time (e.g. PBP p.182, Named Arguments: pass as a single hash ref not a list of name/value pairs, so that some argument errors are caught at compile time).
  13. Apply the principle of least astonishment. In particular, choose sensible (expected) and secure defaults.
  14. Look for ways to eliminate ungainly parts of the API (e.g. the need for repeated boilerplate code when using the API).
  15. Plan to evolve the API over time.
  16. Learn from prior art. In particular, avoid repeating interface design mistakes of the past.
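
The Date example from item 1 can be sketched in Python, where keyword-only parameters play roughly the role that a hash of named arguments plays in Perl. The function make_date is hypothetical; it simply wraps the standard datetime.date constructor.

```python
import datetime


def make_date(*, year, month, day):
    # The bare * makes every argument keyword-only, so the order cannot be confused.
    return datetime.date(year, month, day)
```

A call such as make_date(day=18, month=2, year=2009) reads unambiguously in any order, while the positional call make_date(2, 18, 2009) is rejected with a TypeError instead of silently building the wrong date.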

Maintainability/Supportability

  1. Follow the de facto standard set by the code you are editing. Or change the entire source file to the new standard.
  2. Remove unused code.
  3. Assert liberally. Asserts should be used to document the constraints for a piece of code.
  4. Log liberally. Strive to log enough information to trouble-shoot a customer problem without the need to attach a debugger.
  5. The result of every file operation or API call or external command must be checked, and unexpected results handled.
  6. Any unexpected result from a file operation or API call or external command should be logged.
  7. Avoid magic numbers. Note that 0 and 1 are ok and are not considered magical.
  8. Limit and explicitly comment case "fall throughs".
  9. Avoid side effects. e.g. inside macros.
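
Items 3 to 7 can be combined in one short sketch: assert the precondition, check the result of an external command, log anything unexpected, and name the timeout rather than burying a magic number. The helper name run_checked and the 30 second timeout are assumptions made for illustration only.

```python
import logging
import subprocess

log = logging.getLogger("example")

COMMAND_TIMEOUT_SECONDS = 30  # named constant instead of a magic number


def run_checked(argv):
    """Run an external command; log and raise on any unexpected result."""
    assert argv, "argv must contain at least the program name"
    try:
        result = subprocess.run(
            argv, capture_output=True, text=True, timeout=COMMAND_TIMEOUT_SECONDS
        )
    except (OSError, subprocess.TimeoutExpired) as exc:
        log.error("could not run %r: %s", argv, exc)
        raise
    if result.returncode != 0:
        log.error("%r exited with %d: %s",
                  argv, result.returncode, result.stderr.strip())
        raise RuntimeError(f"command failed with exit code {result.returncode}")
    return result.stdout
```

The log line records the command, the exit code and the error output, which is usually what you need to trouble-shoot a customer problem without a debugger.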

Layout

Layout rules will vary between organisations. These sorts of arbitrary code layout rules should be enforced by a tool (e.g. Perl::Tidy).

  1. Use spaces not TABs.
  2. Three character indent (four is more common; get agreement and enforce with a tool).
  3. No long lines. Limit the line length to a maximum of 120 characters.
  4. No trailing whitespace on any line.
  5. Put brace on a new line.
  6. Single space around keywords, e.g. if (.
  7. Single space around binary operators, e.g. 42 + 69
  8. Single space after comment marker, e.g. "// fred" not "//fred", "# fred" not "#fred"
  9. No space around unary operators, e.g. ++i
  10. No space before parens with functions/macros, e.g. fred( 42, 69 )
  11. Single space after parens with functions/macros, e.g. fred( 42, 69 )
  12. Single space after comma with functions/macros, e.g. fred( 42, 69 )
  13. Layout lists with one item per line; this makes it easier to see changes in version control.
  14. One declaration per line.
  15. Function calls with more than two arguments should have the arguments aligned vertically.
  16. Avoid big-arse functions and methods. Ditto for large classes and large files.
  17. Avoid deep nesting.
  18. Always use braces with if statements, while loops, etc. This makes changes shorter and clearer in version control.
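
To show how mechanical these rules are, here is a toy checker for items 1, 3 and 4 (no TABs, no lines over 120 characters, no trailing whitespace). It is a sketch only; layout_problems is a hypothetical function, and in practice an established formatter is a better choice than a home-grown script.

```python
MAX_LINE_LENGTH = 120  # the limit from item 3


def layout_problems(text):
    """Return (line_number, problem) pairs for a few mechanical layout rules."""
    problems = []
    for number, line in enumerate(text.splitlines(), start=1):
        if "\t" in line:
            problems.append((number, "tab character"))
        if line != line.rstrip():
            problems.append((number, "trailing whitespace"))
        if len(line) > MAX_LINE_LENGTH:
            problems.append((number, "line too long"))
    return problems
```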

Naming

  1. Use descriptive, explanatory, consistent and regular names.
  2. Favour readability over brevity.
  3. Avoid identifiers that conflict with keywords.
  4. Short names for short scopes, longer names for longer scopes.
  5. Avoid ambiguous words in names (e.g. Don't abbreviate "Number" to "No"; "Num" is a better choice).
  6. For packages and classes a good naming scheme is: Abstract_noun, Abstract_noun::Adjective, Abstract_noun::Adjective1::Adjective2. For example: Disk; Disk::Audio; Disk::DVD; Disk::DVD::Rewritable.

Encapsulation

  1. Avoid non-const global variables.
  2. Minimize the scope of variables, pragmas, etc.
  3. Minimize the visibility of variables.
  4. Don't overload variables with multiple meanings.

Dependencies

When should you depend on another CPAN module rather than write your own code?

  1. Don't introduce dependencies lightly. Every module you add as a dependency is a module that can restrict your module -- if one of your module's dependencies is Linux-only, for example, then your module is now Linux-only; if another requires Perl 5.20+ so do you; if one of your dependencies has a bug, you also have that bug; if a new release of one of your dependencies fails, the likelihood of your release being unable to install increases; take care with dependencies having a different license to yours.
  2. Don't add developer convenience modules as a dependency. (As noted at Release::Checklist, there are two types of modules: functional modules, like DBI, and developer convenience modules, like Modern::Perl).
  3. It's usually best to use popular, quality CPAN modules in complex domains (e.g. DBI and XML) rather than roll your own. Doing so allows you to leverage the work of experts in fields that you are probably not expert in. Moreover, widely used CPAN modules tend to be robust and have fewer bugs than code you write yourself because they are tested by more users and in many different environments.
  4. For small and simple modules, on the other hand, such as slurping a file, you may prefer to roll your own code rather than pay the dependency cost of an external module.
  5. Cost vs Risk. Though using CPAN modules seems "free", there are hidden snags. What if your dependent module has a security vulnerability? What if the author abandons it? How quickly can you isolate/troubleshoot a bug in its code?
  6. Quality and Trust. Before introducing a dependency, it's worth checking CPAN ratings, Kwalitee score, bug counts, how quickly bugs are fixed, etc. Does it contain gratuitous/unnecessary dependencies? (The ::Tiny CPAN modules were a reaction against modules that seemed to haul in half of CPAN as dependencies.)
  7. Popularity. When you use a 3rd party module, you want it to be popular and widely supported; you want to be able to ask for advice on using it; you don't want it to die. Moreover, if your module depends on a very popular CPAN module, there's a good chance your module's users will already have it installed.

Comments and Documentation

  1. Prefer to make the code obvious.
  2. Don't belabour the obvious.
  3. Generally, comments should describe what and why, not how.
  4. Remove commented-out code, unless it helps to understand the code, then clearly mark it as such.
  5. Update comments when you change code.
  6. Include a comment block on every non-trivial method describing its purpose; each parameter should be documented as: input, output, or "inout" ("inout" parameters should be used sparingly).
  7. Major components should have a larger comment block describing their purpose, rationale, etc.
  8. There should be a comment on any code that is likely to look wrong or confusing to the next person reading the code.
  9. Every non-local named entity (function, variable, class, macro, struct, ...) should be commented.
  10. Separate user versus maintainer documentation.
  11. CPAN Module Documentation. Tutorial and Reference; Examples and Cookbook; Maintainer; How your module is different to similar ones; Change log; Notes re portability, configuration & environment, performance, dependencies, bugs, limits, caveats, diagnostics, bug reporting.

Portability

  1. Assume file names are case insensitive in that you cannot, in general, have two different files called fred and Fred.
  2. Source code file names should be all lower case.
  3. File names should only contain A-Z, a-z, 0-9, ".", "_", "-".
  4. Strive to structure code around "capabilities" rather than specific platforms. For example, "if HAVE_SHADOW_PASSWORDS" rather than "if SOLARIS_8". And define the capabilities in one place only (e.g. config.h).
  5. Organise the code so that machine dependent and machine independent code reside in separate files.
  6. Abstract hardware and external interfaces in a module and have all other code call that module and not call the hardware/external interface directly.
  7. As far as possible, write code to a widely supported portable standard (e.g. ANSI C, POSIX, ...) and only use machine specific facilities when absolutely necessary.
  8. Recognize and avoid non-portable constructs. For example, strive to avoid relying on ASCII character set, big v little-endian, 32-bit v 64-bit ints, pointer to int conversion, sign extension, and so on.
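
Item 4 (capabilities rather than platforms) translates directly to other languages. In this Python sketch the flags HAVE_SYMLINKS and HAVE_FORK and the function spawn_strategy are hypothetical names; the point is that the probing happens once, in one place, much as config.h does for C.

```python
import os

# Capabilities are detected once, here, and nowhere else.
HAVE_SYMLINKS = hasattr(os, "symlink")
HAVE_FORK = hasattr(os, "fork")


def spawn_strategy():
    # Branch on what the platform can do, not on what it is called.
    return "fork" if HAVE_FORK else "subprocess"
```

Nothing in the calling code mentions an operating system by name, so a new platform only requires the capability probes to give the right answers.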

User Interfaces

  1. All command line tools should provide a usage option to explain the usage of the command. Always include at least one example in the usage description. You must at least support the -h option for help and optionally may support --help.
  2. Provide appropriate, clear feedback to the user of the progress of any long running operations and make them easy to cancel and safely rerunnable.
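
A minimal sketch of item 1 using Python's standard argparse module, which adds -h and --help automatically. The tool name logtrim and its options are invented for the example.

```python
import argparse


def build_parser():
    # argparse supplies -h/--help; the epilog carries the usage example.
    parser = argparse.ArgumentParser(
        prog="logtrim",  # hypothetical tool name
        description="Trim a log file to its last N lines.",
        epilog="Example:\n  logtrim --lines 100 /var/log/app.log",
        formatter_class=argparse.RawDescriptionHelpFormatter,
    )
    parser.add_argument("path", help="log file to trim")
    parser.add_argument("--lines", type=int, default=1000,
                        help="number of lines to keep (default: %(default)s)")
    return parser
```

RawDescriptionHelpFormatter is used so that the line break inside the example survives in the help output.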

GUI User Interfaces

  1. Have clear objectives and guiding principles.
  2. Define personas; design to satisfy their goals.
  3. Adopt the user's perspective. Involve users in design. Perform usability tests. Give the user control. Make it configurable. Design iteratively.
  4. GUIs should reflect the user mental model, not the implementation model. Hide implementation details.
  5. Communicate actions to user. Provide feedback. Anticipate errors. Forgive errors. Offer warnings.
  6. Cater for both novice and expert. For novice: easy-to-learn, discoverable, tips, help. For expert: efficiency, flexibility, shortcuts, customizability. Optimize for intermediates.
  7. Keep interfaces simple, natural, consistent, attractive. Try to limit to seven simultaneous concepts.
  8. Use real-world metaphors.
  9. Ask forgiveness, not permission. Make all actions reversible.
  10. Eliminate excise.
  11. Be polite; remember what the user entered last time.
  12. Avoid dialog boxes as much as possible; don't use them to report normalcy.
  13. Provide "wizards" for complex procedural tasks.

Security

  1. Define security requirements as part of product requirements.
  2. Conduct security reviews (creating a threat model) if warranted by requirements.
  3. Know where to look for exploit notices and stay up-to-date.
  4. Use least privilege; only run with superuser privilege when you need to.
  5. When using fixed length buffers, ensure that any possible overflow is handled.
  6. Handle all errors (e.g. don't ignore error returns). Fail securely.
  7. Define secure defaults.
  8. Know how to call external components safely.
  9. Know how to handle insecure environment (e.g. environment variables, umask, inherited file descriptors, symbolic links, temporary files, child processes, ...).
  10. Validate insecure external data (e.g. input to program, parameters to an exported API).
  11. Beware of race conditions.
  12. Use canonical file paths and URLs.
  13. Use a security code review checklist.
  14. Use security code analysis tools.
  15. Minimize your attack surface.
  16. Know how to defend against common known attacks.
  17. Defend in depth.
  18. Don't tell the attacker anything.
  19. Don't mix code and data.
  20. Don't depend on security through obscurity.
  21. Heed compiler warnings.
  22. Architect and design for security policies. Keep it as simple as possible.
  23. Default deny. Base access decisions on permission not exclusion, by default deny access.
  24. Use effective QA: fuzz testing, penetration testing, source code audits.
  25. Adopt a secure coding standard.
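
Several of these items (validate external data, default deny, don't tell the attacker anything, don't mix code and data) fit in one small sketch. It uses Python's standard sqlite3 module; the users table, the user name pattern and both function names are assumptions made for the example.

```python
import re
import sqlite3

# Allowlist: anything that does not match is rejected (default deny).
USERNAME_RE = re.compile(r"[a-z][a-z0-9_]{0,31}")


def validate_username(name):
    # The error message is deliberately generic; it reveals nothing about the rules.
    if not isinstance(name, str) or USERNAME_RE.fullmatch(name) is None:
        raise ValueError("invalid user name")
    return name


def find_user(conn, name):
    # The ? placeholder keeps external data out of the SQL text.
    query = "SELECT id, name FROM users WHERE name = ?"
    return conn.execute(query, (validate_username(name),)).fetchone()
```

Validation and the placeholder are independent layers: even if the pattern were loosened by mistake, the query text still could not be altered by the input, which is the idea behind defending in depth.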

See also: Re: Security techniques every programmer should know (Security References)

Internationalization (i18n) and Localization (L10n)

These domains warrant their own section. However, both these domains can be incredibly challenging. :-) For now, this is just a stub section.

  1. Define the product internationalization and localization goals.

Code Reviews

The two principal types of code review are: formal and lightweight.

Formal code reviews require significant planning, training and resources; they are carried out in multiple phases by multiple participants playing various roles. The most well-known formal code review method is the Fagan inspection.

Lightweight code reviews have less overhead than formal ones. The Best Kept Secrets of Peer Code Review book argues that they are cheaper to perform and can be just as effective. The four most common types of lightweight code review are:

  1. Over-the-shoulder.
  2. Email pass-around.
  3. Pair programming.
  4. Tool-assisted.

Lightweight Code Review Tips

Cited in Best Kept Secrets of Peer Code Review, the conclusions drawn from a lightweight code review case study at Cisco are:

  1. LOC under review should be less than 200 and not exceed 400. Larger LOCs tend to overwhelm reviewers.
  2. Total review time should be less than 60 minutes and not exceed 90. Defect detection rates plummet after 90 minutes.
  3. Inspection rates less than 300 LOC/hour result in best defect detection. Expect to miss a significant percentage of defects if faster than 500 LOC/hour.
  4. Expect defect rates of around 15 per hour.
  5. Authors who prepare for the review with annotations and explanations have far fewer defects than those that do not.
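
The upper limits in items 1 to 3 are simple arithmetic, so a planned review can be sanity-checked in a few lines. This sketch is mine, not part of the Cisco study; review_warnings is a hypothetical helper and only the hard limits (400 LOC, 90 minutes, 500 LOC/hour) are checked.

```python
MAX_LOC = 400            # item 1: do not exceed 400 LOC per review
MAX_MINUTES = 90         # item 2: do not exceed 90 minutes
MAX_LOC_PER_HOUR = 500   # item 3: faster than this misses many defects


def review_warnings(loc, minutes):
    """Return warnings for a planned review of `loc` lines lasting `minutes`."""
    if loc <= 0 or minutes <= 0:
        raise ValueError("loc and minutes must be positive")
    rate = loc * 60 / minutes  # inspection rate in LOC/hour
    warnings = []
    if loc > MAX_LOC:
        warnings.append(f"{loc} LOC exceeds {MAX_LOC}")
    if minutes > MAX_MINUTES:
        warnings.append(f"{minutes} minutes exceeds {MAX_MINUTES}")
    if rate > MAX_LOC_PER_HOUR:
        warnings.append(f"{rate:.0f} LOC/hour exceeds {MAX_LOC_PER_HOUR}")
    return warnings
```

For instance, 180 LOC in 45 minutes is 240 LOC/hour and passes cleanly, whereas 600 LOC in 60 minutes is flagged for both size and rate.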

The Seven Deadly Sins of Software Reviews

According to Karl Wiegers, the Seven Deadly Sins of Software Reviews are:

  1. Participants don't understand the review process.
  2. Reviewers critique the producer, not the product.
  3. Reviews are not planned.
  4. Review meetings drift into problem-solving.
  5. Reviewers are not prepared.
  6. The wrong people participate.
  7. Reviewers focus on style, not substance.

Miscellaneous Code Review Tips

  1. Before the code review, run the code through a tool that checks for straightforward layout and stylistic issues; this avoids wasting time on trivia during the review.
  2. Most of the code review work should be done before the code review meeting.
  3. The code review should be in writing.
  4. Have at least two code reviewers.

Tool-assisted Code Review

Guido van Rossum's first project at Google was Mondrian, a code review tool. Though he was unable to open source that work, he has since released the open source Rietveld code review tool. An exhaustive list of code review tools can be found at Survey of Code Review Tools.

In addition to tools that streamline the administrative side of the code review process, source code analysis tools can also be useful during code reviews. These tools fall into three broad categories:

  1. Code Formatters. e.g. Perl::Tidy.
  2. Static Code Analysers. e.g. Perl::Critic.
  3. Dynamic Code Analysers. e.g. valgrind.

Coding Standards References

Code Review References

References Added Later

More References Added Later

Updated 20 Feb: Added new "Design" and "Portability" sections. Updated 3-mar: Added new "Security" and "Internationalization and Localization" sections; added more General Guidelines; added more references. Updated Nov/Dec 2017: Added "OO Design" and "References Added Later" sections; extended "Comments" section to "Comments & Documentation". Aug 2020: added naming scheme for packages and classes. Added new Dependencies section from Writing Solid CPAN Modules. Mar 2021: Moved "Correctness, simplicity and clarity" to first bullet point (see salva's response below). Mar 2021: added "API Design Checklist" section, adapted from On Interfaces and APIs. June 2021: Comments and documentation section: added input, output, or "inout" when documenting function parameters (see Re^2: flower box comments).