I've been asked to prepare some guidelines on coding standards
and code reviews in general, across a number of languages used at work,
not just Perl.
After doing a bit of basic research, I've cobbled together the notes below.
To gain some feedback, I'm posting an early draft here.
Note that language-specific coding standards are not covered here;
these will be covered in separate coding standards documents, one for each language.
Most of the coding guidelines below were not invented by me, but derived from
hopefully well-respected sources; see the Coding Standards References section
below for details.
These are some general attributes you should strive for in your code.
Remember that code maintainability is paramount.
- Robustness, Efficiency, Maintainability.
- Scalability, Abstraction, Encapsulation.
- Uniformity in the right dimension, creativity in dimensions that matter.
- Correctness, simplicity and clarity come first. Avoid unnecessary cleverness.
- If you must rely on cleverness, encapsulate and comment it.
- DRY (Don't repeat yourself). Duplication exists in data as well as code.
- Prefer to find errors at compile time rather than run time.
- Establish a rational error handling policy and follow it strictly.
- Throw exceptions instead of returning special values or setting flags.
- Know when and how to code for concurrency.
- Use a single-step automated build system.
- Practice releasing regularly.
- Plan to evolve the code over time.
- Invest in code reviews.
- Consider the code from the perspective of: usability, simplicity, declarativeness, expressiveness, regularity, learnability, extensibility, customizability, testability, supportability, portability, efficiency, scalability, maintainability, interoperability, robustness, type safety, thread-safety/reentrancy, exception-safety, security. Update: a few more: correctness, reliability, reusability, productivity/timeliness, documentation/discoverability. Resolve any conflicts between perspectives based on requirements.
- Agree upon a coherent layout style and automate it.
- When in doubt, or when the choice is arbitrary, follow the common standard practice or idiom.
- Don't optimize prematurely. Benchmark before you optimize. Comment why you are optimizing.
- Don't pessimize prematurely.
- Don't reinvent the wheel. If there is a library method that implements the functionality you need, use it.
- Be const correct.
- Adopt a policy of zero tolerance for warnings and errors. This principally means compiling cleanly at high warning levels, but is broader than that; for example, tools such as checked STL implementations, static code analysers (e.g. Perl::Critic, FxCop), dynamic code analysers (e.g. valgrind, Purify), unit tests (e.g. Test::More, NUnit), code that emits spurious warnings/errors in customer log files, and so on. Third party files may be exempt from this policy.
- Use a revision control system.
- Write the test cases before the code. When refactoring old code (with no unit tests), write unit tests before you refactor.
- Add new test cases before you start debugging.
- Always code as if the guy who ends up maintaining your code will be a violent psychopath who knows where you live.
Design and Architecture
- Know Architectural patterns. Choose wisely. Multitier architecture is common.
- Coupling and Cohesion. Systems should be designed as a set of cohesive modules as loosely coupled as is reasonably feasible.
- Testability. Systems should be designed so that components can be easily tested in isolation.
- Information hiding: Minimize the exposure of implementation details; provide stable interfaces to protect the remainder of the program from the details of the implementation (which are likely to change). Don't just provide full access to the data used in the implementation. Minimize the use of global data. Avoid Action at a distance.
- Interfaces matter. Once an interface becomes widely used, changing it becomes practically impossible (just about anything else can be fixed in a later release).
- Design the module's interface first.
- Design interfaces that are: consistent; easy to use correctly; hard to use incorrectly; easy to read, maintain and extend; clearly documented; appropriate to your audience. Be sufficient, not complete; it is easier to add a new feature than to remove a mis-feature.
- Favour Composition Over Inheritance.
- SOLID: Single-responsibility (a class should have only one responsibility), Open-closed principle (open for extension, closed for modification), Liskov substitution principle (derived classes must be substitutable for their derived classes), Interface segregation principle (clients should not be dependent on interfaces they do not use), Dependency inversion principle (program to an interface, not an implementation).
- GRASP: General Responsibility Assignment Software Patterns (or Principles). Guidelines for assigning responsibility to classes and objects in object-oriented design. The different patterns and principles used in GRASP are controller, creator, indirection, information expert, high cohesion, low coupling, polymorphism, protected variations, and pure fabrication.
- Know Software Design Patterns. Choose wisely.
- Dependency Injection. Especially useful to improve testability.
- Define Class invariants.
- Immutability. Prevent inappropriate access by making your objects immutable: provide the data to their constructors, then disallow any modifications of this information thereafter. Note that Java strings are immutable. Immutability can also be useful for thread safety and in functional programming (see Pure function).
- Follow the de facto standard set by the code you are editing. Or change the entire source file to the new standard.
- Remove unused code.
- Assert liberally. Asserts should be used to document the constraints for a piece of code.
- Log liberally. Strive to log enough information to trouble-shoot a customer problem without the need to attach a debugger.
- The result of every file operation or API call or external command must be checked, and unexpected results handled.
- Any unexpected result from a file operation or API call or external command should be logged.
- Avoid magic numbers. Note that 0 and 1 are ok and are not considered magical.
- Limit and explicitly comment case "fall throughs".
- Avoid side effects. e.g. inside macros.
Layout rules will vary between organisations. These sort of arbitrary code
layout rules should be enforced by a tool (e.g. Perl::Tidy).
- Use spaces not TABs.
- Three character indent (four is more common; get agreement and enforce with a tool).
- No long lines. Limit the line length to a maximum of 120 characters.
- No trailing whitespace on any line.
- Put brace on a new line.
- Single space around keywords, e.g. if (.
- Single space around binary operators, e.g. 42 + 69
- Single space after comment marker, e.g. "// fred" not "//fred", "# fred" not "#fred"
- No space around unary operators, e.g. ++i
- No space before parens with functions/macros, e.g. fred( 42, 69 )
- Single space after parens with functions/macros, e.g. fred( 42, 69 )
- Single space after comma with functions/macros, e.g. fred( 42, 69 )
- Layout lists with one item per line; this makes it easier to see changes in version control.
- One declaration per line.
- Function calls with more than two arguments should have the arguments aligned vertically.
- Avoid big-arse functions and methods. Ditto for large classes and large files.
- Avoid deep nesting.
- Always use braces with if statements, while loops, etc. This makes changes shorter and clearer in version control.
- Use descriptive, explanatory, consistent and regular names.
- Favour readability over brevity.
- Avoid identifiers that conflict with keywords.
- Short names for short scopes, longer names for longer scopes.
- Avoid ambiguous words in names (e.g. Don't abbreviate "Number" to "No"; "Num" is a better choice).
- For packages and classes a good naming scheme is: Abstract_noun, Abstract_noun::Adjective, Abstract_noun::Adjective1::Adjective2. For example: Disk; Disk::Audio; Disk::DVD; Disk::DVD::Rewritable.
- Avoid non-const global variables.
- Minimize the scope of variables, pragmas, etc..
- Minimize the visibility of variables.
- Don't overload variables with multiple meanings.
When should you depend on another CPAN module rather than write your own code?
- Don't introduce dependencies lightly. Every module you add as a dependency is a module that can restrict your module -- if one of your module's dependencies is Linux-only, for example,
then your module is now Linux-only; if another requires Perl 5.20+ so do you; if one of your dependencies has a bug, you also have that bug; if a new release of one of your dependencies fails, the likelihood of your release being unable to install increases; take care with dependencies having a different license to yours.
- Don't add developer convenience modules as a dependency. (As noted at Release::Checklist, there are two types of modules:
functional modules, like DBI, and developer convenience modules, like Modern::Perl).
- It's usually best to use popular, quality CPAN modules in complex domains (e.g. DBI and XML) rather than roll your own. Doing so allows you to leverage the work of experts in fields that you are probably not expert in. Moreover, widely used CPAN modules tend to be robust and have fewer bugs than code you write yourself because they are tested by more users and in many different environments.
- For small and simple modules, on the other hand, such as slurping a file, you may prefer to roll your own code rather than pay the dependency cost of an external module.
- Cost vs Risk. Though using CPAN modules seems "free", there are hidden snags. What if your dependent module has a security vulnerability? What if the author abandons it? How quickly can you isolate/troubleshoot a bug in its code?
- Quality and Trust. Before introducing a dependency, it's worth checking CPAN ratings, Kwalitee score, bug counts, how quickly are bugs fixed etc. Does it contain gratuitous/unnecessary dependencies? (the ::Tiny CPAN modules were a reaction against modules that seemed to haul in half of CPAN as dependencies).
- Popularity. When you use a 3rd party module, you want it to be popular and widely supported; you want to be able to ask for advice on using it; you don't want it to die. Moreover, if your module depends on a very popular CPAN module, there's a good chance your module's users will already have it installed.
Comments and Documentation
- Prefer to make the code obvious.
- Don't belabour the obvious.
- Generally, comments should describe what and why, not how.
- Remove commented-out code, unless it helps to understand the code, then clearly mark it as such.
- Update comments when you change code.
- Include a comment block on every non-trivial method describing its purpose.
- Major components should have a larger comment block describing their purpose, rationale, etc.
- There should be a comment on any code that is likely to look wrong or confusing to the next person reading the code.
- Every non-local named entity (function, variable, class, macro, struct, ...) should be commented.
- Separate user versus maintainer documentation.
- CPAN Module Documentation. Tutorial and Reference; Examples and Cookbook; Maintainer; How your module is different to similar ones; Change log; Notes re portability, configuration & environment, performance, dependencies, bugs, limits, caveats, diagnostics, bug reporting.
- Assume file names are case insensitive in that you cannot, in general, have two different files called fred and Fred.
- Source code file names should be all lower case.
- File names should only contain A-Z, a-z, 0-9, ".", "_", "-".
- Strive to structure code around "capabilities" rather than specific platforms. For example, "if HAVE_SHADOW_PASSWORDS" rather than "if SOLARIS_8". And define the capabilities in one place only (e.g. config.h).
- Organise the code so that machine dependent and machine independent code reside in separate files.
- Abstract hardware and external interfaces in a module and have all other code call that module and not call the hardware/external interface directly.
- As far as possible, write code to a widely supported portable standard (e.g. ANSI C, POSIX, ...) and only use machine specific facilities when absolutely necessary.
- Recognize and avoid non-portable constructs. For example, strive to avoid relying on ASCII character set, big v little-endian, 32-bit v 64-bit ints, pointer to int conversion, sign extension, and so on.
- All command line tools should provide a usage option to explain the usage of the command. Always include at least one example in the usage description. You must at least support the -h option for help and optionally may support --help.
- Provide appropriate, clear feedback to the user of the progress of any long running operations and make them easy to cancel and safely rerunnable.
GUI User Interfaces
- Have clear objectives and guiding principles.
- Define personas; design to satisfy their goals.
- Adopt the user's perspective. Involve users in design. Perform usability tests. Give the user control. Make it configurable. Design iteratively.
- GUIs should reflect the user mental model, not the implementation model. Hide implementation details.
- Communicate actions to user. Provide feedback. Anticipate errors. Forgive errors. Offer warnings.
- Cater for both novice and expert. For novice: easy-to-learn, discoverable, tips, help. For expert: efficiency, flexibility, shortcuts, customizability. Optimize for intermediates.
- Keep interfaces simple, natural, consistent, attractive. Try to limit to seven simultaneous concepts.
- Use real-world metaphors.
- Ask forgiveness, not permission. Make all actions reversible.
- Eliminate excise.
- Be polite; remember what the user entered last time.
- Avoid dialog boxes as much as possible; don't use them to report normalcy.
- Provide "wizards" for complex procedural tasks.
- Define security requirements as part of product requirements.
- Conduct security reviews (creating a threat model) if warranted by requirements.
- Know where to look for exploit notices and stay up-to-date.
- Use least privilege; only run with superuser privilege when you need to.
- When using fixed length buffers, ensure that any possible overflow is handled.
- Handle all errors (e.g. don't ignore error returns). Fail securely.
- Define secure defaults.
- Know how to call external components safely.
- Know how to handle insecure environment (e.g. environment variables, umask, inherited file descriptors, symbolic links, temporary files, child processes, ...).
- Validate insecure external data (e.g. input to program, parameters to an exported API).
- Beware of race conditions.
- Avoid canonical file paths and URLs.
- Use a security code review checklist.
- Use security code analysis tools.
- Minimize your attack surface.
- Know how to defend against common known attacks.
- Defend in depth.
- Don't tell the attacker anything.
- Don't mix code and data.
- Don't depend on security through obscurity.
- Heed compiler warnings.
- Architect and design for security policies. Keep it as simple as possible.
- Default deny. Base access decisions on permission not exclusion, by default deny access.
- Use effective QA: fuzz testing, penetration testing, source code audits.
- Adopt a secure coding standard.
Internationalization (i18n) and Localization (L10n)
These domains warrant their own section. However, both these domains can be
incredibly challenging. :-) For now, this is just a stub section.
- Define the product internationalization and localization goals.
Some Famous Programming Quotes
The two principal types of code review are: formal and lightweight.
Formal code reviews require significant planning, training and resources; they are carried out in multiple phases by multiple participants playing various roles. The most well-known formal code review method is the Fagan inspection.
Lightweight code reviews, having less overhead than formal ones, are cheaper to conduct.
The Best Kept Secrets of Peer Code Review book argues that lightweight code reviews are cheaper to perform than formal ones and can be just as effective.
The four most common types of lightweight code review are:
- Email pass-around.
- Pair programming.
Lightweight Code Review Tips
Cited in Best Kept Secrets of Peer Code Review, the conclusions drawn from a lightweight code review case study at Cisco are:
- LOC under review should be less than 200 and not exceed 400. Larger LOCs tend to overwhelm reviewers.
- Total review time should be less than 60 minutes and not exceed 90. Defect detection rates plummet after 90 minutes.
- Inspection rates less than 300 LOC/hour result in best defect detection. Expect to miss a significant percentage of defects if faster than 500 LOC/hour.
- Expect defect rates of around 15 per hour.
- Authors who prepare for the review with annotations and explanations have far fewer defects than those that do not.
The Seven Deadly Sins of Software Reviews
According to Karl Wiegers, the Seven Deadly Sins of Software Reviews are:
- Participants don't understand the review process.
- Reviewers critique the producer, not the product.
- Reviews are not planned.
- Review meetings drift into problem-solving.
- Reviewers are not prepared.
- The wrong people participate.
- Reviewers focus on style, not substance.
Miscellaneous Code Review Tips
- Before the code review, push the code review through a tool that checks for straightforward layout and stylistic issues; this avoids wasting time on trivia during the review.
- Most of the code review work should be done before the code review meeting.
- The code review should be in writing.
- Have at least two code reviewers.
Tool-assisted Code Review
Guido van Rossum's first project at Google was Mondrian, a code review tool.
Though he was unable to open source that work, he has since released the open source
Rietveld code review tool.
An exhaustive list of code review tools can be found at Survey of Code Review Tools.
In addition to tools that streamline the administrative side of the code review process, source code analysis tools can further be useful during code reviews. These tools fall into three broad categories:
- Code Formatters. e.g. Perl::Tidy.
- Static Code Analysers. e.g. Perl::Critic.
- Dynamic Code Analysers. e.g. valgrind.
Coding Standards References
Code Review References
References Added Later
Agile Programming Languages
So after a month of research into "agile languages", I must conclude there is no such thing. Barbarian coders are constantly looking to increase their speed to build the next new thing. Java developers are frustrated with their language's bureaucracy. Students entering the workforce are disappointed when a programming career is less fun than they imagined. None of these things relate to agile development. I'm not convinced a language could be built that would be a significant advantage to agile development. Software development methods are mainly concerned with people, while programming is about crafting code. They are really two separate problem domains. Anyone claiming a language is agile is trying to deceive you with hype.
-- What is an Agile Language? by Ken O. Burtch
The agile manifesto defines agile development:
- Individuals and interactions over processes and tools
- Working software over comprehensive documentation
- Customer collaboration over contract negotiation
- Responding to change over following a plan
To be anti-agile therefore, a programming language needs to:
- Inhibit interaction between individuals and demand business process
- Prevent working software while forcing the writing of unhelpful docs
- Block collaboration with customers and refuse to function without a legal contract
- Make code changes impossible without compliance to a project manager's plan
Since it's hard to imagine such a language, anyone claiming their favourite language is "agile" is probably trying to deceive you with hype.
More References Added Later
- Is it correct? by GrandFather (Does it work correctly? Do you understand it? Would anyone else understand it? Will you understand it in a month's time? Does it strike a good balance between terseness and verbosity? Could you make changes to it without it being likely to break in unforeseen ways? Is it fast enough?)
Updated 20 Feb: Added new "Design" and "Portability" sections. Updated 3-mar: Added new "Security" and "Internationalization and Localization" sections; added more General Guidelines; added more references. Updated Nov/Dec 2017: Added "OO Design" and "References Added Later" sections; extended "Comments" section to "Comments & Documentation". Aug 2020: added naming scheme for packages and classes. Added new Dependencies section from Writing Solid CPAN Modules. Jan 2021: Added Agile Programming Languages section.