Primarily, it is a skill that must be learnt, practiced, and maintained. And that requires becoming fully conversant, expert if you will, with the nuances and idioms of one's field, language and tools.
In the thread at Re^2: Why programming is so much more than writing code, the suggestion is made that source code comments are some magic bullet for code readability and maintainability, and that programmers who do not extensively comment their code are lazy, or bad, or otherwise detrimental to the world of programmer-kind. This is a refutation of that suggestion.
Source code comments are a double-edged sword. Once a piece of code has been commented, you not only have to maintain the code, you have to maintain the comments also. But that is not the end of it. You have to also ensure that the code and comments stay in synchronisation. That each truly reflects the other.
In a perfect world of no schedule or budget pressures, that is already a difficult thing to do. Whilst code is (mostly) unambiguous in its interpretation, natural language prose, regardless of language used, isn't. Writing natural language descriptions of algorithms and logic and reasons and strategic goals and tactical expediencies is hard. Very hard. And keeping those natural language descriptions in sync with the evolutions of the code, not to mention the ever-changing wider project and other political realities, is much, much harder.
One of the high priority goals in programming is the removal/avoidance of codependencies. We avoid using parallel data structures (e.g. parallel arrays), because it becomes a nightmare to maintain those parallel arrays in synchronisation as algorithms and projects evolve. One of the key attributes of well-designed classes (and other abstractions) is that they are as fully independent of their peers as possible. As decoupled as is possible to achieve.
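To make the parallel-arrays point concrete, here is a minimal sketch in Python (chosen purely for illustration; the names and data are invented). Sorting one of two parallel lists silently breaks the pairing, whereas a single list of records has nothing to keep in step.

```python
# Hypothetical data, invented for illustration only.
names = ["carol", "alice", "bob"]
ages = [41, 29, 35]  # ages[i] is meant to belong to names[i]

# Sorting one list without the other breaks the pairing, and nothing
# complains about it.
names_sorted = sorted(names)
mismatched = list(zip(names_sorted, ages))

# One structure: each record carries its own fields, so sorting cannot
# separate a name from its age.
people = [{"name": n, "age": a} for n, a in zip(names, ages)]
people_sorted = sorted(people, key=lambda p: p["name"])
```

In the first form every operation on one list has to be mirrored on the other by hand; in the second there is only one thing to maintain.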
By relying upon both code and comments to describe our algorithms, we are purposely coupling those two and creating a dependence upon that coupling; with all the caveats and consequences that entails.
Prior art and learning from (others') experience.
Mathematicians, physicists, chemists and engineers go out of their way to avoid textual descriptions of their theorems and designs. Their nomenclatures have evolved over hundreds of years through hard-won, practical experience. Much greater time periods than the computer industry, and computer code, have existed. In every case, they have evolved not because someone decided that it would be a 'good idea'; nor as a form of protectionism à la the use of Latin in religious ceremonies, legal work and matters of state; nor because they are too lazy to write their technical descriptions out in full in 'proper English' (French, German, Italian etc.).
Natural language is simply too costly to construct; too ambiguous to read; too time consuming to maintain. It is also too restrictive in its audience. A fully and precisely formulated natural language description of anything in any natural language, even of simple things like the way the moon looks, or how to make tea or coffee, is entirely impenetrable to all but a few, gifted native exponents of that particular language. If you doubt this, try reading poetry in a language other than your own, and having a discussion about the author's meaning with a native speaker of that language.
That means most people, including all those gifted exponents of the subject whose native language is not that of the author, have no possibility of understanding the full nuances of the author's prose. The nomenclatures that these scientists have evolved, by necessity and through experience over hundreds of years, are explicitly designed to avoid the minefields of 'rules for proper construction'; the validity and effectiveness of 'artistic licence'; and the ambiguities of interpretation that inevitably arise from the use of natural language.
These nomenclatures evolved through practical necessity and hard-won experience. Whether mathematical notations, chemical symbols and formulae, or engineering drawings, their purpose is to be transparent in their interpretation regardless of the reader's place of birth or natural language skills. They are designed to allow an engineer in China to produce a widget from a drawing produced in the USA. Or a mathematician in Russia to continue a meaningful and detailed dialogue with a colleague in France or Australia or New Guinea. The nomenclatures are unambiguous and precise, but they do have a learning curve. But that learning curve is less than that for the natural language of a single other correspondent. And much less than that of all the other correspondents one might come into contact with.
And scientists are not the only ones to have seen the advantages of using a non-natural language for communications. Choreographers, geographers and a host of others have all evolved notations to describe their work/art. And all for the same reasons. Natural language, as powerful as it is, is unsuited to describing anything precisely. Try to get any two language professors to agree on their interpretation of the works of Shelley, Byron or Shakespeare to see what I mean. Or perhaps a more familiar example will make the point: musicians.
Annotations have their place
That's not to say that there isn't some room for natural language comments in code. There are times and occasions when one or two words can clarify an idea more easily and quickly than anything else. But, just as the composer will sometimes annotate a score with a word or two (Allegro, Allegretto, non troppo, Maestoso, poco a poco, Lebhaft etc.), so comments in code should be ancillary notations, used sparingly where they clarify things that the notation (be it Perl, C or any other) fails to convey.
Just as mathematics, music or engineering notations require the proponent to become fully conversant with them to be able to read or write them, so it is with computer languages. It is possible to 'make music' whilst only understanding the white notes, but you are never going to be able to write, or play, 'good music' without understanding sharps and flats (and all that other music stuff I do not understand:). So it is with coding. You can start to write programs understanding only the basic constructs of a programming language--loops, conditional statements, subroutines and classes--but in order to be able to write code that goes any way beyond the merely functional you need to understand more.
To write maintainable, effective, efficient (in terms of both construction and operation), reusable code, it is necessary to become familiar with the nuances and idioms of the language you choose to use. And to be competent to maintain the code of others, this is doubly true. To complain that the previous author did not document his code well enough, when you haven't expended sufficient time and effort in understanding the nuances and idioms of the language it is written in, is a cop-out of the highest order.
Programming, especially maintenance programming, is hard
Programming is hard, perhaps one of the hardest skills a human being can learn to do. Doing it well requires dedication and effort. To require that the programmers of any code you wish to maintain describe their algorithms not once but twice, the second time in a wordy, ambiguous, imprecise and unsuited natural language that you happen to be familiar with, is like asking Beethoven to explain in words, on the score, why he chose to make the next note sharp instead of natural, or Vincent van Gogh to explain, in words on the canvas, why the perspective of the vase in his Sunflowers painting is all wrong.
Comments are not, and should not be, documentation
Documentation of code is vitally important for successful, ongoing projects, but comments are not documentation, and documentation has no place in source files. Comments, when applicable, are annotations. Their purpose is to clarify the code, not explain it. They are ancillary hints as to the function or reasoning behind a particular construct to which they are attached. They are not substitutes for understanding of the problem to be solved or knowledge of the language being used.
Programmers should not expect to be able to open a source file at random, move to some arbitrary point within that file and then instantly be able to modify the code there from just a cursory glance at the code and comments in the locality. Coding is hard and programs are complex. And maintaining programs and code written by others is even harder. Code maintenance is often seen as a secondary activity, somehow lesser than 'greenfield' coding. It is not!
In most cases of consequence, code spends much more of its life in maintenance than it does in development, and it is about time that corporations and individuals recognised that fact. It is also the case that if a green field coder encounters a problem or bug in their latest code, they have the option to discard it and start again. That is rarely if ever the case for the maintenance programmer. The green field site architect can choose a vision and design his building in concert with that vision (and his client's aspirations), but the brown field architect is much more constrained. He not only has to design a building that meets his client's requirements and fits the shape of the location, he also has to consider a whole raft of other constraints. Does the appearance of the building fit with those around it? What effect will his building have upon the traffic flows, wind flows, views, light and aesthetics of the existing buildings around it? So it is for the maintenance programmer. Not only must his code function for the purpose at hand, it must also be 'in keeping' with the code around it, with the code to which it will interface, and with the code that will interface with it. And he will usually be under severe time and budgetary pressures. Not to mention the usually lower grade of support and expectations that result from the twilight status often afforded the maintenance programmer's role.
So, the maintenance programmer's job is hard, often much harder than the green field programmer's role, but the answer is not in comments. Extensive or verbose commenting will not solve this problem. It serves only to obscure it.
The cons of extensive commenting
- They add time and cost to development.
Writing good comments is a skill that fails to come naturally to most programmers, and seems to be near impossible to learn or teach.
- They add time and cost to the maintenance role.
Maintaining comments in parallel to code is more than a duplication of effort. Synchronisation is hard, and in every other area of coding this is recognised and avoided wherever possible.
- They are ambiguous.
No two people--not even congenital twins, nor 'equal' experts in a given field--will ever derive exactly the same meaning and information from a natural language description of anything complex.
- They clutter source files.
Imagine trying to work on the internals of your car engine, if the pages of the maintenance manual were inscribed on, or interleaved between the components.
Another analogy is trying to navigate your way from one side of London (or any big city), to the other, using one of those little A5-sized A-Z maps. Your route is so obscured in the local detail that you get no overview of it.
Imagine trying to understand, much less enjoy, a Shakespeare play, if every sentence or phrase was interspersed with an interpretive narration by some third party.
- They are often, perhaps even usually, misleading.
It's a fact of life that, good intentions and edicts notwithstanding, comments are rarely, if ever, maintained with the same level of effort and diligence that the code receives.
Perhaps more importantly, comments are rarely reviewed, and by definition, never tested.
- People rarely, if ever, confine their comments to those matters that are unique to this piece of code in this file in this project.
Perhaps the author of the code takes the time to describe how items to be compared are passed to the comparison block in the magical global variables $a & $b. That's fine, but you already know that, so it isn't helpful. However, he fails to mention the relative precedence of or versus ||, and the difference in the function of those two, as compared to |. But it is this latter factor of Perl that is missing from your personal knowledge of that language and it is this bit that causes you to incorrectly modify the routine.
Extensive comments on the ambiguities, inconsistencies and quirks of a particular language or a particular implementation have no place in any given source file. They should be (and in Perl's case, invariably are) described once, definitively, in the language or implementation documentation. Re-describing, or more usually, badly paraphrasing, that documentation in every source file that uses that particular facility or quirk is redundant and dangerous.
If the implementation changes and the ambiguity or inconsistency is resolved, who is going to go back over all those source files and fix them?
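The earlier point in this list, that comments are rarely reviewed and never tested, can be sketched as follows. This is an invented example in Python, not taken from any real project: the function `retry_delay` and its behaviour are hypothetical. The comment was true once; the code moved on and nothing noticed.

```python
def retry_delay(attempt):
    # Wait 2 seconds between attempts.
    # (Stale: the body was later changed to back off exponentially,
    # and nothing forced this comment to change with it.)
    return 2 ** attempt


def check_retry_delay():
    # Unlike the comment, these checks are executed, so they fail
    # loudly if the behaviour they describe stops being true.
    assert retry_delay(0) == 1
    assert retry_delay(3) == 8
```

The stale comment goes on reading plausibly for as long as it stays in the file. Only the executable statements are ever held to account.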
Comments serve a different purpose to documentation. And documentation does not belong in source code files.
Comments should be used sparingly; be terse in construction; and wholly confined to describing ambiguous elements of the code to which they are (closely) attached and which cannot be clarified further using the features of the language itself.
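As a rough illustration of that rule, here is a small sketch in Python. Everything in it is assumed for the example: the function, the names, and the convention that timestamps are plain seconds. The names carry the meaning, and the one comment is confined to the single decision that the notation cannot state for itself.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def is_expired(created_at, now, max_age_days):
    age_seconds = now - created_at
    # '>' not '>=': an item exactly at the limit is still valid.
    return age_seconds > max_age_days * SECONDS_PER_DAY
```

A comment restating what `age_seconds` holds, or how multiplication works, would add nothing that the code does not already say.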
So, do not expend your efforts on using comments to explain your code. Put the effort into making the code as clear as possible. But, and as buts go, this is a big one ...
No programmer can be expected to make up for deficiencies in the knowledge (of the language; or algorithm; or project; or any environmental factors), of those that follow him or her, by predicting what they may not know, and fixing it by paraphrasing the documentation, in source code comments.
It is impossible. And to expect that of those that precede you, says something about you, not them.