Beefy Boxes and Bandwidth Generously Provided by pair Networks
good chemistry is complicated,
and a little bit messy -LW
 
PerlMonks  

Code review on script site

by Jazz (Curate)
on Nov 24, 2001 at 04:46 UTC ( [id://127201]=perlmeditation: print w/replies, xml ) Need Help??

Acting on the advice from this node, we have begun the planning process for basic code reviews on scripts listed at the Perl Archive. This code review will aid in categorizing scripts based on its security and basic programming practices. I've used this node as a reference when creating the basic points for code review.

Unless otherwise noted, each script will receive 1 point for compliance, -1 point for non-compliance on each of the following:

  1. Uses warnings
  2. Uses strict
  3. Security (up to 4 points).
    • Uses -T
    • Implements valid checks on all user input for potential security breaches or other damage
    • Does not appear to allow arbitrary commands
    • Using $CGI::POST_MAX or otherwise limiting maximum post size (thanks crazyinsomniac)
  4. Html output - uses CGI, HTML::Template, HTML::Mason, or other suitable alternative
  5. Form parsing - uses CGI, CGI::Lite, or other suitable module-derived alternative
  6. Uses modules where applicable (-1 for using cgi-lib.pl)
  7. Style, based on clarity and modularity (up to 2 points)
  8. Documentation / comments (-2 for no comments; +2 for effective use of commenting)
  9. Use HERE docs for lengthy text (-2 points for multiple print statements and "\"escape syndrome\"")
  10. Checks return value of specific functions (aside from open, close, flock, can you suggest others to be added to this function list?)
  11. Preserve file integrity by correctly using flock when necessary.
  12. Anything else?

Since there will probably be instances when one or more of the points above will not be applicable to a script, a 0 point value will be used. This will equal N/A and will not affect the total score.

I realize that this does not come anywhere close to a comprehensive code review, but the only way we can realisticly implement any sort of code review at all is if we keep it simple (after all, there are ~4k scripts to review). Even this paltry review process can guilt/embarass even a few programmers into revamping their scripts, it will be worth the effort.

It's only fair that the program authors should bear the cost for more intensive code reviews on their own programs. If a program's author wishes to have an in-depth, individualized code review, we will refer them to various programmers who have expressed interest in performing this service (some for a fee, some as volunteers). The reviewer will then let me know the point score of the script. Perhaps there may be some rekindled interest in a code review section here?

Once a script has been reviewed, it will have a "detail" page on the site with the results of the review.

Any suggestions, enhancements, or critiques you can offer on this list would be very helpful :)

Jasmine

Update: Tainting/security point updated based on two replies from wog (1, 2) and a /msg from crazyinsomniac.

Update: Changed "excessive commenting" to "effective use of commenting", based on rchiav's suggestion.

Update: By monk magic, I presume, this node has been relocated to Meditations, where I'm able to edit the root node (thanks!). So, the updated list is back here.

Replies are listed 'Best First'.
(Ovid) Re: Code review on script site
by Ovid (Cardinal) on Nov 24, 2001 at 05:49 UTC

    This sounds great. A few minor points, though. (I hope they don't sound too nitpicky)

    • Uses tainting when working with server files (-3 for opening file based on tainted form input)

      Actually, I would check this based upon the needs of a script. For example, many people use form data to build SQL. It's trivial to munge form data to wipe out a poorly-designed database, so that would also merit a -3. However, if they're just taking data and spitting back to a Web page, that might not be so bad (assuming that it's a one time page and not something that would open up cross-site scripting holes).

    • Uses CGI.pm for html

      Ignoring the issue of templates, I can see some people making a case for HERE documents. I don't like 'em, but would you going to take points off of some of KM's scripts from his book that use HERE docs? :) I'd take points off if they use multiple prints instead of a HERE doc.

    • Uses CGI.pm for form parsing

      What about CGI::Lite? If the author has a reasonable alternative, I wouldn't ding them for not using CGI.pm. Of course, I'd probably take a buzz-saw to their code if they hand-roll it since these are invariably broken.

    Here's a personal pet peeve: failure to check return value of functions. Not all functions, mind you. When was the last time you saw someone check the return value of print? However, forgetting to check the return value of an open or a flock could be disastrous.

    I would also be concerned about how someone opens files. If they don't flock when they should, or if they don't flock correctly and risk a race condition, that would be a concern.

    I'll post an update if I think of anything else off of the top of my head.

    Cheers,
    Ovid

    Join the Perlmonks Setiathome Group or just click on the the link and check out our stats.

      Ovid, not nitpicky at all. Excellent clarifications. I would think that the HERE docs and "\"escape syndrome\"" should be a separate point? Do you agree?

      Thanks :)
      Jasmine

      Update: Updated list is now being maintained in the root node.

        If they are scripts, I would +1 if the comments include examples of standard ways one would expect to call the script (command lines and sample output). This is really helpful in picking up a script.

        I can put my money where my mouth is. I have uploaded a couple of script to the Code Catacombs which do just that. I won't link to them directly, but search for pinger and nugid for a couple of examples.

        <update date="2001-11-26"> Here's a thing that is a definite no-no worth a -1: "gratuitous use of & when calling a subroutine" (e.g. &func($x,$y)) except if @_ is to be passed to the called routine... which is hardly ever the case. I find this to be pretty cargo-cultish, not to mention noisy.</update>

        --
        g r i n d e r
Re: Code review on script site
by chromatic (Archbishop) on Nov 24, 2001 at 11:08 UTC
    I've been thinking about this as well, and still have more to do. I'll probably update this, but here's one thought for now.

    "Using modules where appropriate" is one thing, but it can be taken too far. A lot of these programs are designed to be installed on rather broken Perl installations. That is, good luck trying to get Date::Manip installed. Or Perl 5.5.3. When they finally get 5.8, they'll realize that our fearless Borg pumpking has assimilated many useful things, but that could be another several years.

    I'd dock points for reinventing wheels provided by the language or in the core library of reasonably recent versions (5.4 is reasonable). qx(date) is one thing, but I've seen programs that write files and then shell out to system grep several times.

    Color me sympathetic to the people stuck on systems adminstered by the slightly-less-than legitimate operator from heck (incompetent, not evil), but I have very little mercy for people who think they write code worth sharing without even having skimmed perlfunc.

Re: Code review on script site
by wog (Curate) on Nov 24, 2001 at 06:21 UTC
    I'd reccommend adding:

    1. Security flaws allowing arbitrary code execution, arbitrary file access, or arbitrary sending of many, many e-mails in a short period of time with no way of tracing origin: either removal from listing until problems are fixed, or marked with "Serious Security Flaw" on the page where it's listed (not just the "detail" page).
    2. Is portable. Exceptions given where script's function is inherently platform-specific. ( -1 if not portable to both Windows and UNIX, +1 for use of File::Spec instead of hard-coding "/", /\A.{1,2}\z/, etc. )
      wog, thanks for the suggestions. A couple of questions/comments, if I may?

      Security flaws allowing arbitrary code execution, arbitrary file access, or arbitrary sending of many, many e-mails in a short period of time with no way of tracing origin: either removal from listing until problems are fixed, or marked with "Serious Security Flaw" on the page where it's listed (not just the "detail" page).

      This seems related to the tainting point. Should that point be reworded to something like: "Security. Implements valid checks on all user input for potential security breaches or other damage; prohibits arbitrary commands."?

      Is portable. Exceptions given where script's function is inherently platform-specific. ( -1 if not portable to both Windows and UNIX, +1 for use of File::Spec instead of hard-coding "/", /\A.{1,2}\z/, etc. )

      Hmmm. Submitted scripts (usually) list the platforms that the script has been tested with/developed for. Does anyone second the motion of point loss if it's not cross-platform?

      Jasmine

        Security. Implements valid checks on all user input for potential security breaches or other damage; prohibits arbitrary commands.

        I would, minimally, mention something to the effect of not being an easy gateway for spammers. Also, I would advise against saying "prohibits"; try something which expresses that your reviewers cannot find all possible holes in the script, like "does not appear to allow".

Re: Code review on script site
by rchiav (Deacon) on Nov 24, 2001 at 09:08 UTC
    Great idea. Couple comments though:

    You mentioned +2 for excessive comments. I'd personally -- for excessive comments. Excessive means that it's more than is needed, which to me means something like:

    #loops through all the lines in a #file and prints them out foreach $line (@file) { print $line; }
    I think the loop is self commenting enough (not that you'd ever use that). It bothers me when people over comment. You spend more time reading about what each line does than you do reading the program. Maybe something subjective like "effective use of commenting"?

    Also, I think beyond the cut and dry "points", there should be a human critique on the style, flow, and as always, what could be better.

    Just my nickle.
    Rich

      rchiav, thank you for the comment revision -- the outline has been updated.

      Also, I think beyond the cut and dry "points", there should be a human critique on the style, flow, and as always, what could be better.

      I definitely agree there, but believe that such a review should be the responsibility of the author to request or hire someone to do.

      The author must take the incentive to build on the basic review we're going to do at no cost for them (but at a cost to us). They'll need to care enough about their own code, reputation, and skill to pursue (and learn from) a thorough code review.

      As much as we may like to, we can't do it all for them :)

Re: Code review on script site
by demerphq (Chancellor) on Nov 24, 2001 at 20:59 UTC
    Regarding the style issue, I think you need to be careful. As has been pointed out in the subthread starting at Re: Re: Code review style is matter of opinion, often passionate, but at the same time has important side effects. Nevertheless the existance of a powerful formating tool such as perltidy makes some of these points redundant.

    Personally I would would come up with a standard configuration for perltidy and then run all scripts through it prior to evaluation, and also publishing. This ensures that all scripts are of a uniform indentation and formatting style. Which leaves stylistic evaluation the more import issues such as as the use of obfu'd constructs, map in void context, stringification habits, commenting habbits, variable naming but exclude the more trivial (on the low level of correctness that is) and personal issues such as brace positioning and indentation size.

    Yves / DeMerphq
    --
    Have you registered your Name Space?

Re (tilly) 1: Code review on script site
by tilly (Archbishop) on Nov 26, 2001 at 08:47 UTC
    I dislike this list.

    It misses too many important things. And takes too many picky positions on irrelevant details. For instance take the HERE doc suggestion. Personally I don't use HERE docs because I find that synchronizing them against my own indentation tends to be too fragile. Instead I use qq(). Am I therefore going to be docked?

    Other things are inappropriate for scripts. Using templating modules is great for websites, but for a script it adds a big dependency, forces you to break out and manage several files, and gives little useful return. As I like to say, every good thing you can do has a corresponding cost, and it is the job of good programmers to decide the tradeoffs. I remain to be convinced that adding that dependency is typically a good choice for a stand-alone script. And I am likewise unconvinced that adding the indirection of CGI HTML generation is good for code that you hope someone will learn from.

    And of course there is the commenting question. Well good programmers disagree strongly on whether or not to comment, let alone how to comment appropriately. That is a mine-field I would recommend leaving well alone.

    So what would a better list be? Well if I was trying to judge a stand-alone CGI script, my list of things to look at would go something like this:

    1. Let's start with the externals. Does it have documentation?
    2. Does the documentation accurately say what it does?
    3. Does it say how to install it? (Including what all dependencies are.)
    4. Does it say how to customize it? (Effective comments in code are sufficient customization instructions.)
    5. Does it work?
    6. Now we move to code quality. Does the code have a consistent and somewhat reasonable indentation style? (People may think this is minor, just run perltidy. I think that anyone who doesn't know to do the obvious basics is unlikely to know how to do anything else right.)
    7. Is their code broken up into managable chunks? A good rule of thumb is that any function or straight-line piece should be 50 lines or less unless there is an obvious reason for the length.
    8. Are the functions and variable clearly named? (Names like do_stuff fail this test.)
    9. Do the functions look like they do what the name says?
    10. Does the code use strict.pm?
    11. Are variables tightly scoped where possible?
    12. Is it made clear up front what all of the globals are?
    13. Is the list of globals used kept to a reasonable length?
    14. Is the return of important system calls (particularly open) consistently checked?
    15. Is -T used?
    16. Are the taint checks reasonably restrictive?
    Note that I have left out file locking. Why? Because auditing them takes a lot of work, and most people who try to use file locking (including most potential auditors) consistently get it wrong. Before even considering looking at issues which are that difficult to analyze, I would use a checklist like the above to fail out 99% of the scripts. If you have never heard of modularizing your code or if you use globals exclusively, you have bigger problems than a few race conditions in a script that is likely to be under light load...
Re: Code review on script site
by FoxtrotUniform (Prior) on Nov 25, 2001 at 04:18 UTC

    I'm a bit leery of the "marking guide" checklist approach you've suggested: it strikes me as unnecessarily inflexible. The commenting thing is a good example: it started off as a reasonable guideline ("-1 for too few comments, +1 for many comments, etc"), but it's certainly plausible that "too few" comments might be just right (if the code is elegantly written and short enough to grasp easily, for instance). So the "must comment" rule got modified to something a bit more vague: "effective use of comments". The same thing happened with modules: "use CGI.pm" went to "use applicable core modules appropriately".

    Giving points based on a definite scheme is sometimes appropriate for assignments in a course, where you have to quantify how well students do in order to give them a final mark (and credits), but here I don't really think it's a good fit. Instead, you might use a checklist like this as a guideline for code reviews, but give a more subjective rating (as well as a small list of pros and cons).

    --
    :wq
Re: Code review on script site
by Zecho (Hermit) on Nov 25, 2001 at 20:14 UTC
    I don't know how easy this would be to implement, but why not write a script that takes the users code as input, checks it for use strict;, -wT etc. if there is a call to open is it followed by or die, if there is an flock does the code use Fcntl ':flock';. Your script could be the first stop for a review, and if the users script meets the minimum qualifications they then have the opportunity to submit it for further review. This would save you and your users a lot of time, and would be an excellent learning tool for beginners.

    Update::Second_Thought

    If the users code fails a check, your script could point the user to the appropriate section in the PerlDocs.

Re: Code review on script site
by Jazz (Curate) on Nov 27, 2001 at 01:34 UTC

    I've thought long and hard about this over the weekend, and agree with nearly all of the suggestions noted. I was also thinking of the flip side of the issue. (People who know me know that I always look on all sides before undertaking as large a project as this; some may call it procrastination, but I digress).

    To date, no other perl archive-type site is planning a code review process. This means that everyone who comes to the site may think that the code listed on the site is crap and go to another site (same crap, but no claims one way or the other).

    Not only may the viewership decrease, but methinks that an archive where 80% or more of the scripts are dubbed as "unworthy for use" may reflect poorly on the language it was written in. Perhaps even extending to the books or notables that recommend the site? Not to actual programmers mind you, but to the webmasters or beginners who are just looking for a little script that does x? Or to potential or newer programmers who will find the learning curve for "more acceptable programming skills" too steep? Whose first lessons in Perl was that "Baby Talk is okay"? Note that the current target audience of the guide is webmasters and newer programmers, not seasoned Perl programmers.

    Ahhh, the joys of breaking new ground :)

    Perhaps the solution would be to formulate guidelines that would make the review process a bit gentler on all parties, including the reviewers (/me braces herself for the reaction on that one). More in-depth code reviews can be solicited by each script's author for a fee (to be determined by an available reviewer or by consensus) or on an online community venue such as this one.

    If I were a webmaster looking for a script, my primary concerns would be 1) cost (not a reviewable item), 2) will it work? 3) is it safe? 4) can I easily make it look the way I want it to? and 5) if something goes wrong, will I be able to understand and maybe fix the error?. I wouldn't necessarily be worried about mnemonic naming, scoping, or style.

    tilly's outline addresses a lot of the concerns that I didn't originally recognize the importance of, so, let me try this again :)

    What I've done here is reorganize many of the points noted above to categories. Each section will be granted equal weight and the overall score will be the aggregate of all categories. The only exception to the scoring is Security, which in addition to the basic score will be prominently red flagged if it fails.

    Obviously, green = pass; red = fail.

    1. Information
      1. Documentation
        1. Accurate and comprehensive description
        2. Inaccurate, incomplete, or no documentation
      2. Installation Instructions
        1. Accurate instructions for defined platform(s)
        2. Explanation and description of dependencies (of non-standard modules)
        3. Inaccurate or no installation instructions
        4. No warning of dependencies
    2. Security
      1. Uses -T in conjunction with valid checks on all user input for potential security breaches or other damage
      2. Does not appear to allow arbitrary commands
      3. open( FILE, "$unchecked_user_input" )
      4. system `$unchecked_user_submitted_command` (or exec or eval)
      5. Error pages that include login or other sensitive information
      6. Bonus Points
        1. Using $CGI::POST_MAXor otherwise limiting maximum post size
    3. Customization
      1. Comments where necessary to clarify the actions being performed
      2. Comments that include customization instructions (either html or code behaviour)
      3. A templating system, such as HTML::Template, Mason, or even an html output section that the user can easily edit
      4. Extensive html/code integration for output
      5. Little or no useful comments
      6. Useless commenting (as noted here
    4. Error Management
      1. Error Trapping
        1. Uses die or other trapping where applicable
        2. Providing useful (and safe) error messages
        3. Not checking the return values of important system calls (particularly open)
        4. Not providing useful (or unsafe) error messages
      2. Error prevention
        1. Uses strict
        2. Uses warnings
        3. Doesn't use strict
        4. Doesn't use warnings
    5. Overall Style
      1. A semblance of consistent style
      2. At-a-glance adherers to perlstyle pass
      3. Margin huggers
      4. Inconsistent or no style
      5. Code Obfuscation (e.g. an attempt to make source code unreadable and/or unmodifiable)

    Again, please allow me to assert that site is little more than an archive of scripts written in Perl. We want to provide basic code reviews in an effort to protect users and less-than-paranoid server admins from dangerous code (or code that does not meet even a minimal set of standards) as well as act as a starting point for more intensive Perl study on behalf of the code authors. The review must be comparatively basic in order for it to be done on that many scripts, particularly with a staff of volunteers. There will definitely be a comments section where reviewers can elaborate, if they wish.

    In most cases, we do not have a (valid) email address for authors whose programs are listed on the site, so most will not even be aware that the code has been reviewed unless they stumble onto it themselves. It is probably safe to assume that the code reviews are being done for the users rather than the authors. In one respect this is ideal, because if a program written in Perl just trashed a server/database/etc., who would want to learn or use it (or continue to)?

    Many thanks for all of your help, comments and suggestions :)
    Jasmine

Re: Code review on script site
by giulienk (Curate) on Nov 25, 2001 at 20:54 UTC
    Maybe i'm just dumb or anarchist, but i really can't get the point of judging code by style.

    IMHO, style is a matter of personal taste, nothing you can rate upon a checklist or via a even good Perl script.
    You say a coder got to use CGI for html output, but what kind of code review CGI would have if posted to the Perl Archive?

    Anyway if it's the only way you got to eradicate foul section of the archive, maybe that's the right thing to do.

Re: Code review on script site
by buckaduck (Chaplain) on Nov 27, 2001 at 01:34 UTC
    You may want to apply a better scaling factor to some of these categories. At this point, it seems to me that a script which:
    • uses strict
    • uses -T and correctly performs taint checking
    • uses CGI.pm to handle CGI form input
    can score worse than a script which does none of these, but which:
    • uses CGI.pm for HTML output
    • uses here-docs instead of multiple print statements
    • has "good" style and "effective" commenting (by some subjective standard)

    Let's keep our priorities clear. Style may count for something, but I think that security and correctness should count for much more.

    Update: The security issues appear to be addressed somewhat in the previous node (which beat me by bare seconds), but I think I still have a point here somewhere...

    Update2: By the way, do you need and/or want help reviewing the scripts?

    buckaduck

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: perlmeditation [id://127201]
Approved by root
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others having an uproarious good time at the Monastery: (6)
As of 2024-03-19 14:02 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found