mkirank has asked for the wisdom of the Perl Monks concerning the following question:

Hi Fellow Monks, We have a small team of 3 perl programmers in our organization and when we were thought perl (around 2 years back) we did not know of CGI.pm so we used a subroutine called readparse (code below) and we have never used use Strict...we have continued with that habit and now i am trying to develop a culture of good coding practice but the programmers feel that when the following code has been working and they are used to it and can develop the code at a fast pace(after submitting from a form the variables are created on the fly they do not have to declare each variable and assign values from the form)..i need some thoughts on this Thanks
readparse(*input); my $strTemp; foreach $strTemp(keys %input) { ${($strTemp)} = $input{$strTemp}; } sub readparse { local (*in) = @_ if @_; local ($i, $loc, $key, $val); if ($ENV{'REQUEST_METHOD'} eq "GET") { $in = $ENV{'QUERY_STRING'}; } elsif ($ENV{'REQUEST_METHOD'} eq "POST") { read(STDIN,$in,$ENV{'CONTENT_LENGTH'}); } @in=split(/&/,$in); foreach $i (0..$#in) { $in[$i]=~ s/\+/ /g; ($key, $val) = split(/=/,$in[$i],2); $key =~ s/%(..)/pack("c",hex($1))/ge; $val =~ s/%(..)/pack("c",hex($1))/ge; $in{$key} .= "\0" if (defined($in{$key})); $in{$key} .= $val; } return 1; }

Replies are listed 'Best First'.
Re: Bad Practice
by isotope (Chaplain) on Feb 27, 2003 at 08:30 UTC
    Aside for the inappropriate use of local() instead of my, there are some easy problems here. Strict, -w, and -T would have caught some of them, too.

    You can start with this node. Ok, so this differentiates POST and GET requests. It doesn't validate CONTENT_LENGTH. It doesn't handle multiple select forms.

    This code allows a web user to create any variable he wishes, and overwrite other variables in the rest of the script. Somebody could really have fun with that.

    There is insufficient error checking. What if the literal %YZ appears in the URL? The pack() attempts to dehexify it, but it's not truly URL-encoded. Where is the error caught?

    In fact, this snippet looks much like Ovid's favorite example of bad code on this node.

    For a simpler argument, CGI.pm does it right, according to RFCs, and has been extensively peer-reviewed. If I understand your post correctly, your code has basically been reviewed only by 3 beginners, and will break if you try to use it for things the original coders did not envision when they wrote it.

    --isotope
      I just looked into this CGI module which is supposed to be so great and do it all right. Could you please point out where it actually validates CONTENT_LENGTH? All I can see is that it expects it to be a number. It'll explicitely substitute '0' if the environment variable is undefined, but that's something what Perl would do anyway.

      The fact that it doesn't handle multiple select forms isn't necessary bad. If the forms used on the site don't have multiple selects anyway, it's not bad if you can't deal with them correctly.

      You also say that this code allows to create any variable he wishes. I've looked at the given code a few times, but I can't see what you see. Unless you see a way of the web user being able to manipulate %input, I don't see how he can.

      As for %XY, which is invalid in a URL, this code with treat it as if it had been %00 (which is valid). Is it a problem that the error isn't caught? CGI.pm leaves %XY as %XY, with no feedback. This code at least issues a warning (if they are turned on).

      Your last remark is a bit silly. Do you really think CGI.pm will not break if you use it for things Lincoln Stein didn't envision when he wrote it?

      I'm not saying the presented code is all perfect, not at all. But CGI.pm isn't as holy as you may think.

      Abigail

        You also say that this code allows to create any variable he wishes. I've looked at the given code a few times, but I can't see what you see. Unless you see a way of the web user being able to manipulate %input, I don't see how he can.

        Right after the call to readparse, there is a foreach which loops through %input and creates global variables for every element in the hash. As a contrived example, this could allow the browser to change the process name, by submitting a form containing <input type=hidden name="0" value="HA! HA! GOT YOU!">. If the rest of the code is as insecure as this, you could have a lot of fun with this site.

        From version 2.91:
        if ($meth eq 'POST') { $self->read_from_client(\*STDIN,\$query_string,$content_length,0) if $content_length > 0;
        So CGI.pm is verifying that it's a positive number. Further, within read_from_client(), it calls read(), which only uses the length as an upper bound. The logic then seems to flow like this: If the Content-length header is missing, assume 0. If it's there, and greater than zero, then try to read from the client until you reach the end of the input, or the value of content_length, whichever comes first. Granted, Lincoln didn't explicitly verify that content_length is an integer, but the above code and the call to read() should cover that.

        Treating %XY in a URL as the literal "%XY" makes more sense to me than trying to unescape it and coming up with '\0', but that's probably just me.

        CGI.pm isn't perfect, but I'd trust it a whole lot more than this snippet bastardized from cgi-lib.pl, especially coupled with the loop that does create global variables named for the CGI parameter names.

        --isotope
Re: Bad Practice
by Cody Pendant (Prior) on Feb 27, 2003 at 08:43 UTC
    As far as I can see, this code does exactly what CGI.pm does if you do this:
    use CGI; CGI::ReadParse();
    that is, it puts everything into a hash called "%in", with multiple values separated by \0 if need be.

    Only, as pointed out previously. CGI.pm will also be taking care of a lot of other stuff to do with methods other than POST or GET, security and so on.
    --

    “Every bit of code is either naturally related to the problem at hand, or else it's an accidental side effect of the fact that you happened to solve the problem using a digital computer.”
    M-J D
Re: Bad Practice
by petesmiley (Friar) on Feb 27, 2003 at 15:43 UTC
    Well, it seems as though everyone has thoroughly combed your code. As for the rest of your post on the other programmers' behavior...

    *sigh* Your are the victim of human behavior at work. My suggestion. Make your suggestions and then leave it at that. Code up to your expectations and let them code up to theirs. It sounds like they are very similar to a rather large crowd of programmers I have met in the work place. They aren't excited about what they do, so they do enough to get by/collect a paycheck. ie. They won't learn object-oriented/abstracted/modular/efficient/etc... programming unless a project absolutely requires it. I believe Larry would call this false laziness.

    Unless you're their boss, then all the rules are different.

Re: Bad Practice
by bart (Canon) on Feb 27, 2003 at 08:44 UTC
    You created it? It looks to me like you lifted it out of Steve Brenner's old cgi-lib.pl. Heck, it even appears to me that Steve Brenner got one bug fixed which you guys didn't: he used
    @in = split(/[&;]/,$in);
    while you still use
    @in=split(/&/,$in);
    Modern browsers may use ";" as separators as an alternative to "&", so you need to fix that.

    As for using it... see any discussion on hanging on to cgi-lib.pl.

      *Read Carefully* ...i never said we created it ...that is how we are shown to do it and we have been using it ...
      Use of ";" as separator?! I never saw that!

      I have some modules that parse the incoming data for GET and POST urlencoded. They use only "&".

      Can you send some browser name that use ";"? Or any doc that tell that ";" can be used?

      I need to see if the modules need some update. ;-P

      Update: I saw in CGI.pm inside sub parse_params:
          my(@pairs) = split(/[&;]/,$tosplit);

      Graciliano M. P.
      "The creativity is the expression of the liberty".

        At least it's mettioned in the text of RFC-1866, section 8.2.1. See the bottom of that paragraph.
        I don't recall where, but I have seen ";"s used in a URL before. I think it was related to separating session keys from the rest of the query. It's not part of the normal query string format, but I have seen it used in some instances.

        -Alex

Re: Bad Practice
by pg (Canon) on Feb 27, 2003 at 16:20 UTC
    I fully understand what you said, but I think your teammates’ thinking is also valid. If your old code still works, it still works, although it might have a “bad” style as you said. I would say, for any new development, you should go with the “good” style, there is no doubt about that.

    But for existing code in your repository, you don’t just go back, and modify them because their style is “bad” (I rather say it is “old”, it might be “good” at the time it was developed.) To keep your system stable is also a “good” practice.

    Any way, discuss with your teammates more, but don’t bluntly call their opinion as bad practice. Bad practice is a very serious accusation against programmers, and some people really care. The purpose of pushing your idea is to make others accept it, you have to do it skillfully, if you push too hard, chance is that you will achieve nothing but to hurt the synergy among teammates…

    I know you can make it, good luck ;-)

      The code may work as is, but as has already been pointed out in the thread there are some potential security issues as well. I'd think that alone would be enough to convince someone to rewrite the code sections in question.

      In addition, there is definitely a place for rewriting code that has bad style -- if you have the time to do so. For example, there used to be a programmer here who wrote scripts in Perl that look like entries into an obfu contest. They're valid Perl, and they do what was asked. But management has decided that a lot of these scripts need to be updated for maintainability. Not to mention that some of the scripts needed minor changes that took weeks to implement due to coding style.

      You don't have to point at anyone and say "Your style sucks! You should never write another program!". You can easily do this without personal attacks, which is what I think you mean by not being blunt. So since you've already made that point I won't reiterate it. :)

Refactoring
by Wally Hartshorn (Hermit) on Feb 28, 2003 at 15:41 UTC

    I'm just now getting into it, but one of the practices of Extreme Programming (dorky name, but good ideas) is constant refactoring of code. (Must use more buzzwords!) Basically, when you see a bit of code that could be made clearer or that, if changed, would make it easier to add the new functionality on which you are working, you do so.

    However, you should only do the refactoring if you have unit test cases that you can run to verify that your changes didn't break anything. In addition, you should do the refactoring of the old code separately from adding the new functionality, so that you can know whether any problems shown by the unit test cases are the result of the refactoring or are the result of the new functionality.

    In short, improving code is a good thing, so long as you can verify that the improved code works as well as the old code.

    Wally Hartshorn

    (Plug: Visit JavaJunkies, PerlMonks for Java)

Re: Bad Practice
by shirkdog_perl (Beadle) on Feb 28, 2003 at 19:21 UTC
    In my experience, the documentation for the CGI.pm module is enough to fix code to a readible level.

    #!/usr/local/bin/perl -w use strict; use CGI; my $q = new CGI; print $q->header(-type=>'text/html'), $q->start_html(-title=>'CGI.pm is Great'), $q->h1('CGI is GREAT'), $q->hr, $q->end_html;