Typically, monks will warn new programmers that they must use strict. This pragma is useful, but if you just don't understand programming, it's a false sense of security. Today, I'm working on some code another (ex) employee wrote. The code allows a user to fill out a form, use an attachment, and email the data to one of our users. How many problems can you find? Here's a blindingly obvious hint on one of them: the programmer never told the admin about the directory he was saving files to.

my $outputFile; if( $_file_name !~ /^(\s*)$/ ) { use constant BUFFER_SIZE => 16_384; # Amount of upload file t +o read at one time use constant MAX_FILE_SIZE => 3_145_728; # This is the filesize up +load limit $CGI::DISABLE_UPLOADS = 0; # Temporarily reenable up +loads $CGI::POST_MAX = MAX_FILE_SIZE; # Path and Filename my $file_name = $_file_name; my $file_type = $query->uploadInfo($file_name)->{'Content-Type'}; my $basename = basename($file_name); if( $file_type =~ /octet-stream/ ) { $errors{ 'file_type' } = ["","","Unrecognize your submitted re +sume file format."]; goto Print; } $outputFile = $UPLOAD_RESUME_DIRECTORY . $basename ; my $buffer = ""; open(OUTPUT,">>$outputFile"); my @stats; # Need binmode or Win32 systems will convert end-of-line chars binmode OUTPUT; { no strict 'refs'; READ_FILE: while ( read( $file_name, $buffer, BUFFER_SIZE ) ) +{ print OUTPUT $buffer; @stats = stat $outputFile; last READ_FILE if ( $stats[7] > MAX_FILE_SIZE ) } } close(OUTPUT); #check the file size if ( $stats[7] > MAX_FILE_SIZE || %errors ) { $errors{'file_size'} = ["","","Your submitted file's size is o +ver 3MB."]; unlink $outputFile;

I'll post my observations later. Be careful, there are some subtle bugs here.


Join the Perlmonks Setiathome Group or just click on the the link and check out our stats.

Replies are listed 'Best First'.
Re: strict isn't everything
by Dog and Pony (Priest) on Jun 11, 2002 at 17:52 UTC
    Ok, I'm not too good at this, so I thought this might be a good time to practice then. I found a few things I found curious, although I can't be 100% sure they are wrong, so I'll just post them and see if I hit any marks. :)

    Here's one I am not sure about:

    I would, at a wild guess, think that if uploading resumés, and with the same name, they would, if anything, replace the old one - or be disallowed. This appends to an earlier file, which just seems wrong. I can guess the file will be deleted when this script is done, but this allows for strange effects. Using a unique temporary name would have been better, if the file should not stay there, or to check for duplicates if it should stay. File locking could be yet another way to go.

    This, I wonder if it may have other implications elsewhere?

    $CGI::DISABLE_UPLOADS = 0; # Temporarily reenable up
    I'm not really sure how this works with CGI.pm and all, or how the rest of the code/system looks, but maybe a "local" would have helped?

    I don't get this one either:

    if ( $stats[7] > MAX_FILE_SIZE || %errors ) { $errors{'file_size'} = ["","","Your submitted file's size is over 3M +B."];
    If there are errors, you automatically have too big file size? That ought to confuse a few... :)

    This is an odd check:

    if( $file_type =~ /octet-stream/ ) { $errors{ 'file_type' } = ["","","Unrecognize your submitted resume f +ile format."]; goto Print; }
    There must be more non-allowed file types, or did he mean "!~"?

    I'm sure I missed all the real errors, and pointed out perfectly legitimate stuff, but if that is so, I hope to learn something from the grind-my-face-in-the-ground treatment I am gonna get. :)

    You have moved into a dark place.
    It is pitch black. You are likely to be eaten by a grue.
Re: strict isn't everything
by brianarn (Chaplain) on Jun 11, 2002 at 20:21 UTC
    I've written a few CGI upload scripts before, so I'll give it a shot - no guarantee of quality though. :)

    If the programmer never talked to the admin about where the file will be written to, then the webserver may not have write permission to that directory, ensuring that all open calls fail and the success isn't checked by anything like
    open(OUTPUT,">>$outputFile") or die "Couldn't open $outputFile: $!\n";

    The fact that the regex it runs against the _file_name only checks to see that it's not all whitespace - this opens a nice hole for someone to try and slip in a malicious system call with backticks to do God knows what.

    I'm guessing somewhere in the code above it instantiated the $query CGI object, and yet it's trying to redefine $CGI::DISABLE_UPLOADS and $CGI::POST_MAX after the fact - these would have to be redefined before the $query object is created. If not, one of two things will happen:
    • DISABLE_UPLOADS will be 1, which means the script will receive no file (not sure if this generates an error upon instantiation of the CGI object or just gives an empty file when trying to write to disk)
    • POST_MAX will be set to something not expected, which could potentially cause the object to error out (if, say, POST_MAX was only 1MB and the person's resume is 1.5MB, the script will error out when the CGI object is created, even though in the scripter's eyes, this is a legit resume in size
    It creates a $file_name variable using just $_file_name, which isn't necessarily bad, just wasteful as far as I can see.

    There are checks in the file printout loop to ensure that the file isn't bigger than 3MB in size. If POST_MAX was set properly earlier, then this wouldn't have been a problem. In the odd instance that the admin has POST_MAX set higher than 3MB, this would at least keep the filesize under the POST_MAX. However, it also just stops printing out the file, so if this file is a friggin' huge Word doc, it'll be corrupted.

    The file size checks in both the loops and in the end check could be written much more simply with -s $outputFile rather than statting the file, saving the results to an array and then checking the 7th element (well, 8th element if you count the 0th index as the first element)

    It does another check after it has written the file to disk, and if the filesize exceeds 3MB, then it removes it. However, it doesn't ensure that the file was removed properly with something like
    unlink $outputFile or die "Couldn't unlink $outputFile: $!\n";

    Seeing as the file was opened in append mode, if someone uploads a 2MB resume, then tweaks it and uploads it again, we have problems. In the loop where the file is written to disk, it appends to the old resume anything it's received up until the resume file size hits 3MB, then it stops writing, which would pretty much corrupt any non-text resume, as well as really mess up any text resume. The contents would be the old one, plus however much of the new one it could fit up to 3MB.

    Seeing as it tries to open the file in append mode, never checking for existance of the file or overwriting it, once a person hits their 3MB resume limit, they can't do anything in terms of uploading a revised resume.

    The loop keeps the file size under 3MB, so really the check at the end that unlinks will never be called.

    It seems that most of the errors won't be encountered due to a prior error causing the script to crash. Reading through this script makes me feel better about my CGI-fu because now I know that I at least understand enough basics to not be destroying anything. :)

Re: strict isn't everything
by Abigail-II (Bishop) on Jun 12, 2002 at 12:44 UTC

    Here are some of my remarks. I haven't used CGI.pm in half a dozen years so I won't comment much on proper use of its API. I'd also like to point out that some remarks will be subjective - things I would do different aren't necessarely done wrong here.

    my $outputFile; if( $_file_name !~ /^(\s*)$/ ) {
    I would use $_file_name =~ /\S/ here, but it very well may be that something else should be used. Perhaps a if (defined $_file_name) was called for, but the context isn't known.
    use constant BUFFER_SIZE => 16_384; # Amount of upload file t +o read at one time use constant MAX_FILE_SIZE => 3_145_728; # This is the filesize up +load limit
    Two things. First, the lines exceed 80 characters, which, IMO, is a big no-no. Second there is no point in putting the use statement inside a block - the functions will be exported to the current package. It suggests this has an effect only for this scope, or that it's a run time effect. None of it is true.
    $CGI::DISABLE_UPLOADS = 0; # Temporarily reenable up +loads $CGI::POST_MAX = MAX_FILE_SIZE;
    Unfortunally, not the entire "then" part is given (is just the closing brace missing, or is there more code?), so it's hard to say if something is wrong here. $CGI::DISABLE_UPLOADS is never set back, but that may be in the code that is missing. However, the previous value isn't stored. I would prefer to use local here, to make sure it's set back.

    But there is another more serious problem. CGI.pm processes its input when CGI -> new is called. And that's when you need to know when file uploads are enabled or not. Hence, this setting comes to late, it should be done before the $query object is created. And this is true of $CGI::POST_MAX as well.

    # Path and Filename my $file_name = $_file_name;
    Having two variables with almost identical names can be confusing, specially if one is a copy of the other. But why use two variables here? $file_name isn't being modified.
    my $file_type = $query->uploadInfo($file_name)->{'Content-Type'}; + my $basename = basename($file_name);
    I would retrieve the basename just before I use it, and certainly after the next block, the one that uses the file type.
    if( $file_type =~ /octet-stream/ ) { $errors{ 'file_type' } = ["","","Unrecognize your submitted re +sume file format."]; goto Print; }
    I wonder whether the goto is the correct approach here, but since we don't see what's there, I'd give the author the benefit of doubt. I also don't know the role of %errors here, so no comment on that.
    $outputFile = $UPLOAD_RESUME_DIRECTORY . $basename ; my $buffer = ""; open(OUTPUT,">>$outputFile");
    Ouch. $basename is tainted. I don't see any direct security danger here (due to the basename and the >>) but it makes me feel uneasy. But submits from different people can easily go into the same file - it all depends what filename their browser submitted.

    That not checking the return value of open is a serious mistake should not come as a surprise.

    my @stats; # Need binmode or Win32 systems will convert end-of-line chars binmode OUTPUT; { no strict 'refs'; READ_FILE: while ( read( $file_name, $buffer, BUFFER_SIZE ) ) +{ print OUTPUT $buffer; @stats = stat $outputFile; last READ_FILE if ( $stats[7] > MAX_FILE_SIZE ) } } close(OUTPUT);
    Ok, lots to say about this. What's that no strict 'refs' doing there? I don't think it's doing any harm, but it's odd. And what's the deal with the label on the loop? Again, no harm, but odd. The read is more serious. It's first argument should be a filehandle, not a name of a file. A filehandle to the file that contains whatever is uploaded could be gotten with the upload method.

    The OUTPUT handle hasn't been locked so if more than one request is dealt with simultaneously, uploaded data can become interleaved.

    Also, OUTPUT is a buffered filehandle. Their might be more data written to it than returned by stat. Finally, the return value of close isn't checked. A full disk can cause a failure of the close.

    #check the file size if ( $stats[7] > MAX_FILE_SIZE || %errors ) { $errors{'file_size'} = ["","","Your submitted file's size is o +ver 3MB."] ; unlink $outputFile;
    Most likely the author wanted keys %errors here, not %errors. Also, if looks like that if there are already errors, the submitted file is also over 3Mb. Furthermore, if some someone else has submitted a file of 2.9Mb, and we submit a file of 500kb which happens to have the same name, our error message will be that our file size exceeds 3Mb. Which doesn't seem to be correct. Also, the return value of the unlink isn't checked. And again we have timing problems. Another instance might try to write to the file that's being unlinked.


      I guess I don't need to add too much commentary as most of the posts have it fairly well. However, you asked about the rest of the code:

      Print: my $template_data; if ( $params ) { my @required = qw/ file_size file_type /; $template_data = { errors => \%errors, required_fields => \@required, # more stuff here } print $query->header; $template->process( 'info_request_emp.tmpl', $template_data ) or d +ie $template->error(); exit; }

      The only use of the goto is to skip some code. An else would have been preferred.

      As for $file_name and $_file_name, this is a "sometimes" convention used here. When reading form parameters, the variable with the tainted data begins with an underscore and the untainted one drops the underscore. Obviously, if no untainting occurs, this is useless. Personally, I don't like this ad hoc approach.


      Join the Perlmonks Setiathome Group or just click on the the link and check out our stats.

        The only use of the goto is to skip some code. An else would have been preferred.

        Hmmm. An else is a typical solution, but it's not something that makes me jump with joy either. It means there's a large block of code that will be indented. And would you have more such cases, the code crawls to the right hand margin.

        A continue might solve this:

        if (CONDITION) {{ # Note the *double* opening brace. ... some code ... next if SOME_CONDITION; ... more code ... next if OTHER_CONDITION; ... even more code ... } continue { ... end stuff ... } }
        And if you do last instead of next, the continue won't be executed.


Re: strict isn't everything
by gav^ (Curate) on Jun 11, 2002 at 22:34 UTC
    /me picks up his red marker pen of destruction +42

    I'll skip what others have mentioned...

    • Using capturing parentheses and then not using what you captured
    • Not checking to see if $_file_name was undef
    • Not stripping out anything naughty from the filename
    • Creating an unnecessary variable $file_name
    • Not sure why the no strict 'refs' is used as nothing is done with symbolic refs in that block
    • It should be %errors > 0 rather than just using %errors in boolean context


      Not checking to see if $_file_name was undef
      if( $_file_name !~ /^(\s*)$/ ) { ...}

      If $_file_name is undefined it will be treated as the empty string in this regular expression. The empty string will match the regular expression and the if statement's body will never be executed.

      Update:gav^'s very right that using warnings will complain about this. Unfortunately many people ignore their server error logs unless something has gone wrong. (Like me today, when I double checked my above paragraph and it worked as I expected.) So this will work if $_file_name is undefined but only at the expense of creating unnecessary lines in your server error logs. End update. :)

      Not sure why the no strict 'refs' is used as nothing is done with symbolic refs in that block.
      # Need binmode or Win32 systems will convert end-of-line chars binmode OUTPUT; { no strict 'refs'; READ_FILE: while ( read( $file_name, $buffer, BUFFER_SIZE ) ) +{ print OUTPUT $buffer; @stats = stat $outputFile; last READ_FILE if ( $stats[7] > MAX_FILE_SIZE ) } }
      We're using a string "$file_name" as a filehandle, and so, since it's a string, we are using symbolic references here. perldoc CGI warns us that turning off strict refs will be necessary:

            The filename returned is also a file handle.  You can read
            the contents of the file using standard Perl file reading
                     # Copy a binary file to somewhere safe
                     open (OUTFILE,">>/usr/local/web/users/feedback");
                     while ($bytesread=read($filename,$buffer,1024)) {
                        print OUTFILE $buffer;
             However, there are problems with the dual nature of the
             upload fields.  If you "use strict", then Perl will com­
             plain when you try to use a string as a filehandle.  You
             can get around this by placing the file reading code in a
             block containing the "no strict" pragma.

      so the writer is doing this part correctly... except they should use -s instead of stat and check that the file opened and all of those good things that have been mentioned.

      It should be %errors > 0 rather than just using %errors in boolean context
      Using %errors in a boolean context forces a scalar context on %errors. Putting a hash in a scalar context returns the number of buckets used or something like that. If this is non-zero then it will evalute to true. If if it zero, then no buckets have been used, the hash is empty and it will evaluate to false - no errors have been found.

      %errors > 0 is equivalent to %errors in a boolean context.

      ++ to your other points though, especially not santitising the filename before reusing it.

      Likewise, if more than one person calls their file "resume" then we'll be adding a mis-mash of resumes on to each other. Ew! And without any kind of file locking, two people could upload their resume.(doc|txt|ps|pdf) and have the files interleaved.

      Interesting problem. Too bad that code like this is all too common.


        While if( $_file_name !~ /^(\s*)$/ ) { does handle the case of undef, it will cause a warning:
        use warnings; my $_file_name = undef; if ($_file_name !~ /^(\s*)$/ ) { print "ok\n"; } __END__ Use of uninitialized value in pattern match (m//) at test.pl line 6.


(jeffa) Re: strict isn't everything
by jeffa (Bishop) on Jun 11, 2002 at 23:18 UTC
    I guess this is rather obvious, by why for the constants inside a block. Those constants are going to be used regardless if the file name is 'valid' or not. Example:
    if (0) { use constant FOO => 5; } print FOO, "\n"; __END__ prints 5


    (the triplet paradiddle with high-hat)
Re: strict isn't everything
by crazyinsomniac (Prior) on Jun 12, 2002 at 03:20 UTC
Re: strict isn't everything
by smitz (Chaplain) on Jun 12, 2002 at 14:51 UTC
    Time for a lowley friar to risk his XP:

    GOTO's and LABELS?!?!

    /me spits

      GOTO's and LABELS?!?!

      Yeah, what about them? People easily balk at seeing a goto, but not all goto's are evil. There are two famous papers on the subject, Go To considered harmful, by E. Dijkstra (although it was C. Hoare that give the paper a title), but that paper doesn't say one should never use a goto - it warns for improper use. And then there's of course Structured Programming with Goto. By D. E. Knuth - perhaps the best Computer Scientist that ever lived.

      Besides, aren't we all fond of next, last and redo? They are nothing but gloried gotos.


      Having coded in systems that don't have good looping constructs, I'd have to say that GOTO's are sometimes a necessary evil. In this script, I'm sure there was a better way than a Goto, but we didn't have enough code to tell. I try and avoid Goto's unless I'm programming on my TI-86, at which point they can be quite handy.

      However, Labels can be quite handy in Perl. A good example might be if you were iterating through an AOA simulating a matrix and wanted to find an element equal to 5 by using two nested loops. You could use a label to break out from the inner loop all the way out of the outer loop quite easily. If for some reason you have to have a loop in a loop in a loop in a loop in a loop but need a mechanism to break out of all of them at once, a label is the easiest way to do it. Labels can also add to the readability of code if you understand how they work.

      All tools have their place - goto is one of those tools that should be used carefully because it can easily create an infinite loop that isn't nearly as easy to some other infinite loops.

Re: strict isn't everything
by cybear (Monk) on Jun 14, 2002 at 18:39 UTC
    I have not dug in yet, but let me say. This is a cool post.
    I sad to only be able to vote ++ once on this.

    This type of post will be very useful to me, and to anyone who
    is trying to get a better understanding of perl.

    It it very difficult to understand how to troubleshoot a problem
    unless you have some experience troubleshooting problems.

    -Thanks for the post OVID