Beating the system

by mikeirw (Pilgrim)
on Sep 29, 2002 at 13:53 UTC ( [id://201539] )

mikeirw has asked for the wisdom of the Perl Monks concerning the following question:

I'm working on a simple script to allow users to upload files to a web server. Everything works fine for files of about 35 KB or so, but anything at all above that causes the script to be killed by the server. My host (pair Networks) imposes the following limits on CGI resource usage:

  • Size of Core Files - 0 MB
  • CPU Time Used - 30 seconds
  • Data Size - 3 MB
  • File Size Created - 1 MB
  • Memory Locked - 1 MB
  • Number of Open Files - 32
  • Number of Simultaneous Processes - 8

I've benchmarked the code, and it comes nowhere near the CPU time limit, even under heavy load. So, my question is, what is causing the script to be killed if the file is over 35 KB?

Here are the relevant parts of the script, some of which was lifted from CGI Programming with Perl:

use strict;
use CGI;
use Fcntl qw( :DEFAULT :flock );
use File::Basename;

use constant UPLOAD_DIR     => '/some/dir';
use constant TYPES          => qw( .txt .jpg );
use constant BUFFER_SIZE    => 16_384;
use constant MAX_FILE_SIZE  => 1_048_576;
use constant MAX_DIR_SIZE   => 100 * 1_048_576;
use constant MAX_OPEN_TRIES => 100;

$CGI::POST_MAX = MAX_FILE_SIZE;

sub dir_size {
    my $dir      = shift;
    my $dir_size = 0;

    # Loop through files and sum the sizes; doesn't descend down subdirs.
    opendir DIR, $dir or error( $q, "Unable to open $dir: $!" );
    while ( readdir DIR ) {
        $dir_size += -s "$dir/$_";
    }
    return $dir_size;
}

{
    my $q      = new CGI;
    my $os     = $q->param('os');
    my $file   = $q->param('file');
    my $title  = $q->param('title');
    my $fh     = $q->upload('file');
    my $buffer = '';

    if ( dir_size(UPLOAD_DIR) + $ENV{CONTENT_LENGTH} > MAX_DIR_SIZE ) {
        error( $q, 'Upload directory is full.' );
    }
    elsif ( $file ne '' ) {
        fileparse_set_fstype($os);
        my ( $base, $path, $ext ) = fileparse( $file, qr/\..*/ );

        my $num_types = scalar TYPES;
        my $maybe     = 0;
        foreach my $type (TYPES) {
            $maybe++ if $type !~ /$ext/i;
        }
        error( $q,
                  'Invalid file type. Please upload only '
                . join( ' ', TYPES )
                . ' files.' )
            unless $maybe < $num_types;

        my $filename = $base . $ext;
        $filename =~ s/[^\w.-]/_/g;
        if ( $filename =~ /^(\w[\w.-]*)/ ) {
            $filename = $1;
        }
        else {
            error( $q,
                'Invalid file name. Files must start with a letter or number.' );
        }

        # Open output file, making sure the name is unique.
        until ( sysopen OUTPUT, UPLOAD_DIR . "/$filename",
            O_RDWR | O_EXCL | O_CREAT )
        {
            $filename =~ s/(\d*)($ext)$/($1||0) + 1 . $2/e;
            $1 >= MAX_OPEN_TRIES and error( $q, 'Unable to save your file.' );
        }

        # This is necessary for non-Unix systems; does nothing on Unix.
        binmode $fh;
        binmode OUTPUT;

        while ( read( $fh, $buffer, BUFFER_SIZE ) ) {
            print OUTPUT $buffer;
        }
        close OUTPUT;

        print $q->header,
            $q->start_html( -title => 'Successful Upload!' ),
            $q->h1('Your file was successfully uploaded!'),
            $q->end_html;
    }
    else {
        error( $q, 'You must specify a file to upload.' );
    }
}

20020930 - Edit by Corion: Added READMORE tag, as it's frontpaged now

Re: Beating the system
by Jenda (Abbot) on Sep 29, 2002 at 15:10 UTC

    Maybe there's one more limit they did not tell you about: the limit on the amount of data a POST request can contain. Try asking them.

    If there is such a limit and they refuse to increase it, you are pretty much out of luck. The only thing you could do then is ask the users to zip and/or split their files.

    The users will not like it.

    Jenda

Re: Beating the system
by dws (Chancellor) on Sep 29, 2002 at 17:26 UTC
    My host (pair Networks) imposes the following limits on CGI resource usage: ...

    I'm also at pair.com, and regularly upload files much larger than 35K. In comparing my upload script with yours, the only significant difference I see is that my read loop looks like

    while ( <$fh> ) { print OUTPUT $_; }
    instead of
    while ( read( $fh, $buffer, BUFFER_SIZE ) ) { print OUTPUT $buffer; }
    Should that make a difference? At first glance, I don't see why it should. I might be on a box that's configured differently than the one you're on, or the box might be more lightly loaded.

      It does make a difference, but not a significant one: the read version is more efficient.

      Makeshifts last the longest.

Re: Beating the system
by blm (Hermit) on Sep 29, 2002 at 15:20 UTC

    There is a way to limit the size of posts when using CGI.pm. There is a variable called $CGI::POST_MAX that contains the maximum size of posts. This certainly affects file uploads.

    You can find out what the limit is on your system by using this

    use CGI; print $CGI::POST_MAX;

    If this prints a negative value then there isn't a limit.

    To find out more look at perldoc CGI.
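
    A minimal sketch of using it as a cap, assuming a 1 MB limit (adjust to taste): the variable has to be assigned before the CGI object is created, because the constructor is what parses the request, and an oversized POST then shows up in cgi_error().

    use CGI;

    # Must be set before CGI->new, which reads and parses the request body.
    $CGI::POST_MAX = 1_048_576;

    my $q = CGI->new;
    if ( my $error = $q->cgi_error ) {
        # An oversized POST is typically reported as '413 Request entity too large'.
        print $q->header( -status => $error ),
            $q->start_html('Upload rejected'),
            $q->h2($error),
            $q->end_html;
        exit;
    }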

    --blm-- If you don't like this post can you please /msg me
      Thanks for the tip, but I actually have $CGI::POST_MAX set in my script to the constant MAX_FILE_SIZE, which is 1_048_576. Without setting it myself, and just printing its value, it returns -1.

        You know... from what I said in another node I think you might have some mileage in knocking that limit down some - keep CGI's use of memory lower and all. A closer reading of CGI.pm leads me to think it's not as big a deal but perhaps... Consider using Devel::Peek on your CGI object's BUFFER member.

        Oops. Sorry. I don't know how I missed that

        --blm-- If you don't like this post can you please /msg me
Re: Beating the system
by diotalevi (Canon) on Sep 29, 2002 at 15:32 UTC

    It's hard to guess. Perhaps it's either the open filehandle count or locked memory. CGI.pm will load as much of the file as possible into memory (whatever is currently available to an fh->read call), so that might impact your locked-memory usage. While I'm no OS programmer, I always thought that memory was locked so you can do an immediate read/write from it; perhaps you're tying up valuable space with your data. I also spent a minute reading each of your imports to count the open file handles (not many, but perhaps more than you expected) to see if that was the problem.

    Counting file handles: low teens.

    1 perl
    1 the script
    1 DynaLoader.so / .dll
    3 CGI.pm
    => Carp.pm
    => Exporter.pm
    2 Fcntl.pm
    => Fcntl.so / .dll
    1 File/Basename.pm
    ? I don't know whether to count STDIN/STDOUT.
    1 The file opened on disk for storing the uploaded file.
    

      It doesn't make sense that there would be enough filehandles when files uploaded are under 35kb though.

      Makeshifts last the longest.

        I think you meant "wouldn't". Anyhow, yeah, that's what counting led me to believe, but I didn't want to throw away that work either ;-) I figured that maybe someone else might be interested in all the other stuff that gets opened up as well.

Re: Beating the system
by Kanji (Parson) on Sep 29, 2002 at 17:58 UTC

    What kind of bandwidth do you have, and does the size cutoff change if someone with a faster or slower connection than you tries an upload?

    If so, and you're seeing errors like 'CGI.pm: Server closed socket during multipart read (client aborted?).' in your error logs (you are checking your error logs, right?), then you might be running into Apache's TimeOut or something similar.

    A workaround for such (or possibly something else to try) is to fork() your own server for the duration of an upload.

        --k.


      I'm personally testing from a machine that's on dial-up, but I've had another person test it with a broadband connection, and he received the same error. The error is a generic 500, and unfortunately, pair Networks does not allow access to the Apache error logs.

      I've set up the script to test for errors issued by CGI.pm, and I'm not seeing anything there. Testing the script on my own machine, I'm able to transfer multi-MB files without a hitch.

      Thanks to all who've provided suggestions. I'll give them a go, and post a resolution if I ever get one.

        Use CGI::Carp so that you can get the error message in the browser, if it's a Perl error:

        use CGI::Carp qw(fatalsToBrowser);

        Trying to debug a CGI without access to STDERR is needlessly difficult.
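
        Since pair won't show you the Apache error log, CGI::Carp's carpout() can also redirect warnings and fatal errors to a log file you control; a rough sketch (the log path is just a placeholder):

        BEGIN {
            use CGI::Carp qw(carpout);
            open( my $log, '>>', '/some/dir/upload-errors.log' )    # placeholder path
                or die "Unable to open log: $!\n";
            carpout($log);
        }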

        -sam

Re: Beating the system
by shotgunefx (Parson) on Sep 30, 2002 at 05:15 UTC
    Have you checked cgi_error?
    my $error = $q->cgi_error;
    if ($error) {
        print $q->header( -status => $error ),
            $q->start_html('Problems'),
            $q->h2('Request not processed'),
            $q->strong($error);
        exit 0;
    }


    -Lee

    "To be civilized is to deny one's nature."
Re: Beating the system
by rbi (Monk) on Sep 30, 2002 at 10:40 UTC
    Here is some stuff that I use to handle uploaded files. It does not run as posted here, but hopefully it gives you some ideas.
    Have you read about uploading files in the CGI.pm documentation? I started from Lincoln Stein's examples and it works pretty well, no matter the size of the file to upload.
    ################
    sub upload_files {
    ################
        do_upload_prompt();
        do_upload_work();
    }

    ####################
    sub do_upload_prompt {
    ####################
        # This routine prompts for the upload of new files.
        # It uses CGI.pm; $query is the CGI object and $myself is the
        # name of the CGI script (both assumed to be set elsewhere).

        # Start a multipart form.
        print $query->start_multipart_form( -action => "$myself" ),
            filefield( -name => 'uploaded_file', -size => 30 ),
            br(),
            reset,
            submit( -label => 'Upload' ),
            end_form;
    }

    ##################
    sub do_upload_work {
    ##################
        my $tmppool = '/home/dummy/tmp/';
        my $filename_server;
        my $filename_tmp;

        # Process the form if there is a file name entered.
        # Get the name of the uploaded file on the client, if it exists.
        my $filename_client = $query->upload('uploaded_file');

        # In case a file was indicated for upload, proceed.
        if ($filename_client) {

            # Check that the name is Unix-compatible.
            # Remove the path from the filename, if any
            # (Internet Explorer keeps the path).
            $filename_server = $filename_client;
            $filename_server =~ s(^.*\\)();    # Windows
            $filename_server =~ s(^.*/)();     # Unix

            # Make the filename lowercase.
            $filename_server = lc($filename_server);

            # Check that the name is compatible for your purposes.

            # Get the temporary filename CGI.pm spooled the upload to.
            $filename_tmp = $query->tmpFileName($filename_client);

            # Copy the temporary file to the destination directory
            # ($CP is the path to cp(1), e.g. '/bin/cp').
            system "$CP $filename_tmp $tmppool$filename_server";
        }
    }
    Please note the operation on the file name to remove the client-side path. Under Internet Explorer, the path is sent along with the file name.
    Regards, Roberto
Re: Beating the system
by rbi (Monk) on Sep 30, 2002 at 10:01 UTC
    You might want to check the httpd.conf file if you're using Apache, in particular:
    # Timeout: The number of seconds before receives and sends time out
    Timeout 300

    # KeepAlive: Whether or not to allow persistent connections (more than
    # one request per connection). Set to "Off" to deactivate.
    KeepAlive On

    # MaxKeepAliveRequests: The maximum number of requests to allow
    # during a persistent connection. Set to 0 to allow an unlimited amount.
    # We recommend you leave this number high, for maximum performance.
    MaxKeepAliveRequests 100

    # KeepAliveTimeout: Number of seconds to wait for the next request
    KeepAliveTimeout 15
    It might be that your connection is not persistent, or that it is too slow compared to the timeout if anything goes wrong along the line.
    Regards, Roberto
Re: Beating the system
by Anonymous Monk on Sep 30, 2002 at 04:59 UTC
    Have you tried decreasing your BUFFER_SIZE? Mine is set at 1024. I'm not promising that decreasing your BUFFER_SIZE will change anything, but 16_384 seems pretty high.

    --shin
Re: Beating the system
by zakzebrowski (Curate) on Sep 30, 2002 at 11:56 UTC
    Consider using log4perl to help solve your problem. That way, you can have debug statements everywhere in your program but only write them to the log when you need to, which might narrow down where the bug is... (Excuse the English, not enough coffee yet...)

    ----
    Zak
    "There is no room in this country for hyphenated Americanism" ~ Theodore Roosevelt (1915)
Re: [SOLVED] Beating the system
by mikeirw (Pilgrim) on Sep 30, 2002 at 22:27 UTC

    Well, it looks like it was a memory issue after all. After changing a couple of packages from use to require, everything seems to work fine.

    A big thank you to everyone for their help. I definitely picked up some useful debugging tips.

      After changing a couple of packages from use to require, everything seems to work fine.

      Details, please. This sounds a bit too much like "I did stuff until things seemed to work."

        I came to this solution (after trying many of the suggestions posted) by commenting out parts of the script to see what kind of effect it had.

        For example, I have a couple of subs that I invoke after the uploaded file is written to disk that do various things depending on the file type. The subs require a couple of different packages that I was use'ing at the top of the script. If I commented out those subs and their required packages, I was able to get larger files uploaded with no problem. The next thing was to figure out why these subs were causing the script to be killed before they were even invoked.

        The reason I didn't think they were a problem initially was because any file that was 35 KB or under worked flawlessly, while anything above caused the script to be killed before it could write the file to disk, hence they were never called. By changing the packages that these subs need from being use'd to require'd, I was able to upload much larger files without the script being killed and still have them work their magic.
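
        In case it helps anyone else, here's roughly what the change looks like; Some::Heavy::Module and its process() call are just placeholders for the real packages and subs:

        # Before: pulled in at compile time for every request, even ones
        # that die while CGI.pm is still reading the upload.
        # use Some::Heavy::Module;

        sub post_process {
            my ($path) = @_;

            # After: loaded only when the sub actually runs, so the process
            # stays smaller during the upload itself.
            require Some::Heavy::Module;
            Some::Heavy::Module->import;    # only needed if you rely on its exports
            Some::Heavy::Module::process($path);
        }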
