Beefy Boxes and Bandwidth Generously Provided by pair Networks
XP is just a number

A patched CGI::Simple for REST applications - comments please

by isync (Hermit)
on Aug 21, 2010 at 13:05 UTC ( #856447=perlquestion: print w/replies, xml ) Need Help??

isync has asked for the wisdom of the Perl Monks concerning the following question:

Handling PUT or PROPFIND methods with CGI turned out - at least for me - to be quite a challenge while doing it under mod_cgi with common helpers like or CGI::Simple.

First, they do not handle less-common methods like PROPFIND well, throw away message body content by pretending to read it or deny answering them at all.
Second, large message bodies are (commonly) slurped in! But PUT and POST requests can get large, even PROPFIND XML message bodies might get bloated - and all these requests might thus pose a DoS effectively.

Please find attached my take on fixing these issues. It's a WIP so please comment!

Further, I'd like a knowledgeable monk to tell me if my attempts to keep STDIN content unprocessed so my script can later do a buffered read are ineffective. Does it really make any difference to keep arriving content on STDIN than copying it to a variable (I think this answers itself, didn't it?)
Or is a closely knit-in Apache handler the only way to process message body data *as it arrives*? Or wouldn't even this do the trick?
In reality it seems to work as intended: with the code snippet at the bottom of this post I can see an uploading file growing in filesystem as data arrives.
package REST::CGI::Simple; # # patched version of CGI::Simple # - PROPFIND method with message body possible, see keyword "patch1" # - no PUT data value parsing for zeros (may also be set by NO_NULL = +0), but hardcoded, see keyword "patch2" # - differentiate handling of medium and very large message bodies, se +e keyword "patch3" ... sub _read_parse { my $self = shift; my $handle = shift || \*STDIN; my $data = ''; my $type = $ENV{'CONTENT_TYPE'} || 'No CONTENT_TYPE received'; my $length = $ENV{'CONTENT_LENGTH'} || 0; my $method = $ENV{'REQUEST_METHOD'} || 'No REQUEST_METHOD received'; # first check POST_MAX Steve Purkis pointed out the previous bug if ( ( $method eq 'POST' or $method eq 'PUT' or $method eq 'PROPFIN +D' ) # patch1 and $self->{'.globals'}->{'POST_MAX'} != -1 and $length > $self->{'.globals'}->{'POST_MAX'} ) { $self->cgi_error( "413 Request entity too large: $length bytes on STDIN exceeds \$ +POST_MAX!" ); # silently discard data ??? better to just close the socket ??? while ( $length > 0 ) { last unless _internal_read( $self, $handle, my $buffer ); $length -= length( $buffer ); } return; } if ( $length and $type =~ m|^multipart/form-data|i ) { my $got_length = $self->_parse_multipart( $handle ); if ( $length != $got_length ) { $self->cgi_error( "500 Bad read on multipart/form-data! wanted $length, got $got +_length" ); } return; } # patch3 elsif ( ( $method eq 'POST' or $method eq 'PUT' or $method eq 'PROP +FIND' ) and $self->{'.globals'}->{'HANDLE_ONLY_ON_LARGE_CONTENT'} == 1 and $length > $self->{'.globals'}->{'HANDLE_ONLY_WHEN_LARGER_AS'} +) { # reading data from STDIN here would mean taking it away from a # script which might process it on its own, so we just signal that # it is still there, untouched $self->_add_param( 'HANDLE_ONLY', 1 ); $ENV{STDIN_WAITING} = 1; } elsif ( $method eq 'POST' or $method eq 'PUT' or $method eq 'PROPFIN +D' ) { # patch1 if ( $length ) { # we may not get all the data we want with a single read on larg +e # POSTs as it may not be here yet! Credit Jason Luther for patch # < 2.99 suffers from same bug _internal_read( $self, $handle, $data, $length ); while ( length( $data ) < $length ) { last unless _internal_read( $self, $handle, my $buffer ); $data .= $buffer; } unless ( $length == length $data ) { $self->cgi_error( "500 Bad read on POST! wanted $length, got " . length( $data ) ); return; } if ( $type !~ m|^application/x-www-form-urlencoded| ) { $self->_add_param( $method . "DATA", $data ); } else { $self->_parse_params( $data ); } } } elsif ( $method eq 'GET' or $method eq 'HEAD' ) { $data = $self->{'.mod_perl'} ? $self->_mod_perl_request()->args() : $ENV{'QUERY_STRING'} || $ENV{'REDIRECT_QUERY_STRING'} || ''; $self->_parse_params( $data ); } else { unless ( $self->{'.globals'}->{'DEBUG'} and $data = $self->read_from_cmdline() ) { $self->cgi_error( "400 Unknown method $method" ); return; } unless ( $data ) { # I liked this reporting but does not behave like this so # out it goes...... # $self->cgi_error("400 No data received via method: $method, type: $t +ype"); return; } $self->_parse_params( $data ); } } ... sub _add_param { my ( $self, $param, $value, $overwrite ) = @_; return () unless defined $param and defined $value; $param =~ tr/\000//d if $self->{'.globals'}->{'NO_NULL'}; @{ $self->{$param} } = () if $overwrite; @{ $self->{$param} } = () unless exists $self->{$param}; my @values = ref $value ? @{$value} : ( $value ); for my $value ( @values ) { next if $value eq '' and $self->{'.globals'}->{'NO_UNDEF_PARAMS'}; # $value =~ tr/\000//d if $self->{'.globals'}->{'NO_NULL'}; # pa +tch2: we could ask for NO_NULL, but this was such a headache # that I hardcoded it as commented out $value = Encode::decode( utf8 => $value ) if $self->{'.globals'}->{PARAM_UTF8}; push @{ $self->{$param} }, $value; unless ( $self->{'.fieldnames'}->{$param} ) { push @{ $self->{'.parameters'} }, $param; $self->{'.fieldnames'}->{$param}++; } } return scalar @values; # for compatibility with request.t }
(Only excerpts, something prevented me from posting the whole..)

With this CGI::Simple variant in effect, I later in my code do:
... my $fh = $fs->open_write( $path ); # Filesys::Virtual::Plain if( $fh ){ if($ENV{STDIN_WAITING}){ File::Copy::copy(\*STDIN,$fh); # rely on File::Copy's + robust buffered GLOB to HANDLE copy }else{ print $fh $self->query->param('PUTDATA'); } $fs->close_write($fh); return 1; }else{ $fs->close_write($fh); return 0; }

Replies are listed 'Best First'.
Re: A patched CGI::Simple for REST applications - comments please
by jaldhar (Vicar) on Aug 22, 2010 at 14:43 UTC

    Hello again, I've created a module called REST::Utils which contains a get_body function based in part on our last conversation. It is considerably simpler than the code you have shown above but I wonder if you would like to try it and give it a critique. (And maybe patches if it can be adapted to your needs?)


      From a glance at the code, it seems a handy module when doing REST with modules. Good that you released it and the insight it provides along with it to the puplic!

      What remains is the handling of large uploads by handle only. Admittedly, I handled it a bit heavy handed in my patching of CGI::Simple. Still, I see no alternative to switching between handling smaller requests by slurping it all into a var, while larger requests need to remain on STDIN and be read() buffered. Can be done by wrapping your get_body() method with an if/else switch based on POST_MAX.
      Or could be that the abstractions that modules like provide are by design opposites of the low-level access my specific scripts require...

Log In?

What's my password?
Create A New User
Node Status?
node history
Node Type: perlquestion [id://856447]
Approved by pKai
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others imbibing at the Monastery: (4)
As of 2020-07-11 11:02 GMT
Find Nodes?
    Voting Booth?

    No recent polls found