http://www.perlmonks.org?node_id=609632

derby has asked for the wisdom of the Perl Monks concerning the following question:

update: As of 2007/11/30, CGI v 3.31 has the patch to accept PUT data correctly. Thanks to prodding from rhesa, CGI and CGI::Simple now support all the HTTP methods necessary to build REST services.


I finally started building a true REST webservice when I ran smack into a wall. The service is your basic CRUD service where the HTTP GET will retrieve products, DELETE will delete, POST will update and PUT will create:

    GET    http://foo.com/webservice/<productid>
    DELETE http://foo.com/webservice/<productid>
    POST   http://foo.com/webservice/<productid>
    PUT    http://foo.com/webservice

Nothing out of the ordinary there, right? The thing is, I'm a big fan of CGI::Application and it uses CGI at its core (but that's overridable). The wall is the way CGI handles the PUT method (it doesn't really) and the way it handles POST methods: it's designed for HTML form parsing. No problem, I thought. CGI::Application has the capability to switch out CGI with any other module as long as that module adheres to the CGI interface (well, not the entire interface).

So I needed a module that would

  1. adhere to the CGI interface
  2. support the HTTP PUT method
  3. not form parse PUT and POST data
After much searching, I couldn't find a module for those needs. The closest I came was CGI::XMLpost but that wasn't even horseshoe close.

I finally decided to build one myself but given the nature of CGI, I was pretty sure it wasn't going to be quick and it wasn't going to be pretty. I had used CGI::Simple in the past and started thinking if there was a way to co-opt it into what I wanted.

After looking at the code, I figured out all I needed to do was override its _read_parse method and then add accessors for the POST and PUT data. There were only 4 changes needed for _read_parse:

    package Foo::CGI::Rest;

    use base 'CGI::Simple';

    sub _read_parse {
        my $self   = shift;
        my $data   = '';
        my $type   = $ENV{'CONTENT_TYPE'}   || 'No CONTENT_TYPE received';
        my $length = $ENV{'CONTENT_LENGTH'} || 0;
        my $method = $ENV{'REQUEST_METHOD'} || 'No REQUEST_METHOD received';

        # change #1 - added or "PUT" here ... we don't want
        # malicious PUTs either

        # first check POST_MAX Steve Purkis pointed out the previous bug
        if( ( $method eq 'POST' or $method eq "PUT" )
            and $self->{'.globals'}->{'POST_MAX'} != -1
            and $length > $self->{'.globals'}->{'POST_MAX'}) {
            $self->cgi_error(
                "413 Request entity too large: $length bytes on STDIN exceeds \$POST_MAX!"
            );
            # silently discard data ??? better to just close the socket ???
            while ($length > 0) {
                last unless sysread(STDIN, my $buffer, 4096);
                $length -= length($buffer);
            }
            return;
        }

        if( $length and $type =~ m|^multipart/form-data|i ) {
            my $got_length = $self->_parse_multipart;
            if( $length != $got_length ) {
                $self->cgi_error("500 Bad read on multipart/form-data! wanted $length, got $got_length");
            }
        # change #2 - or "PUT" here too
        } elsif( $method eq 'POST' or $method eq 'PUT' ) {
            if( $length ) {
                # we may not get all the data we want with a single read on large
                # POSTs as it may not be here yet! Credit Jason Luther for patch
                # CGI.pm < 2.99 suffers from same bug
                sysread(STDIN, $data, $length);
                while( length($data) < $length ) {
                    last unless sysread(STDIN, my $buffer, 4096);
                    $data .= $buffer;
                }
                # change #3 - don't send data to parse params ... it's not form data
                if( $length == length $data ) {
                    $self->set_data( $data );
                } else {
                    $self->cgi_error("500 Bad read on POST! wanted $length, got " . length($data));
                }
            }
        } elsif( $method eq 'GET' or $method eq 'HEAD' ) {
            $data = $self->{'.mod_perl'}
                ? $self->_mod_perl_request()->args()
                : $ENV{'QUERY_STRING'} || $ENV{'REDIRECT_QUERY_STRING'} || '';
            $self->_parse_params($data);
        } else {
            unless ($self->{'.globals'}->{'DEBUG'} and $data = $self->read_from_cmdline()) {
                $self->cgi_error("400 Unknown method $method");
            }
        }
    }

    # change #4 - create accessors
    sub set_data {
        my( $self, $data ) = @_;
        $self->{_data} = $data;
    }

    sub get_data {
        my( $self ) = @_;
        return $self->{_data};
    }

    1;

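The read loop in _read_parse exists because a single read may return fewer bytes than CONTENT_LENGTH promises. Here is a minimal standalone sketch of that idea. Note that read_body is a hypothetical helper, not part of CGI::Simple, and that it uses buffered read on an in-memory filehandle rather than sysread on STDIN so it can be run outside a web server.

```perl
use strict;
use warnings;

# Hypothetical helper (an assumption, not CGI::Simple API): read exactly
# $length bytes from $fh, looping because one read may come up short.
sub read_body {
    my ($fh, $length) = @_;
    my $data = '';
    while (length($data) < $length) {
        # Append at the current end of $data, asking only for what is missing.
        my $got = read($fh, $data, $length - length($data), length($data));
        last unless $got;    # undef on error, 0 on end of input
    }
    # Mirror the "Bad read" check: succeed only if every byte arrived.
    return length($data) == $length ? $data : undef;
}

# Simulate a request body with an in-memory filehandle.
my $xml = '<product id="1"/>';
open my $fh, '<', \$xml or die "cannot open in-memory handle: $!";
my $body = read_body($fh, length($xml));
print defined $body ? "read ok\n" : "short read\n";
```

Unlike the original loop, this asks only for the bytes still missing, so it never reads past the declared length.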
Now in my CGI::Application all I have to do is
    sub cgiapp_get_query {
        my $self = shift;
        require Foo::CGI::Rest;
        return Foo::CGI::Rest->new();
    }

and in my handlers for POST and PUT:
    sub update {
        my $self   = shift;
        my $cgi    = $self->query();
        my $xmlstr = $cgi->get_data();
        ...
    }

    sub create {
        my $self   = shift;
        my $cgi    = $self->query();
        my $xmlstr = $cgi->get_data();
        ...
    }

The thing that worries me, though, is REST has been around a few years now and CGI has been around forever. So any ideas why CGI and its derivatives treat PUT like a red-headed stepchild?

-derby

Update: Updated title.

Update: Given the great feedback from rhesa, I've further simplified the _read_parse override by setting the POSTDATA and PUTDATA params if the POST'ed and PUT'ed data is not of type 'application/x-www-form-urlencoded' ... hmmm maybe I should submit this as a patch to the CGI::Simple author. Here it is in all its simpleness:

    package Foo::CGI::Rest;

    use base 'CGI::Simple';

    sub _read_parse {
        my $self   = shift;
        my $data   = '';
        my $type   = $ENV{'CONTENT_TYPE'}   || 'No CONTENT_TYPE received';
        my $length = $ENV{'CONTENT_LENGTH'} || 0;
        my $method = $ENV{'REQUEST_METHOD'} || 'No REQUEST_METHOD received';

        # first check POST_MAX Steve Purkis pointed out the previous bug
        if( ( $method eq 'POST' or $method eq "PUT" )
            and $self->{'.globals'}->{'POST_MAX'} != -1
            and $length > $self->{'.globals'}->{'POST_MAX'}) {
            $self->cgi_error(
                "413 Request entity too large: $length bytes on STDIN exceeds \$POST_MAX!"
            );
            # silently discard data ??? better to just close the socket ???
            while ($length > 0) {
                last unless sysread(STDIN, my $buffer, 4096);
                $length -= length($buffer);
            }
            return;
        }

        if( $length and $type =~ m|^multipart/form-data|i ) {
            my $got_length = $self->_parse_multipart;
            if( $length != $got_length ) {
                $self->cgi_error("500 Bad read on multipart/form-data! wanted $length, got $got_length");
            }
        } elsif( $method eq 'POST' or $method eq 'PUT' ) {
            if( $length ) {
                # we may not get all the data we want with a single read on large
                # POSTs as it may not be here yet! Credit Jason Luther for patch
                # CGI.pm < 2.99 suffers from same bug
                sysread(STDIN, $data, $length);
                while( length($data) < $length ) {
                    last unless sysread(STDIN, my $buffer, 4096);
                    $data .= $buffer;
                }
                if( $length == length $data ) {
                    if( $type !~ m|^application/x-www-form-urlencoded| ) {
                        $self->_add_param( $method . "DATA", $data );
                    } else {
                        $self->_parse_params( $data );
                    }
                } else {
                    $self->cgi_error("500 Bad read on POST! wanted $length, got " . length($data));
                }
            }
        } elsif( $method eq 'GET' or $method eq 'HEAD' ) {
            $data = $self->{'.mod_perl'}
                ? $self->_mod_perl_request()->args()
                : $ENV{'QUERY_STRING'} || $ENV{'REDIRECT_QUERY_STRING'} || '';
            $self->_parse_params($data);
        } else {
            unless ($self->{'.globals'}->{'DEBUG'} and $data = $self->read_from_cmdline()) {
                $self->cgi_error("400 Unknown method $method");
            }
        }
    }

    1;
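
The only new decision in this version is where a POST or PUT body ends up. That decision can be pulled out into a small pure function for illustration. The name body_param_name is hypothetical (it is not in CGI::Simple), and the sketch assumes multipart/form-data has already been handled by the earlier branch, as it is in the override.

```perl
use strict;
use warnings;

# Hypothetical helper (an assumption for illustration): return the parameter
# name that should hold the raw body, or undef when the body should go
# through normal form parsing or there is no body to store.
sub body_param_name {
    my ($method, $type) = @_;
    $type = '' unless defined $type;

    # Only POST and PUT carry a body in the override.
    return undef unless $method eq 'POST' or $method eq 'PUT';

    # Ordinary form data is still parsed into individual params.
    return undef if $type =~ m|^application/x-www-form-urlencoded|;

    # Anything else is kept as-is under POSTDATA or PUTDATA.
    return $method . 'DATA';
}

for my $case ( [ PUT => 'text/xml' ], [ POST => 'application/x-www-form-urlencoded' ] ) {
    my $name = body_param_name(@$case);
    print "$case->[0] $case->[1]: ", ( defined $name ? $name : 'parse as form' ), "\n";
}
```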

Update: Submitted a patch to the author of CGI::Simple

Replies are listed 'Best First'.
Re: REST Webservices
by rhesa (Vicar) on Apr 12, 2007 at 12:05 UTC
    CGI can still help you. You just need to make sure that the incoming data isn't regular form data:
    HANDLING NON-URLENCODED ARGUMENTS

    If POSTed data is not of type application/x-www-form-urlencoded or multipart/form-data, then the POSTed data will not be processed, but instead be returned as-is in a parameter named POSTDATA. To retrieve it, use code like this:

        my $data = $query->param('POSTDATA');

    (If you don't know what the preceding means, don't worry about it. It only affects people trying to use CGI for XML processing and other specialized tasks.)

    I've used this successfully in a CGI::Application-based REST app -- at least for text/xml POST requests (I didn't test PUT yet, but I expect it to work the same way). I can heartily recommend CGI::Application::Dispatch, as it makes building RESTy APIs easier.

    I simply added a wrapper method in my base class:

        sub get_request_body {
            my $self = shift;
            return $self->query->param('POSTDATA');
        }

    My code can then choose to inflate that in any way it wants.

      Thanks rhesa but CGI is not going to handle the PUT data. From CGI where POSTDATA is set:

          if ($meth eq 'POST'
              && defined($ENV{'CONTENT_TYPE'})
              && $ENV{'CONTENT_TYPE'} !~ m|^application/x-www-form-urlencoded|
              && $ENV{'CONTENT_TYPE'} !~ m|^multipart/form-data| ) {
              my($param) = 'POSTDATA' ;

      and further down in the comments:
          # If $meth is not of GET, POST or HEAD, assume we're being debugged offline.
          # Check the command line and then the standard input for data.
          # We use the shellwords package in order to behave the way that
          # UN*X programmers expect.

      So I still would have needed to patch CGI to handle PUT, and when it comes down to it the foo magic is much higher in CGI than it is in CGI::Simple. That being said, I think I should rework my code to put the POST and PUT data in the params under POSTDATA and PUTDATA.
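
Once the body lives in POSTDATA or PUTDATA, a handler no longer needs to care which verb delivered it. The following is a minimal sketch of such a lookup. My::StubQuery is a hypothetical stand-in that models only a param() method, so the example runs without CGI or CGI::Simple installed; a real query object is assumed to behave the same way for these two names.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a CGI-style query object; only param() is modelled.
package My::StubQuery;
sub new   { my ($class, %params) = @_; return bless { %params }, $class }
sub param { my ($self, $name) = @_; return $self->{$name} }

package main;

# Return the raw request body whether it arrived by POST or by PUT,
# or undef when neither parameter is set.
sub get_request_body {
    my ($query) = @_;
    for my $name (qw(POSTDATA PUTDATA)) {
        my $body = $query->param($name);
        return $body if defined $body;
    }
    return undef;
}

my $query = My::StubQuery->new( PUTDATA => '<product/>' );
my $body  = get_request_body($query);
print defined $body ? "got body\n" : "no body\n";
```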

      -derby
        Thanks for digging deeper. I didn't have a need for PUT or DELETE yet (but I reckon I will in the future), so I hadn't realised the support was this limited.

        I agree that a patch to CGI is in order. Even if most web browsers don't support them, PUT and DELETE are still valid HTTP verbs, and I feel that the various CGI modules ought to support them.

        At first blush, a patch to either CGI or CGI::Simple doesn't look so difficult. I think DELETE could be handled pretty much the same way as GET, and PUT is pretty similar to POST. I may be overly optimistic, but it shouldn't take more than changing a couple of if()s ;-)

        There might be a work around, but I'm not sure, as I'm not in a position to test it right now. Try each of the following:

        • call CGI with ':no_debug' (to keep from entering the block that calls read_from_cmdline)
        • define the function 'CGI::read_from_cmdline' that doesn't actually read anything from the command line. (or put a wrapper around CGI, and put it in there)
Re: REST Webservices and CGI.pm
by ruoso (Curate) on Apr 13, 2007 at 08:19 UTC

    You may want to know that the latest version of CGI does support XForms Model POST, both as application/xml and as multipart/related. When you do such a post, the XML is available through the query param XForms:Model. From CGI.pm:

        # Process XForms postings. We know that we have XForms in the
        # following cases:
        # method eq 'POST' && content-type eq 'application/xml'
        # method eq 'POST' && content-type =~ /multipart\/related.+start=/
        # There are more cases, actually, but for now, we don't support other
        # methods for XForm posts.
        # In a XForm POST, the QUERY_STRING is parsed normally.
        # If the content-type is 'application/xml', we just set the param
        # XForms:Model (referring to the xml syntax) param containing the
        # unparsed XML data.
        # In the case of multipart/related we set XForms:Model as above, but
        # the other parts are available as uploads with the Content-ID as the
        # the key.
        # See the URL below for XForms specs on this issue.
        # http://www.w3.org/TR/2006/REC-xforms-20060314/slice11.html#submit-options

    daniel

      Thanks ruoso. The new version of CGI still doesn't support PUT but holy crap ... a mime type of application/xml - that's a broad stroke that's going to cause lots-o-people grief - what was the W3C thinking when they decided that -- not all xml http traffic is going to be XForms.

      -derby

      Update: Looking at RFC 3023, you would think the correct mime type for XForms should be application/xforms-xml ... but hey what do I know.

      Thanks for explaining this here. Is it documented anywhere? I have been chewing on CGI and the POSTDATA parameter for the last day, only to find out that the documentation is out-of-date. From the documentation on CPAN and perldoc:

      If POSTed data is not of type application/x-www-form-urlencoded or multipart/form-data, then the POSTed data will not be processed, but instead be returned as-is in a parameter named POSTDATA.

      ...

      (If you don't know what the preceding means, don't worry about it. It only affects people trying to use CGI for XML processing and other specialized tasks.)

      So, me, thinking I am using CGI for XML processing... Only after having it print out everything it's got, I find a param('XForms:Model') - I can't find anything about that on CPAN or perldoc for CGI though! Can someone update the doc or should I file a bug-report on CPAN?
        param('XForms:Model')

        would mean that some script sent a CGI parameter to your script with the name XForms:Model. Why would what other programs send to your script need to be documented in CGI.pm? Update: Ah - XForms:Model would be somewhat "special", I now see. If this is handled by CGI.pm, then it should be documented there indeed.

Re: REST Webservices and CGI.pm
by astroboy (Chaplain) on Apr 13, 2007 at 15:09 UTC
    A while back on the CGI::Application list there was talk of a CGI::Application::Plugin::REST module (see also RFC: CGI::Application::Plugin::REST). I don't know what became of it, but it would be nice if your work could result in some sort of C::A plugin. However, until either CGI or CGI::Simple are patched, it may not be possible.