Beefy Boxes and Bandwidth Generously Provided by pair Networks
We don't bite newbies here... much
 
PerlMonks  

Trouble with LWP, post request and XML data

by thecoder2012 (Novice)
on Jul 11, 2013 at 01:24 UTC ( #1043608=perlquestion: print w/ replies, xml ) Need Help??
thecoder2012 has asked for the wisdom of the Perl Monks concerning the following question:

Hi,

I'm clueless with a Perl-problem.

I wrote a script with LWP::UserAgent (and IO::Socket as an alternative) that's supposed to send and receive XML-files (raw). The problem occurs with every tested version of Perl (5.10, 5.12, 5.14) and several sytesms (Windows 7, Debian Linux 7) so the version is probably not causing the issue.

With IO::Socket it works within normal parameters, according to Wireshark (1-2s).
my $inhalt; my $sock = new IO::Socket::INET ( Timeout => 25, Proto => tcp, PeerAddr => $host, PeerPort => 80 ) || die $!; binmode $sock, 'utf8'; $sock->autoflush(1); print $sock $header.$body; while(<$sock>){ $inhalt .= $_; } $sock->close();
With LWP it stops without any modification and produces the error message "X-Died: Bad chunk-size in HTTP response: <?xml version="1.0" encoding="UTF-8" ?> at C:/Perl/lib/Net/HTTP/Methods.pm line 490."
The answer contains only XML-files as well but is quickly available. According to Wireshark the problem exists within LWP/Perl.

Modification to exclude the Host in Net/HTTP/Methos.pm and LWP does what it's supposed to. Just a workaround and no direct fix. See Code (2)
my $ua = LWP::UserAgent->new; $ua->agent("xmlClient 1.0"); $ua->timeout(25); my $req = HTTP::Request->new(POST => $url); $req->content_type('text/xml'); $req->content($body); my $res = $ua->request($req); my $inhalt = $res->as_string;#$res->{_content}
And an extract from Wireshark regarding sending and receiving an answer.

Sending:
POST /xml/ HTTP/1.1 Connection: Keep-Alive, close Host: admin.notfoundtotal.de User-Agent: xmlClient 1.0 Content-Type: text/xml Content-Length: 224 <?xml version="1.0" encoding="UTF-8"?><command name="checkDomain" cust +omer="111" password="11111111111111111111111111111111" passwort="1111 +1111111111111111111111111111"><tld>de</tld><sld>super2423423423domain +</sld></command>
Answer:
HTTP/1.1 200 OK Date: Mon, 24 Jun 2013 03:54:45 GMT Server: Apache Pragma: no-cache Cache-control: no-cache Transfer-Encoding: chunked Expires: Mon, 24 Jun 2013 03:54:47 GMT Content-Length: 184 Connection: close Content-Type: text/xml <?xml version="1.0" encoding="UTF-8"?> <responses code="200" text="ok"> <response> <data>200</data> <text>available</text> <status>available</status> </response> </responses>
That's the reason why LWP is better to use, because of additional options like proxies etc.
My question is whether or how I can get LWP without changing the code in the module not to parse the body?

Maybe I'm overlooking something (e.g. options)


Methods.pm: (Net::HTTP::Methods with bad fix)
package Net::HTTP::Methods; require 5.005; # 4-arg substr use strict; use vars qw($VERSION); $VERSION = "6.00"; my $CRLF = "\015\012"; # "\r\n" is not portable *_bytes = defined(&utf8::downgrade) ? sub { unless (utf8::downgrade($_[0], 1)) { require Carp; Carp::croak("Wide character in HTTP request (bytes require +d)"); } return $_[0]; } : sub { return $_[0]; }; sub new { my $class = shift; unshift(@_, "Host") if @_ == 1; my %cnf = @_; require Symbol; my $self = bless Symbol::gensym(), $class; return $self->http_configure(\%cnf); } sub http_configure { my($self, $cnf) = @_; die "Listen option not allowed" if $cnf->{Listen}; my $explict_host = (exists $cnf->{Host}); my $host = delete $cnf->{Host}; my $peer = $cnf->{PeerAddr} || $cnf->{PeerHost}; if (!$peer) { die "No Host option provided" unless $host; $cnf->{PeerAddr} = $peer = $host; } if ($peer =~ s,:(\d+)$,,) { $cnf->{PeerPort} = int($1); # always override } if (!$cnf->{PeerPort}) { $cnf->{PeerPort} = $self->http_default_port; } if (!$explict_host) { $host = $peer; $host =~ s/:.*//; } if ($host && $host !~ /:/) { my $p = $cnf->{PeerPort}; $host .= ":$p" if $p != $self->http_default_port; } $cnf->{Proto} = 'tcp'; my $keep_alive = delete $cnf->{KeepAlive}; my $http_version = delete $cnf->{HTTPVersion}; $http_version = "1.1" unless defined $http_version; my $peer_http_version = delete $cnf->{PeerHTTPVersion}; $peer_http_version = "1.0" unless defined $peer_http_version; my $send_te = delete $cnf->{SendTE}; my $max_line_length = delete $cnf->{MaxLineLength}; $max_line_length = 8*1024 unless defined $max_line_length; my $max_header_lines = delete $cnf->{MaxHeaderLines}; $max_header_lines = 128 unless defined $max_header_lines; return undef unless $self->http_connect($cnf); $self->host($host); $self->keep_alive($keep_alive); $self->send_te($send_te); $self->http_version($http_version); $self->peer_http_version($peer_http_version); $self->max_line_length($max_line_length); $self->max_header_lines($max_header_lines); ${*$self}{'http_buf'} = ""; return $self; } sub http_default_port { 80; } # set up property accessors for my $method (qw(host keep_alive send_te max_line_length max_header_ +lines peer_http_version)) { my $prop_name = "http_" . $method; no strict 'refs'; *$method = sub { my $self = shift; my $old = ${*$self}{$prop_name}; ${*$self}{$prop_name} = shift if @_; return $old; }; } # we want this one to be a bit smarter sub http_version { my $self = shift; my $old = ${*$self}{'http_version'}; if (@_) { my $v = shift; $v = "1.0" if $v eq "1"; # float unless ($v eq "1.0" or $v eq "1.1") { require Carp; Carp::croak("Unsupported HTTP version '$v'"); } ${*$self}{'http_version'} = $v; } $old; } sub format_request { my $self = shift; my $method = shift; my $uri = shift; my $content = (@_ % 2) ? pop : ""; for ($method, $uri) { require Carp; Carp::croak("Bad method or uri") if /\s/ || !length; } push(@{${*$self}{'http_request_method'}}, $method); my $ver = ${*$self}{'http_version'}; my $peer_ver = ${*$self}{'http_peer_http_version'} || "1.0"; my @h; my @connection; my %given = (host => 0, "content-length" => 0, "te" => 0); while (@_) { my($k, $v) = splice(@_, 0, 2); my $lc_k = lc($k); if ($lc_k eq "connection") { $v =~ s/^\s+//; $v =~ s/\s+$//; push(@connection, split(/\s*,\s*/, $v)); next; } if (exists $given{$lc_k}) { $given{$lc_k}++; } push(@h, "$k: $v"); } if (length($content) && !$given{'content-length'}) { push(@h, "Content-Length: " . length($content)); } my @h2; if ($given{te}) { push(@connection, "TE") unless grep lc($_) eq "te", @connection; } elsif ($self->send_te && gunzip_ok()) { # gzip is less wanted since the IO::Uncompress::Gunzip interface f +or # it does not really allow chunked decoding to take place easily. push(@h2, "TE: deflate,gzip;q=0.3"); push(@connection, "TE"); } unless (grep lc($_) eq "close", @connection) { if ($self->keep_alive) { if ($peer_ver eq "1.0") { # from looking at Netscape's headers push(@h2, "Keep-Alive: 300"); unshift(@connection, "Keep-Alive"); } } else { push(@connection, "close") if $ver ge "1.1"; } } push(@h2, "Connection: " . join(", ", @connection)) if @connection +; unless ($given{host}) { my $h = ${*$self}{'http_host'}; push(@h2, "Host: $h") if $h; } return _bytes(join($CRLF, "$method $uri HTTP/$ver", @h2, @h, "", $ +content)); } sub write_request { my $self = shift; $self->print($self->format_request(@_)); } sub format_chunk { my $self = shift; return $_[0] unless defined($_[0]) && length($_[0]); return _bytes(sprintf("%x", length($_[0])) . $CRLF . $_[0] . $CRLF +); } sub write_chunk { my $self = shift; return 1 unless defined($_[0]) && length($_[0]); $self->print(_bytes(sprintf("%x", length($_[0])) . $CRLF . $_[0] . + $CRLF)); } sub format_chunk_eof { my $self = shift; my @h; while (@_) { push(@h, sprintf "%s: %s$CRLF", splice(@_, 0, 2)); } return _bytes(join("", "0$CRLF", @h, $CRLF)); } sub write_chunk_eof { my $self = shift; $self->print($self->format_chunk_eof(@_)); } sub my_read { die if @_ > 3; my $self = shift; my $len = $_[1]; for (${*$self}{'http_buf'}) { if (length) { $_[0] = substr($_, 0, $len, ""); return length($_[0]); } else { return $self->sysread($_[0], $len); } } } sub my_readline { my $self = shift; my $what = shift; for (${*$self}{'http_buf'}) { my $max_line_length = ${*$self}{'http_max_line_length'}; my $pos; while (1) { # find line ending $pos = index($_, "\012"); last if $pos >= 0; die "$what line too long (limit is $max_line_length)" if $max_line_length && length($_) > $max_line_length; # need to read more data to find a line ending READ: { my $n = $self->sysread($_, 1024, length); unless (defined $n) { redo READ if $!{EINTR}; if ($!{EAGAIN}) { # Hmm, we must be reading from a non-blocking +socket # XXX Should really wait until this socket is +readable,... select(undef, undef, undef, 0.1); # but this +will do for now redo READ; } # if we have already accumulated some data let's a +t least # return that as a line die "$what read failed: $!" unless length; } unless ($n) { return undef unless length; return substr($_, 0, length, ""); } } } die "$what line too long ($pos; limit is $max_line_length)" if $max_line_length && $pos > $max_line_length; my $line = substr($_, 0, $pos+1, ""); $line =~ s/(\015?\012)\z// || die "Assert"; return wantarray ? ($line, $1) : $line; } } sub _rbuf { my $self = shift; if (@_) { for (${*$self}{'http_buf'}) { my $old; $old = $_ if defined wantarray; $_ = shift; return $old; } } else { return ${*$self}{'http_buf'}; } } sub _rbuf_length { my $self = shift; return length ${*$self}{'http_buf'}; } sub _read_header_lines { my $self = shift; my $junk_out = shift; my @headers; my $line_count = 0; my $max_header_lines = ${*$self}{'http_max_header_lines'}; while (my $line = my_readline($self, 'Header')) { if ($line =~ /^(\S+?)\s*:\s*(.*)/s) { push(@headers, $1, $2); } elsif (@headers && $line =~ s/^\s+//) { $headers[-1] .= " " . $line; } elsif ($junk_out) { push(@$junk_out, $line); } else { die "Bad header: '$line'\n"; } if ($max_header_lines) { $line_count++; if ($line_count >= $max_header_lines) { die "Too many header lines (limit is $max_header_lines)"; } } } return @headers; } sub read_response_headers { my($self, %opt) = @_; my $laxed = $opt{laxed}; my($status, $eol) = my_readline($self, 'Status'); unless (defined $status) { die "Server closed connection without sending any data back"; } my($peer_ver, $code, $message) = split(/\s+/, $status, 3); if (!$peer_ver || $peer_ver !~ s,^HTTP/,, || $code !~ /^[1-5]\d\d$ +/) { die "Bad response status line: '$status'" unless $laxed; # assume HTTP/0.9 ${*$self}{'http_peer_http_version'} = "0.9"; ${*$self}{'http_status'} = "200"; substr(${*$self}{'http_buf'}, 0, 0) = $status . ($eol || ""); return 200 unless wantarray; return (200, "Assumed OK"); }; ${*$self}{'http_peer_http_version'} = $peer_ver; ${*$self}{'http_status'} = $code; my $junk_out; if ($laxed) { $junk_out = $opt{junk_out} || []; } my @headers = $self->_read_header_lines($junk_out); # pick out headers that read_entity_body might need my @te; my $content_length; for (my $i = 0; $i < @headers; $i += 2) { my $h = lc($headers[$i]); if ($h eq 'transfer-encoding') { my $te = $headers[$i+1]; $te =~ s/^\s+//; $te =~ s/\s+$//; push(@te, $te) if length($te); } elsif ($h eq 'content-length') { # ignore bogus and overflow values if ($headers[$i+1] =~ /^\s*(\d{1,15})(?:\s|$)/) { $content_length = $1; } } } ${*$self}{'http_te'} = join(",", @te); ${*$self}{'http_content_length'} = $content_length; ${*$self}{'http_first_body'}++; delete ${*$self}{'http_trailers'}; return $code unless wantarray; return ($code, $message, @headers); } sub read_entity_body { my $self = shift; my $buf_ref = \$_[0]; my $size = $_[1]; die "Offset not supported yet" if $_[2]; my $chunked; my $bytes; if (${*$self}{'http_first_body'} && ${*$self}{'http_host'} ne "adm +in.notfoundtotal.de") { ${*$self}{'http_first_body'} = 0; delete ${*$self}{'http_chunked'}; delete ${*$self}{'http_bytes'}; my $method = shift(@{${*$self}{'http_request_method'}}); my $status = ${*$self}{'http_status'}; if ($method eq "HEAD") { # this response is always empty regardless of other headers $bytes = 0; } elsif (my $te = ${*$self}{'http_te'}) { my @te = split(/\s*,\s*/, lc($te)); die "Chunked must be last Transfer-Encoding '$te'" unless pop(@te) eq "chunked"; for (@te) { if ($_ eq "deflate" && inflate_ok()) { #require Compress::Raw::Zlib; my ($i, $status) = Compress::Raw::Zlib::Inflate->new(); die "Can't make inflator: $status" unless $i; $_ = sub { my $out; $i->inflate($_[0], \$out); $out } } elsif ($_ eq "gzip" && gunzip_ok()) { #require IO::Uncompress::Gunzip; my @buf; $_ = sub { push(@buf, $_[0]); return "" unless $_[1]; my $input = join("", @buf); my $output; IO::Uncompress::Gunzip::gunzip(\$input, \$output, Transpar +ent => 0) or die "Can't gunzip content: $IO::Uncompress::Gunzip: +:GunzipError"; return \$output; }; } elsif ($_ eq "identity") { $_ = sub { $_[0] }; } else { die "Can't handle transfer encoding '$te'"; } } @te = reverse(@te); ${*$self}{'http_te2'} = @te ? \@te : ""; $chunked = -1; } elsif (defined(my $content_length = ${*$self}{'http_content_length +'})) { $bytes = $content_length; } elsif ($status =~ /^(?:1|[23]04)/) { # RFC 2616 says that these responses should always be empt +y # but that does not appear to be true in practice [RT#1790 +7] $bytes = 0; } else { # XXX Multi-Part types are self delimiting, but RFC 2616 says +we # only has to deal with 'multipart/byteranges' # Read until EOF } } else { $chunked = ${*$self}{'http_chunked'}; $bytes = ${*$self}{'http_bytes'}; } if (defined $chunked) { # The state encoded in $chunked is: # $chunked == 0: read CRLF after chunk, then chunk header # $chunked == -1: read chunk header # $chunked > 0: bytes left in current chunk to read if ($chunked <= 0) { my $line = my_readline($self, 'Entity body'); if ($chunked == 0) { die "Missing newline after chunk data: '$line'" if !defined($line) || $line ne ""; $line = my_readline($self, 'Entity body'); } die "EOF when chunk header expected" unless defined($line); my $chunk_len = $line; $chunk_len =~ s/;.*//; # ignore potential chunk parameters unless ($chunk_len =~ /^([\da-fA-F]+)\s*$/) { die "Bad chunk-size in HTTP response: $line"; } $chunked = hex($1); if ($chunked == 0) { ${*$self}{'http_trailers'} = [$self->_read_header_lines]; $$buf_ref = ""; my $n = 0; if (my $transforms = delete ${*$self}{'http_te2'}) { for (@$transforms) { $$buf_ref = &$_($$buf_ref, 1); } $n = length($$buf_ref); } # in case somebody tries to read more, make sure we continue # to return EOF delete ${*$self}{'http_chunked'}; ${*$self}{'http_bytes'} = 0; return $n; } } my $n = $chunked; $n = $size if $size && $size < $n; $n = my_read($self, $$buf_ref, $n); return undef unless defined $n; ${*$self}{'http_chunked'} = $chunked - $n; if ($n > 0) { if (my $transforms = ${*$self}{'http_te2'}) { for (@$transforms) { $$buf_ref = &$_($$buf_ref, 0); } $n = length($$buf_ref); $n = -1 if $n == 0; } } return $n; } elsif (defined $bytes) { unless ($bytes) { $$buf_ref = ""; return 0; } my $n = $bytes; $n = $size if $size && $size < $n; $n = my_read($self, $$buf_ref, $n); return undef unless defined $n; ${*$self}{'http_bytes'} = $bytes - $n; return $n; } else { # read until eof $size ||= 8*1024; return my_read($self, $$buf_ref, $size); } } sub get_trailers { my $self = shift; @{${*$self}{'http_trailers'} || []}; } BEGIN { my $gunzip_ok; my $inflate_ok; sub gunzip_ok { return $gunzip_ok if defined $gunzip_ok; # Try to load IO::Uncompress::Gunzip. local $@; local $SIG{__DIE__}; $gunzip_ok = 0; eval { require IO::Uncompress::Gunzip; $gunzip_ok++; }; return $gunzip_ok; } sub inflate_ok { return $inflate_ok if defined $inflate_ok; # Try to load Compress::Raw::Zlib. local $@; local $SIG{__DIE__}; $inflate_ok = 0; eval { require Compress::Raw::Zlib; $inflate_ok++; }; return $inflate_ok; } } # BEGIN 1;

Comment on Trouble with LWP, post request and XML data
Select or Download Code
Re: Trouble with LWP, post request and XML data (cpan GAAS/Net-HTTP-6.06.tar.gz)
by Anonymous Monk on Jul 11, 2013 at 06:54 UTC
    Try upgrading to  cpan GAAS/Net-HTTP-6.06.tar.gz
Re: Trouble with LWP, post request and XML data
by tobyink (Abbot) on Jul 11, 2013 at 08:04 UTC

    The nameless monk is correct in that the Net-HTTP changelog does list several changes with regard to chunked encoding in 6.04, so upgrading to the latest version (currently 6.06) might help you out.

    Your problem may stem from the fact that the HTTP response you gave included this header:

    Transfer-Encoding: chunked

    However, the body of the response is not formatted as a chunked encoding. In chunked encodings, the first line (and various subsequent lines, but I won't go into the details) is expected to be an integer, giving the size of a chunk in bytes. However, the first line of the response body is:

    <?xml version="1.0" encoding="UTF-8"?>

    Hence the message, "Bad chunk-size in HTTP response".

    So the HTTP response is broken. If you have the option of fixing it at that end, that's what you should do. If you can't fix it at that end, then there are a couple of options: patch LWP to cope with the broken HTTP response (and you might get this accepted upstream if you can show it's a common problem, and your patch doesn't harm non-broken HTTP responses); or set up a proxy that fixes broken chunked encoding and make all your requests via the proxy.

    package Cow { use Moo; has name => (is => 'lazy', default => sub { 'Mooington' }) } say Cow->new->name

Log In?
Username:
Password:

What's my password?
Create A New User
Node Status?
node history
Node Type: perlquestion [id://1043608]
Approved by Corion
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others taking refuge in the Monastery: (16)
As of 2014-07-30 13:10 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    My favorite superfluous repetitious redundant duplicative phrase is:









    Results (232 votes), past polls