PerlMonks  

Dealing with binary data and WWW::Mechanize and encoding stuff

by friedo (Prior)
on Dec 07, 2008 at 08:38 UTC ( #728674=perlquestion )
friedo has asked for the wisdom of the Perl Monks concerning the following question:

Greetings!

I've run into some confusion with an encoding issue. I'm fetching some files with Mechanize (PDFs in particular) and passing them in-memory to another function, in this case Compress::Zlib::memGzip.

    use Compress::Zlib;
    ...
    $mech->get( $pdf_url );
    my $compressed = Compress::Zlib::memGzip( $mech->content );

I'm getting the dreaded "wide character in memGzip" warning when I do this, which, if my understanding is correct, tells me a few things:
  • Mechanize (or somebody) is storing the PDF data as a character string
  • Since PDF is a byte format, I really don't want it in a character string
  • memGzip doesn't want a character string either
  • I have to somehow make it not a character string
And that's where I'm lost. I know how to convert character data to various encodings using Encode, and how to set binmode on a filehandle, but I can't seem to work out how to get that PDF data in the format it should be in.
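For readers following along, the character-string versus byte-string distinction in the list above can be checked directly. This is only a minimal sketch and is not from the original post: the sample string is made up, and encoding to UTF-8 is used here purely to obtain a byte string. For binary data such as a PDF, the goal is to avoid decoding it into characters in the first place.

```perl
use strict;
use warnings;
use Encode qw(encode);
use Compress::Zlib;

# Hypothetical sample data: a character string holding a code point above 255.
my $chars = "PDF \x{263A}";

# A byte string can only hold values 0..255. With FAIL_OK set, utf8::downgrade
# returns false (instead of dying) when the string cannot be stored as bytes.
my $copy          = $chars;
my $fits_in_bytes = utf8::downgrade($copy, 1) ? 1 : 0;

# Encoding turns the characters into bytes, which is what memGzip expects.
my $bytes      = encode('UTF-8', $chars);
my $compressed = Compress::Zlib::memGzip($bytes);
defined $compressed or die "memGzip failed";

my $restored = Compress::Zlib::memGunzip($compressed);

print "fits in bytes: $fits_in_bytes\n";
print "round trip ok\n" if $restored eq $bytes;
```

A string for which the downgrade check fails is the kind that triggers "wide character" complaints from byte-oriented functions.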

Mech does have a save_content method which promises to save the content in binary mode if it's not a text/* MIME type (and I've checked that the MIME type is correct.) However, I'd hate to have to dump the content to a temp file just to read it in again.

Re: Dealing with binary data and WWW::Mechanize and encoding stuff
by davidrw (Prior) on Dec 07, 2008 at 15:19 UTC
    Taking advantage of LWP::UserAgent's "Handlers" (WWW::Mechanize is a proper subclass of LWP::UserAgent) might give you better access to the content, and IO::Compress::Gzip is better suited to dealing with the content a piece at a time.
    use IO::Compress::Gzip;

    my $compressed;
    my $z;

    $mech->add_handler( response_header => sub {
        my($response, $ua, $h) = @_;
        $response->{default_add_content} = 0;
        $z = new IO::Compress::Gzip \$compressed or die;
    } );

    $mech->add_handler( response_data => sub {
        my($response, $ua, $h, $data) = @_;
        print $z $data or die $!;
        return 1;
    } );

    $mech->add_handler( response_done => sub {
        my($response, $ua, $h) = @_;
        close $z or die $!;
    } );

    $mech->get($pdf_url);

    warn length $mech->content;  # 0 because of the 'default_add_content' setting
    warn length $compressed;

    Note that $z->print() and $z->close() work too.
    Note that LWP::UserAgent has a remove_handler method, too, in case this $mech object has to go do other stuff.


    Also, does IO::Compress::Gzip::gzip handle it any better directly from $mech->content?
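    For reference, the one-shot functional interface mentioned above looks roughly like this. It is a minimal sketch with a made-up buffer standing in for the fetched content. It only shows the call shape for byte input and does not settle whether gzip copes any better with a character string.

```perl
use strict;
use warnings;
use IO::Compress::Gzip     qw(gzip $GzipError);
use IO::Uncompress::Gunzip qw(gunzip $GunzipError);

# Hypothetical stand-in for bytes fetched from a server.
my $content = "%PDF-1.4\n" . ("\x00\xFF\x80binary" x 100);

# Compress from one in-memory scalar to another.
my $compressed;
gzip \$content => \$compressed
    or die "gzip failed: $GzipError\n";

# Decompress again to confirm the data survives the round trip.
my $restored;
gunzip \$compressed => \$restored
    or die "gunzip failed: $GunzipError\n";

printf "%d bytes in, %d bytes compressed\n",
    length($content), length($compressed);
```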

    (side note) You can (potentially) save an in-memory copy of the content by instead passing a reference to Compress::Zlib::memGzip, e.g. $mech->response->content_ref (content_ref is an HTTP::Message method).
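    A minimal sketch of the pass-by-reference idea, assuming (as I read the Compress::Zlib documentation) that memGzip accepts either a scalar or a scalar reference. The buffer here is a made-up stand-in. In a Mechanize context the reference would presumably come from the response object's content_ref, but that wiring is an assumption and is not exercised here.

```perl
use strict;
use warnings;
use Compress::Zlib;

# Hypothetical stand-in for the fetched bytes.
my $buffer = "%PDF-1.4\n" . ("\x00\x01\x02\x03" x 1000);

# Passing \$buffer hands memGzip a reference rather than a copy of the string.
my $compressed = Compress::Zlib::memGzip(\$buffer);
defined $compressed or die "memGzip failed";

# Decompress to confirm the result matches the original buffer.
my $restored = Compress::Zlib::memGunzip($compressed);

print "round trip ok\n" if $restored eq $buffer;
```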

      Cool. I had forgotten about LWP::UserAgent's callback features. (In fact, there's all sorts of goodies in LWP::UA that one doesn't remember if one looks only at the Mech docs.)
Re: Dealing with binary data and WWW::Mechanize and encoding stuff
by Anonymous Monk on Dec 07, 2008 at 08:44 UTC

      That was the clue I needed. The underlying content method from HTTP::Message will get me the original unmolested byte string from the server. So all I have to do is use $mech->response->content instead of $mech->content, and it works perfectly.

      Thanks!
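      For anyone landing here later, the difference between raw and decoded content can be seen without a network connection. This is a minimal sketch using a locally built HTTP::Response. The body and headers are made up for illustration. With a Mechanize object, the equivalent of $raw would be $mech->response->content, as described above.

```perl
use strict;
use warnings;
use Encode qw(encode);
use HTTP::Response;

# Hypothetical response built locally so the sketch runs without a network.
my $body = encode('UTF-8', "\x{263A}");   # 3 bytes on the wire
my $res  = HTTP::Response->new(
    200, 'OK',
    [ 'Content-Type' => 'text/plain; charset=UTF-8' ],
    $body,
);

my $raw     = $res->content;           # bytes exactly as received
my $decoded = $res->decoded_content;   # characters, after charset decoding

printf "raw length: %d, decoded length: %d\n",
    length($raw), length($decoded);
```

      The raw form is the one that is safe to hand to byte-oriented functions such as memGzip.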
