PerlMonks

Re^4: Uncompress streaming gzip on the fly in LWP::UserAgent/WWW::Mechanize

by Your Mother (Archbishop)
on Dec 19, 2018 at 16:41 UTC ( [id://1227471] )


in reply to Re^3: Uncompress streaming gzip on the fly in LWP::UserAgent/WWW::Mechanize
in thread Solved: Uncompress streaming gzip on the fly in LWP::UserAgent/WWW::Mechanize

Thanks again for thinking about it at all, out loud or otherwise. :P

The 32kB is just something I saw somewhere about gzip streams. I don't remember where; I probably shouldn't have mentioned it.

If I do this (assume proper var scoping)–

gunzip \$data => \$out; print $out, $/;

–it will display something like–

<status>connected</status> ?R??0 ????l??????@? +U?&#1964;??/?%y???p???v?Po#[???-???x? >\'&#1000;??4'?V.6?6?&#1444;~5Y???0???C]?$?@m~OgQ?u&#451;8?Y?E?8<?Le?4 +?6??&#1644;&qd?x#1

Amended to–

$collected .= $data; gunzip \$collected, \$out; print $out, $/;

We get (it's ignoring the Accept and returning XML)–

<status>connected</status> <quote> <ask>166.29</ask> <asksz>500</asksz> <bid>166.26</bid> … </quote> ...

And then it dies after a while with an "unexpected end" style message; it's inconsistent where, but never sooner than 5kB in.

Adding this lets it run, apparently forever (I didn't let it run that long), but it's still stacking up an ever-growing scalar and gunzipping the same data over and over–

$collected .= $data; gunzip \$collected, \$out, MultiStream => 1; print $out, $/;

I expect I will have to come up with a seek/tell/truncate kind of solution that uses the MultiStream option to reset itself automatically and keep the data from growing forever. I haven't had time to go back to it. I feel like this must be a solved problem and I'm just looking in the wrong place. :|
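For what it's worth, the "unexpected end" failure mode is easy to reproduce outside of Perl, since zlib behaves the same everywhere. Here is a minimal sketch in Python (not the thread's language, but the same underlying zlib; the payload and split point are made up) showing that one-shot decompression of a gzip buffer truncated mid-stream fails exactly this way:

```python
import gzip

# A stand-in for the streaming response: some compressed XML-ish payload.
data = gzip.compress(b"<status>connected</status>" * 100)

# What $collected looks like mid-stream: header intact, body cut short.
partial = data[: len(data) // 2]

try:
    gzip.decompress(partial)
    truncated_ok = True
except EOFError:
    # "Compressed data ended before the end-of-stream marker was reached"
    # -- the Python analogue of the "unexpected end" style message.
    truncated_ok = False

assert truncated_ok is False
```

The one-shot call fails because a gzip stream carries a trailer (CRC and length) that a truncated buffer never reaches; only an incremental inflater can hand back partial output cleanly.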

Replies are listed 'Best First'.
Re^5: Uncompress streaming gzip on the fly in LWP::UserAgent/WWW::Mechanize
by pmqs (Friar) on Dec 19, 2018 at 17:17 UTC
    If you have concatenated the chunks received and could uncompress the composite buffer, it sounds like the sub that gets triggered in add_handler is being passed a piece of the same gzipped data stream every time it is invoked. You can push the compressed data a buffer at a time through Compress::Zlib. Something like this:
    use Compress::Zlib;

    my $gunzip = inflateInit(WindowBits => 16 + MAX_WBITS)
        or die "Cannot create an inflation stream\n";

    ...

    $mech->add_handler(
        response_data => sub {
            my ( $response, $ua, $h, $data ) = @_;
            my ( $buffer, $status ) = $gunzip->inflate($data);
            # uncompressed data in $buffer
            # return true to get called again for same response.
            1;
        }
    );
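The same inflateInit/inflate pattern can be sanity-checked in Python, whose zlib module binds the same C library (the payload and the 37-byte chunk size are arbitrary stand-ins for the network chunks):

```python
import gzip
import zlib

payload = b"<status>connected</status>" * 200   # stand-in for the quote feed
compressed = gzip.compress(payload)

# wbits = 16 + MAX_WBITS means "expect a gzip header", mirroring
# Compress::Zlib's inflateInit(WindowBits => 16 + MAX_WBITS).
inflater = zlib.decompressobj(wbits=16 + zlib.MAX_WBITS)

out = b""
for i in range(0, len(compressed), 37):              # feed small chunks, in order
    out += inflater.decompress(compressed[i:i + 37]) # yields whatever is ready

assert out == payload   # nothing was ever buffered or re-decompressed
```

Each chunk is consumed exactly once and the inflater keeps its own state between calls, which is what makes the never-ending stream tractable.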
Re^5: Uncompress streaming gzip on the fly in LWP::UserAgent/WWW::Mechanize
by vr (Curate) on Dec 19, 2018 at 19:32 UTC

    Sorry if I completely misunderstood the problem, but won't the following work?

    use strict;
    use warnings;
    use feature 'say';
    use IO::Compress::Gzip 'gzip';
    use IO::Uncompress::Gunzip qw/ gunzip $GunzipError /;

    my $s = <<'END';
    I include only the bare bones because I tried something like 20 different things without success and I'm embarrassed. :( Non-streaming requests are working perfectly with approximately this code. The endpoint for this code is a
    END

    gzip( \$s, \my $c );

    my @chunks  = unpack '(a42)*', $c x 5;
    my $partial = '';
    my $result  = '';
    my $n       = 1;

    for ( @chunks ) {
        gunzip( \( $partial . $_ ), \my $o,
            Transparent  => 0,
            TrailingData => my $t );
        $partial .= $_ and next if $GunzipError;
        $partial = $t ? $t : '';
        print "message #", $n++, "\n$o";
    }
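The idea above, catching TrailingData and restarting on each new gzip member, can be illustrated in Python as well (again the same underlying zlib; the five messages stand in for the "$c x 5" concatenation):

```python
import gzip
import zlib

# Five distinct gzip members back to back, like the test script's "$c x 5".
messages = [b"message #%d\n" % n for n in range(1, 6)]
stream = b"".join(gzip.compress(m) for m in messages)

out = []
inflater = zlib.decompressobj(wbits=16 + zlib.MAX_WBITS)
buf = stream
while buf:
    out.append(inflater.decompress(buf))
    buf = inflater.unused_data   # bytes past this member's trailer (cf. TrailingData)
    if buf:
        # Next member: start a fresh inflater, as MultiStream does internally.
        inflater = zlib.decompressobj(wbits=16 + zlib.MAX_WBITS)

assert b"".join(out) == b"".join(messages)
```

This only works when the input really is a series of complete gzip members; a single endless member never yields `unused_data`, which is the distinction pmqs draws below.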

      It depends. The code in your test script assumes that the input consists of 5 completely distinct gzip data streams. So they will each contain the gzip header, the compressed payload and gzip trailer data. If that is what is actually happening with the WWW::Mechanize application, and the gzip data streams aren't that big, then your approach should be fine.

      I'm not convinced that is what is happening in the real application, though. The snippet of code below, from earlier, along with the observation that uncompressing $collected resulted in more of the real uncompressed payload data, suggests that this is a single gzip data stream:

      $collected .= $data; gunzip \$collected, \$out; print $out, $/;

      If that is the case then IO::Uncompress::Gunzip will only work if you are prepared to read the entire compressed data stream and uncompress the lot in one go. If we are dealing with a potentially infinite compressed data stream, that isn't going to work.

        The code I posted that uses Compress::Zlib will uncompress the data as it gets it, one chunk at a time.

        I see, thanks. Can you explain when it's possible for the output of gunzip to be valid (but partial, truncated) uncompressed data plus an obviously binary, still-compressed "tail", as in Re^4: Uncompress streaming gzip on the fly in LWP::UserAgent/WWW::Mechanize? I couldn't get such a result regardless of Transparent and all the other parameters -- I always got only the partial uncompressed data instead.
