PerlMonks  

Solved: Uncompress streaming gzip on the fly in LWP::UserAgent/WWW::Mechanize

by Your Mother (Bishop)
on Dec 18, 2018 at 22:15 UTC ( #1227435=perlquestion )

Your Mother has asked for the wisdom of the Perl Monks concerning the following question:

Update: New and cleaned up code posted as a CUFP in Streaming Market Quotes from Ally Invest.

I include only the bare bones because I tried something like 20 different things without success and I'm embarrassed. :( Non-streaming requests work perfectly with approximately this code. The endpoint for this code is a streaming quote and trade ticker; the URL shown just asks for the info for AAPL. The compression (well, the ongoing, memory-sane handling of it) is the only problem I have.

The # $data line is where the handling, or a core part of it, is missing.

The closest I got streaming to work is gunzip from IO::Uncompress::Gunzip. It works fine, but it's a standalone conversion, which is goofy/impossible for an "infinite" stream of data; append, gunzip, append, gunzip… the whole thing must be uncompressed in toto each time. So it's a non-starter, but at least I could see the data was fine. Whatever the solution, I guess it will require truncating the IO handle as it goes.
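Since the failure mode here is "one-shot gunzip of an ever-growing buffer", a minimal local sketch may help, assuming the feed is one continuous gzip stream delivered in arbitrary chunks: a single stateful Compress::Zlib inflater can be fed each chunk as it arrives, with no need to re-gunzip from the start. (The data and chunk size below are made up for illustration.)

```perl
use strict;
use warnings;
use IO::Compress::Gzip qw(gzip $GzipError);
use Compress::Zlib;

# Build some gzipped data to stand in for the server's stream.
my $plain = join "", map "quote $_\n", 1 .. 50;
my $gzipped;
gzip(\$plain, \$gzipped) or die "gzip: $GzipError";

# WindowBits of 16 + MAX_WBITS tells zlib to expect a gzip header.
my $inflater = inflateInit(WindowBits => 16 + MAX_WBITS)
    or die "Cannot create an inflation stream\n";

# Feed the compressed bytes in awkward 7-byte chunks; the inflater
# keeps its state between calls and emits whatever is decodable so far.
my $out = "";
while (length $gzipped) {
    my $chunk = substr $gzipped, 0, 7, "";
    my ($buffer, $status) = $inflater->inflate($chunk);
    die "zlib error: $status"
        unless $status == Z_OK or $status == Z_STREAM_END;
    $out .= $buffer if defined $buffer;
}
print $out eq $plain ? "round trip ok\n" : "mismatch\n";
```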

I played with opening a pipe to gunzip(1), and I tried every variation of Compress::Zlib I could think of, but wasn't able to get any data out or, apparently, even write/uncompress the data properly. I tried everything I could figure out with IO::Uncompress::Gunzip too. A few PM and SO nodes looked promising, but I didn't find anything that seemed to apply directly or fix it. I even tried PerlIO::gzip. I'm just stuck stupid on this. Maybe one of the attempts was close, but I have no idea which, and, again, I'm embarrassed to share any of the *many*.

The doc page is https://www.ally.com/api/invest/documentation/streaming/. I tried to read the sample streaming code for the other languages, but there is no gzip handling surfaced in them. If they work, it's hidden in the libraries they use.

#!/usr/bin/env perl
use strictures;
no warnings "uninitialized";

use WWW::Mechanize;
use WWW::OAuth;

# You have to have a trading account to use the service-
my $oauth = WWW::OAuth->new(
    client_id     => "...",
    client_secret => "...",
    token         => "...",
    token_secret  => "...",
);

my $mech = WWW::Mechanize->new( autocheck => undef );
$mech->add_handler( request_prepare => sub { $oauth->authenticate($_[0]) } );
$mech->default_header( Accept => "application/json" );
$mech->add_handler( response_data => sub {
    my ( $response, $ua, $h, $data ) = @_;
    # $data; # Handle the gzip data.
    $response->content(undef);
    1;
});

$mech->get("https://stream.tradeking.com/v1/market/quotes?symbols=AAPL");

On the plus side, the WWW::OAuth was trivial to mix in and works great for this.

Thanks for looking!

Update: the solution from pmqs with full testing code from haukex works perfectly. Thank you so much to them and bliako and kschwab and vr. Always proud of the monks; today also grateful.

Caveat for anyone doing the same kind of thing: the $mech response accumulates, so you still have to clear it out with $response->content(undef) or it slowly, perpetually grows…

Replies are listed 'Best First'.
Re: Uncompress streaming gzip on the fly in LWP::UserAgent/WWW::Mechanize
by bliako (Vicar) on Dec 19, 2018 at 01:53 UTC

    Could it be that the compressed stream you are receiving is a sequence of self-contained compressed chunks of data? In that case, IO::Uncompress::Gunzip can detect the end of a compressed chunk and reset; see "An advanced tip" in https://www.perl.com/article/162/2015/3/27/Gzipping-data-directly-from-Perl/ (by brian d foy) re: MultiStream.
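If the feed really were self-contained gzip members back to back, a short self-contained check of the MultiStream behaviour might look like this (the sample strings are made up):

```perl
use strict;
use warnings;
use IO::Compress::Gzip qw(gzip $GzipError);
use IO::Uncompress::Gunzip qw(gunzip $GunzipError);

# Two independent gzip members concatenated, as such a server might send.
my $first_plain  = "first chunk\n";
my $second_plain = "second chunk\n";
my ($m1, $m2);
gzip(\$first_plain,  \$m1) or die "gzip: $GzipError";
gzip(\$second_plain, \$m2) or die "gzip: $GzipError";
my $stream = $m1 . $m2;

# With MultiStream => 1, Gunzip resets at each member boundary and
# keeps going instead of stopping after the first member.
my $out;
gunzip(\$stream, \$out, MultiStream => 1)
    or die "gunzip: $GunzipError";
print $out;   # both chunks, not just the first
```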

    On a side note, do infinite streams of ([g]zip) compressed data exist?

    bw, bliako

      "On a side note, do infinite streams of (gzip) compressed data exist?"
      If you need one:
      $ gzip -f - </dev/urandom

        Thanks! Second question: does that unzip (before the end of time)?

      Thanks for looking. It's not self-contained chunks, at least not immediately. I can gunzip the first chunk in isolation fine, and then the next is gibberish. But if I concat the first and second, they gunzip fine. Probably there is some point, some bigger chunk, where it starts over as you suggest; IIRC there was a mention of 32kB somewhere. I tried the MultiStream settings, and other options, in my many experiments. I was definitely doing something wrong, though. I'll dig back in.
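That symptom is what you would expect from one continuous gzip stream cut at arbitrary byte boundaries: the later piece alone has no gzip header, so by itself it is "gibberish" to gunzip, while the concatenation decompresses fine. A small sketch reproducing it locally (cut point and data are made up):

```perl
use strict;
use warnings;
use IO::Compress::Gzip qw(gzip $GzipError);
use IO::Uncompress::Gunzip qw(gunzip $GunzipError);

my $plain = "x" x 1000;
my $gz;
gzip(\$plain, \$gz) or die "gzip: $GzipError";

# Cut the single stream at an arbitrary point past the 10-byte header.
my $first  = substr $gz, 0, 20;
my $second = substr $gz, 20;

# The second piece alone fails: no gzip magic bytes at its start.
my $out;
if (gunzip(\$second, \$out)) { print "second alone: ok\n" }
else                         { print "second alone: $GunzipError\n" }

# Concatenated, the pieces are the original stream again.
my $joined = $first . $second;
gunzip(\$joined, \$out) or die "concatenated: $GunzipError";
print "concatenated: ok\n";
```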

        I would investigate what "gibberish" is: whether Gunzip fails on that data, or whether it does uncompress it but what you get out is "gibberish". If gunzip does not fail, then it is possible you sometimes have zip-inside-zip.

        So, they have a logical chunk of data, based on the XML I saw on their page (<quote>...</quote>), and then they have a logical chunk of compressed data of 32kB? Isn't that weird? I mean, they compress 5 chunks of data and sometimes it is 32kB and sometimes it is 33kB, depending on what content they have. How can they always send 32kB and expect the recipient to get exactly 5 chunks of data? Unless they sometimes send 4 chunks, sometimes 5, and most times something fractional in between. And if they do send something fractional, isn't it weird to make you waste time waiting for the remaining half chunk to appear (whenever the 32kB limit of the next chunk is filled)? You have something like "IBM up 2<end of chunk, sorry>" and then you wait a few valuable seconds for the next chunk to find out if it is 2000 points or 2.4 points up!

        They could also do it if they pad, of course, but what's the point of all that computational burden on their side, and of forcing the client to wait until 32kB of compressed data has been completed before knowing where the market goes?

        Just thinking out loud...

Re: Uncompress streaming gzip on the fly in LWP::UserAgent/WWW::Mechanize
by haukex (Chancellor) on Dec 20, 2018 at 12:36 UTC

    Since I don't have access to that site, I couldn't test at first, but I whipped up the following test server, which I hope emulates the server you're accessing, and it seems that the solution provided by pmqs here works. The only issue I'm still having is that the last line is cut off, probably due to some interaction with WWW::Mechanize; I haven't tracked it down yet. First the server (run e.g. via plackup):

    use warnings;
    use strict;
    use IO::Compress::Gzip qw/$GzipError Z_PARTIAL_FLUSH/;

    my $app = sub {
        my $env = shift;
        die "This app needs a server that supports psgi.streaming"
            unless $env->{'psgi.streaming'};
        die "The client did not send the 'Accept-Encoding: gzip' header"
            unless defined $env->{HTTP_ACCEPT_ENCODING}
                && $env->{HTTP_ACCEPT_ENCODING} =~ /\bgzip\b/;
        # Note some browsers don't support gzip correctly,
        # see e.g. https://metacpan.org/pod/Plack::Middleware::Deflater
        # but we're not checking that here (and we don't set the Vary header)
        return sub {
            my $respond = shift;
            my $zipped;
            my $z = IO::Compress::Gzip->new(\$zipped)
                or die "IO::Compress::Gzip: $GzipError";
            my $w = $respond->([ 200, [
                'Content-Type'     => 'text/plain; charset=ascii',
                'Content-Encoding' => 'gzip',
            ] ]);
            for (1..10) {
                $z->print("Hello, it is ".gmtime." GMT\n");
                $z->flush(Z_PARTIAL_FLUSH);
                $w->write($zipped) if defined $zipped;
                $zipped = undef;
                sleep 1;
            }
            $z->print("Goodbye!\n");
            $z->close;
            $w->write($zipped) if defined $zipped;
            $w->close;
        };
    };
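One design note on the server above: the Z_PARTIAL_FLUSH is what makes the stream "live". Without it, zlib buffers compressed output internally and the client may see little or nothing until close. A small local sketch of that difference (variable names are mine):

```perl
use strict;
use warnings;
use IO::Compress::Gzip qw($GzipError Z_PARTIAL_FLUSH);

my ($buffered, $flushed) = ("", "");

# Write without flushing: the compressed bytes for "Hello\n"
# stay inside zlib's internal buffer.
my $z1 = IO::Compress::Gzip->new(\$buffered) or die $GzipError;
$z1->print("Hello\n");
my $bytes_without_flush = length $buffered;

# Write with a partial flush: the compressed bytes are pushed
# out to the output buffer immediately.
my $z2 = IO::Compress::Gzip->new(\$flushed) or die $GzipError;
$z2->print("Hello\n");
$z2->flush(Z_PARTIAL_FLUSH);
my $bytes_with_flush = length $flushed;

printf "without flush: %d bytes available; with flush: %d bytes\n",
    $bytes_without_flush, $bytes_with_flush;
$_->close for $z1, $z2;
```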

    And the client:

    use warnings;
    use strict;
    use Data::Dump;
    use WWW::Mechanize;
    use Compress::Zlib;

    my $gunzip = inflateInit(WindowBits => 16 + MAX_WBITS)
        or die "Cannot create an inflation stream\n";

    my $mech = WWW::Mechanize->new();
    $mech->add_handler( response_data => sub {
        my ( $response, $ua, $h, $data ) = @_;
        my ($buffer, $status) = $gunzip->inflate($data);
        die "zlib error: $status" if length $status;
        dd $buffer;
        1;
    });
    $mech->get('http://localhost:5000');

Node Type: perlquestion [id://1227435]
Front-paged by stevieb