
scraping flash content

by revendar (Novice)
on Dec 17, 2013 at 21:44 UTC (#1067536)
revendar has asked for the wisdom of the Perl Monks concerning the following question:

Hello Monks, I'm trying to scrape a Flash-enabled website. I want to get the availability of seats from the Flash screen. Using MozRepl in Firefox, I have found the HTTP link that the Flash app uses, but part of the XML I get back looks encrypted. Is that due to the gzip format? I'm pasting the code here. Please enlighten me. Thanks

#!C:\Perl\bin\perl
use LWP::UserAgent;
use strict;
use Data::Dumper;
use Date::Simple qw/ date today /;
use XML::Simple;
use Compress::Zlib;

local $SIG{__WARN__} = sub { };

my $ua = LWP::UserAgent->new;
$ua->agent(
    "Mozilla/5.0 (Windows; U; Windows NT 5.1; de; rv: Gecko/20100401 Firefox/3.6.3 (FM Scene 4.6.1)"
);
my $xs  = XML::Simple->new();
my $url = " +C779A";
my $response = $ua->get($url);

if ( $response->is_success ) {
    print $response->decoded_content;
    my $ref = $xs->XMLin( $response->decoded_content );

    # open FILE, ">f.gz";
    # binmode(FILE);
    # print FILE $ref->{ev_comp};    # or whatever
    # close FILE;
    print Compress::Zlib::memGunzip( $ref->{ev_comp} );
}
else {
    # die $response->status_line;
    print "\t[FAILURE]\n";
}
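
One way to test the "is it gzip?" guess before calling memGunzip is to look at the first two bytes of the field: a gzip stream starts with 0x1f 0x8b. The following is only a minimal sketch on locally generated sample data. The helper name looks_like_gzip and the sample string are hypothetical, and it assumes the field holds raw bytes; if the real field is text-encoded inside the XML, or is encrypted by the application, this check will simply report that it is not gzip.

```perl
use strict;
use warnings;
use Compress::Zlib;

# Hypothetical helper: true if the buffer starts with the gzip magic bytes.
sub looks_like_gzip {
    my ($bytes) = @_;
    return 0 unless defined $bytes && length($bytes) >= 2;
    return substr( $bytes, 0, 2 ) eq "\x1f\x8b" ? 1 : 0;
}

# Sample data only; this stands in for a field pulled out of the parsed XML.
my $plain  = 'sample payload';
my $packed = Compress::Zlib::memGzip($plain);

# Only attempt to gunzip when the magic bytes are present.
my $unpacked = looks_like_gzip($packed)
    ? Compress::Zlib::memGunzip($packed)
    : undef;

print defined $unpacked ? "gzip: $unpacked\n" : "not gzip\n";
```

If the check fails on the real field, the data is probably not plain gzip and memGunzip will return undef no matter how it is called.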

Replies are listed 'Best First'.
Re: scraping flash content
by GrandFather (Sage) on Dec 17, 2013 at 23:56 UTC

    That site explicitly excludes what you are trying to do in their Terms and Conditions.

    True laziness is hard work
Re: scraping flash content
by davido (Archbishop) on Dec 18, 2013 at 00:31 UTC

    Furthermore, that company is a Perl shop, and has even hosted Los Angeles Perl Mongers on occasion. If there's anything legitimate you would like to do involving them, get in touch with them and work it out together.


Re: scraping flash content
by Anonymous Monk on Dec 18, 2013 at 05:46 UTC
    Setting aside the argument about gun control in a thread where the poster wants to know how to load bullets, let me continue with civil monastery behavior instead of berating. To decompress an HTTP body of gzip content, I suggest you try:
    use Compress::Raw::Zlib;

    my $zlib = Compress::Raw::Zlib::Inflate->new(
        -WindowBits   => WANT_GZIP,
        -ConsumeInput => 1,
    );

    # LATER
    sub onData {
        my ($comp_in) = @_;
        my ( $output, $status );

        # Reset the inflater if the previous gzip stream has ended
        if ( $zlib->status() == Z_STREAM_END ) {
            $status = $zlib->inflateReset();
            die "failed" if $status != Z_OK;
        }

        $status = $zlib->inflate( $comp_in, $output );
        die "failed" if $status != Z_OK && $status != Z_STREAM_END;
        print $output;
    }
    You may also be interested in the Accept-Encoding request header, which can be used to ask an HTTP-compliant server not to gzip its responses. I must advise, though, that for a bot such as the one you are writing, transferring uncompressed data is a poor use of your broadband connection: it reduces throughput and increases latency. I also suggest you research the HEAD verb, to further reduce your bot's burden on the server and make more economical use of your link and your processor.
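
The Accept-Encoding and HEAD suggestions could look roughly like this. It is a sketch that stays offline: the URL and the sample XML are placeholders, and the gzipped response is built locally to show that decoded_content undoes a Content-Encoding: gzip transfer without any manual inflate step. It says nothing about compression or encryption that an application applies inside the document itself.

```perl
use strict;
use warnings;
use HTTP::Request;
use HTTP::Response;
use HTTP::Message;
use IO::Compress::Gzip qw(gzip $GzipError);

my $url = 'http://example.com/data.xml';    # placeholder URL

# Ask the server not to compress the body.
my $plain_req = HTTP::Request->new( GET => $url );
$plain_req->header( 'Accept-Encoding' => 'identity' );

# Or advertise every encoding that decoded_content can undo.
my $gzip_req = HTTP::Request->new( GET => $url );
$gzip_req->header( 'Accept-Encoding' => scalar HTTP::Message::decodable() );

# HEAD fetches headers only, with no body transferred.
my $head_req = HTTP::Request->new( HEAD => $url );

# Simulate a gzip-encoded reply locally to show what decoded_content does.
my $xml = '<doc><item>hello</item></doc>';    # sample data
gzip( \$xml => \my $gz ) or die "gzip failed: $GzipError";

my $res = HTTP::Response->new(
    200, 'OK',
    [ 'Content-Type' => 'text/xml', 'Content-Encoding' => 'gzip' ],
    $gz,
);

my $decoded = $res->decoded_content;    # gunzipped and charset-decoded
print "$decoded\n";
```

Any of these request objects can be sent with $ua->request($req) on an LWP::UserAgent.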
