scraping flash content

by revendar (Novice)
on Dec 17, 2013 at 21:44 UTC (#1067536)
revendar has asked for the wisdom of the Perl Monks concerning the following question:

Hello Monks, I'm trying to scrape a Flash-enabled website. I want to get the availability of seats from the Flash screen. Using MozRepl in Firefox, I found the HTTP link that the Flash app uses, but part of the XML I get back is encrypted. Is that due to the gzip format? I'm pasting the code here. Please enlighten me. Thanks

#!C:\Perl\bin\perl
use LWP::UserAgent;
use strict;
use Data::Dumper;
use Date::Simple qw/ date today /;
use XML::Simple;
use Compress::Zlib;

# Silence all warnings.
local $SIG{__WARN__} = sub { };

my $ua = LWP::UserAgent->new;
$ua->agent(
    "Mozilla/5.0 (Windows; U; Windows NT 5.1; de; rv: Gecko/20100401 Firefox/3.6.3 (FM Scene 4.6.1)"
);
my $xs       = XML::Simple->new();
my $url      = " +C779A";
my $response = $ua->get($url);

if ( $response->is_success ) {
    print $response->decoded_content;

    # Parse the XML, then try to gunzip the ev_comp field.
    my $ref = $xs->XMLin( $response->decoded_content );

    #open FILE, ">f.gz";
    # binmode(FILE);
    # print FILE $ref->{ev_comp}; # or whatever
    # close FILE;
    print Compress::Zlib::memGunzip( $ref->{ev_comp} );
}
else {
    #die $response->status_line;
    print "\t[FAILURE]\n";
}
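One way to test the gzip guess before calling memGunzip is to look for the two-byte gzip magic number (0x1f 0x8b) at the start of the data. The following is only a sketch: `looks_gzipped` is a hypothetical helper, and the sample string is made-up stand-in data, not the site's real response.

```perl
use strict;
use warnings;
use Compress::Zlib ();    # memGzip / memGunzip

# Hypothetical helper: true if the bytes start with the gzip magic number.
sub looks_gzipped {
    my ($bytes) = @_;
    return defined $bytes
        && length($bytes) >= 2
        && substr( $bytes, 0, 2 ) eq "\x1f\x8b";
}

# Stand-in data (an assumption); in the script above this would be the
# raw value of the field in question.
my $sample_gz = Compress::Zlib::memGzip("<seats>42</seats>");

for my $blob ( $sample_gz, "<seats>42</seats>" ) {
    if ( looks_gzipped($blob) ) {
        my $copy = $blob;    # work on a copy of the buffer
        print "gzip: ", Compress::Zlib::memGunzip($copy), "\n";
    }
    else {
        print "not gzip, leaving as is\n";
    }
}
```

If the check fails on the real field, the data is probably not plain gzip, and memGunzip would not be expected to help.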

Replies are listed 'Best First'.
Re: scraping flash content
by GrandFather (Sage) on Dec 17, 2013 at 23:56 UTC

    That site explicitly excludes what you are trying to do in their Terms and Conditions.

    True laziness is hard work
Re: scraping flash content
by davido (Archbishop) on Dec 18, 2013 at 00:31 UTC

    Furthermore, that company is a Perl shop, and has even hosted Los Angeles Perl Mongers on occasion. If there's anything legitimate you would like to do involving them, get in touch with them and work it out together.


Re: scraping flash content
by Anonymous Monk on Dec 18, 2013 at 05:46 UTC
    Rather than argue about gun control in a thread where the poster wants to know how to load bullets, let me continue with civil monastery behavior instead of berating. To decompress an HTTP body of gzip content, I suggest you try:

    use Compress::Raw::Zlib;

    my $zlib = Compress::Raw::Zlib::Inflate->new( -WindowBits => WANT_GZIP, -ConsumeInput => 1 );

    #LATER
    sub onData {
        my ($comp_in) = @_;
        my ( $output, $status );

        # Start a fresh stream if the previous one has ended.
        $status = $zlib->inflateReset() if $zlib->status() == Z_STREAM_END;
        die "failed" if defined $status && $status != Z_OK;

        $status = $zlib->inflate( $comp_in, $output );
        die "failed" if $status != Z_OK && $status != Z_STREAM_END;
        print $output;
    }
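To see the chunk-by-chunk idea from the snippet above run end to end, here is a small self-contained sketch. `gunzip_chunks` is a hypothetical helper, and the gzip data is generated locally with memGzip as a stand-in for an HTTP body arriving in pieces.

```perl
use strict;
use warnings;
use Compress::Raw::Zlib;    # exports WANT_GZIP, Z_OK, Z_STREAM_END
use Compress::Zlib ();      # only used to build sample gzip data

# Hypothetical helper: inflate a gzip stream delivered as a list of chunks.
sub gunzip_chunks {
    my @chunks = @_;
    my ( $inflater, $err ) = Compress::Raw::Zlib::Inflate->new(
        -WindowBits   => WANT_GZIP,
        -ConsumeInput => 1,
    );
    die "cannot create inflater: $err" unless $err == Z_OK;

    my $plain = '';
    for my $chunk (@chunks) {
        my $out    = '';
        my $status = $inflater->inflate( $chunk, $out );
        die "inflate failed: $status"
            unless $status == Z_OK || $status == Z_STREAM_END;
        $plain .= $out;
        last if $status == Z_STREAM_END;    # end of the gzip stream
    }
    return $plain;
}

# Stand-in body (an assumption), split into 16-byte pieces.
my $body_gz = Compress::Zlib::memGzip( "seat A1 free\n" x 20 );
my @pieces  = unpack '(a16)*', $body_gz;
print length( gunzip_chunks(@pieces) ), " bytes recovered\n";
```

Unlike onData, this version collects the output and returns it instead of printing each piece, which makes the result easier to check.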
    You may also be interested in the Accept-Encoding request header, which lets you ask an HTTP-compliant server not to gzip-encode the response. I must advise, though, that for a bot such as the one you are writing, transferring uncompressed data is a poor use of your broadband connection: it reduces throughput and increases latency. I also suggest researching the HEAD verb to further reduce your bot's burden on the server and make more economical use of your link and your processor.
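The Accept-Encoding and HEAD suggestions can be sketched with HTTP::Request, which ships alongside LWP. Nothing is sent over the network here; the requests are only built and printed. `make_request` is a hypothetical helper and the URL is a placeholder, not the site from the question.

```perl
use strict;
use warnings;
use HTTP::Request;

# Hypothetical helper: build a request, optionally with Accept-Encoding.
sub make_request {
    my ( $method, $url, $encoding ) = @_;
    my $req = HTTP::Request->new( $method => $url );
    $req->header( 'Accept-Encoding' => $encoding ) if defined $encoding;
    return $req;
}

# Placeholder URL (an assumption for illustration only).
my $demo_url = 'http://example.com/data.xml';

my @requests = (
    make_request( GET  => $demo_url, 'identity' ),    # ask for an unencoded body
    make_request( GET  => $demo_url, 'gzip' ),        # ask for a gzip-encoded body
    make_request( HEAD => $demo_url ),                # headers only, no body
);

print $_->as_string, "\n" for @requests;
```

To actually send one of these, pass it to `LWP::UserAgent->new->request($req)`. When the response carries a gzip Content-Encoding, `decoded_content` undoes that encoding.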
