PerlMonks  

scraping flash content

by revendar (Novice)
on Dec 17, 2013 at 21:44 UTC ( #1067536=perlquestion )
revendar has asked for the wisdom of the Perl Monks concerning the following question:

Hello Monks, I'm trying to scrape a Flash-enabled website (http://concerts.livenation.com/event/04004B76DB4C779A?tm_link=search_msg-0_04004B76DB4C779A&crosssite=TM_US:735415:32837). Here I'm trying to get the availability of seats from the Flash screen. Using MozRepl in Firefox, I have found the HTTP link that the Flash app uses, but part of the XML I get back is encrypted. Is that due to the gzip format? I'm pasting the code here. Please enlighten me. Thanks

#!C:\Perl\bin\perl
use LWP::UserAgent;
use strict;
use Data::Dumper;
use Date::Simple qw/ date today /;
use XML::Simple;
use Compress::Zlib;

# Silence all warnings
local $SIG{__WARN__} = sub { };

my $ua = LWP::UserAgent->new;
$ua->agent(
    "Mozilla/5.0 (Windows; U; Windows NT 5.1; de; rv:1.9.2.3) Gecko/20100401 Firefox/3.6.3 (FM Scene 4.6.1)"
);
my $xs  = XML::Simple->new();
my $url = "http://concerts.livenation.com/app/availability/04004B76DB4C779A";

my $response = $ua->get($url);
if ( $response->is_success ) {
    print $response->decoded_content;
    my $ref = $xs->XMLin( $response->decoded_content );

    #open FILE, ">f.gz";
    # binmode(FILE);
    # print FILE $ref->{ev_comp}; # or whatever
    # close FILE;

    # Try to gunzip the unreadable ev_comp element of the XML
    print Compress::Zlib::memGunzip( $ref->{ev_comp} );
}
else {
    #die $response->status_line;
    print "\t[FAILURE]\n";
}

Re: scraping flash content
by GrandFather (Cardinal) on Dec 17, 2013 at 23:56 UTC

    That site explicitly excludes what you are trying to do in their Terms and Conditions.

    True laziness is hard work
Re: scraping flash content
by davido (Archbishop) on Dec 18, 2013 at 00:31 UTC

    Furthermore, that company is a Perl shop, and has even hosted Los Angeles Perl Mongers on occasion. If there's anything legitimate you would like to do involving them, get in touch with them and work it out together.


    Dave

Re: scraping flash content
by Anonymous Monk on Dec 18, 2013 at 05:46 UTC
    Leaving aside the argument about gun control in a thread where the poster only wants to know how to load bullets, let me stay civil, as befits the monastery, rather than berate. To decompress an HTTP body that is gzip-encoded, I suggest you try:
    use Compress::Raw::Zlib;    # exports WANT_GZIP, Z_OK, Z_STREAM_END

    my $zlib = new Compress::Raw::Zlib::Inflate(
        -WindowBits   => WANT_GZIP,
        -ConsumeInput => 1
    );

    #LATER
    sub onData {
        my ($comp_in) = @_;
        my ( $output, $status ) = ( '', Z_OK );

        # Start a fresh stream if the previous gzip stream already ended
        $status = $zlib->inflateReset() if $zlib->status() == Z_STREAM_END;
        die "failed" if $status != Z_OK;

        $status = $zlib->inflate( $comp_in, $output );
        die "failed" if $status != Z_OK && $status != Z_STREAM_END;
        print $output;
    }
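    For anyone who wants to try the chunk-by-chunk inflate idea without touching a live server, here is a minimal, self-contained sketch. It is only an illustration under stated assumptions: the helper name gunzip_chunks, the sample XML payload, and the 64-byte chunk size are all made up for the demo, and the gzip data is produced locally with IO::Compress::Gzip rather than fetched from anywhere.

```perl
use strict;
use warnings;
use Compress::Raw::Zlib;    # exports WANT_GZIP, Z_OK, Z_STREAM_END
use IO::Compress::Gzip qw(gzip $GzipError);

# Hypothetical helper (not from the thread): inflate a gzip stream that
# arrives as a list of chunks, the way a streaming data callback would see it.
sub gunzip_chunks {
    my @chunks = @_;

    my ( $inflater, $status ) = Compress::Raw::Zlib::Inflate->new(
        -WindowBits   => WANT_GZIP,    # expect a gzip header and trailer
        -ConsumeInput => 1,            # inflate() eats the input it has used
    );
    die "cannot create inflater: $status" unless $inflater;

    my $plain = '';
    for my $chunk (@chunks) {
        my $out = '';
        $status = $inflater->inflate( $chunk, $out );
        die "inflate failed: $status"
            if $status != Z_OK && $status != Z_STREAM_END;
        $plain .= $out;
        last if $status == Z_STREAM_END;    # end of the gzip stream
    }
    return $plain;
}

# Made-up stand-in payload, compressed locally so no network is needed.
my $original = "<seats><row id='A'>available</row></seats>\n" x 50;
my $gz;
gzip( \$original => \$gz ) or die "gzip failed: $GzipError";

# Split the compressed bytes into 64-byte pieces to mimic chunked delivery.
my @chunks   = unpack '(a64)*', $gz;
my $restored = gunzip_chunks(@chunks);

print $restored eq $original ? "round trip ok\n" : "mismatch\n";
```

    The output of each inflate() call is collected in a scratch variable and appended by hand, because by default each call overwrites its output buffer rather than appending to it.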
    You may also be interested in the Accept-Encoding request header, which lets you ask an HTTP-compliant server not to gzip its response. I must advise, though, that for a bot such as the one you are writing, transferring uncompressed data is a poor use of your broadband connection: it reduces throughput and increases latency. I also suggest you research the HEAD verb, to further reduce your bot's burden on the server and make more economical use of your link and your processor.
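    As a rough illustration of the Accept-Encoding and HEAD suggestions, the sketch below builds the requests and decodes a simulated gzip response entirely in memory, so nothing is sent over the network. The URL http://example.com/data.xml and the sample body are placeholders, not anything from the original post. It assumes the HTTP::Message distribution (HTTP::Request, HTTP::Response) that ships alongside LWP is installed.

```perl
use strict;
use warnings;
use HTTP::Request;
use HTTP::Response;
use IO::Compress::Gzip qw(gzip $GzipError);

# Placeholder URL, used only to build request objects; nothing is sent.
my $url = 'http://example.com/data.xml';

# Ask the server not to compress the response body.
my $plain_req = HTTP::Request->new( GET => $url );
$plain_req->header( 'Accept-Encoding' => 'identity' );

# Or explicitly allow gzip, trading a little CPU for less bandwidth.
my $gzip_req = HTTP::Request->new( GET => $url );
$gzip_req->header( 'Accept-Encoding' => 'gzip' );

# A HEAD request asks for the headers only, with no body.
my $head_req = HTTP::Request->new( HEAD => $url );

# Simulate what a gzip-encoded reply would look like (made-up body).
my $body = "<doc>hello</doc>";
my $gz;
gzip( \$body => \$gz ) or die "gzip failed: $GzipError";

my $res = HTTP::Response->new(
    200, 'OK',
    [
        'Content-Type'     => 'text/xml',
        'Content-Encoding' => 'gzip',
    ],
    $gz,
);

# decoded_content undoes the Content-Encoding; charset => 'none' returns
# the raw bytes without any character-set decoding.
my $decoded = $res->decoded_content( charset => 'none' );

print $plain_req->as_string, "\n";
print "decoded body matches\n" if defined $decoded && $decoded eq $body;
```

    One thing this makes visible: decoded_content already reverses a gzip Content-Encoding on its own, so anything still unreadable after that call was not made unreadable by HTTP-level compression.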
Reaped: Re: scraping flash content
by NodeReaper (Curate) on Feb 10, 2014 at 23:40 UTC

Node Type: perlquestion [id://1067536]
Approved by ww