http://www.perlmonks.org?node_id=1067536

revendar has asked for the wisdom of the Perl Monks concerning the following question:

Hello Monks, I'm trying to scrape a Flash-enabled website (http://concerts.livenation.com/event/04004B76DB4C779A?tm_link=search_msg-0_04004B76DB4C779A&crosssite=TM_US:735415:32837). I want to get the seat availability shown in the Flash screen. Using MozRepl in Firefox, I found the HTTP link that the Flash app uses, but part of the XML I get back appears to be encrypted. Is that due to the gzip format? I'm pasting the code here. Please enlighten me. Thanks

#!C:\Perl\bin\perl
use LWP::UserAgent;
use strict;
use Data::Dumper;
use Date::Simple qw/ date today /;
use XML::Simple;
use Compress::Zlib;

local $SIG{__WARN__} = sub { };

my $ua = LWP::UserAgent->new;
$ua->agent(
    "Mozilla/5.0 (Windows; U; Windows NT 5.1; de; rv:1.9.2.3) Gecko/20100401 Firefox/3.6.3 (FM Scene 4.6.1)"
);
my $xs  = XML::Simple->new();
my $url = "http://concerts.livenation.com/app/availability/04004B76DB4C779A";

my $response = $ua->get($url);
if ( $response->is_success ) {
    print $response->decoded_content;
    my $ref = $xs->XMLin( $response->decoded_content );
    #open FILE, ">f.gz";
    # binmode(FILE);
    # print FILE $ref->{ev_comp}; # or whatever
    # close FILE;
    print Compress::Zlib::memGunzip($ref->{ev_comp});
}
else {
    #die $response->status_line;
    print "\t[FAILURE]\n";
}

Replies are listed 'Best First'.
Re: scraping flash content
by GrandFather (Saint) on Dec 17, 2013 at 23:56 UTC

    That site explicitly excludes what you are trying to do in their Terms and Conditions.

    True laziness is hard work
Re: scraping flash content
by davido (Cardinal) on Dec 18, 2013 at 00:31 UTC

    Furthermore, that company is a Perl shop, and has even hosted Los Angeles Perl Mongers on occasion. If there's anything legitimate you would like to do involving them, get in touch with them and work it out together.


    Dave

Re: scraping flash content
by Anonymous Monk on Dec 18, 2013 at 05:46 UTC
    Setting aside the argument about gun control in a thread where the poster wants to know how to load bullets, let me continue with civil monastery behavior instead of berating. To decompress an HTTP body of gzip content, I suggest you try:
    use Compress::Raw::Zlib;

    my $zlib = Compress::Raw::Zlib::Inflate->new(
        -WindowBits   => WANT_GZIP,
        -ConsumeInput => 1,
    );

    #LATER
    sub onData {
        my ($comp_in) = @_;
        my ($output, $status);
        # Start a fresh stream if the previous gzip stream has ended.
        $status = $zlib->inflateReset() if $zlib->status() == Z_STREAM_END;
        die "failed" if defined $status && $status != Z_OK;
        $status = $zlib->inflate($comp_in, $output);
        die "failed" if $status != Z_OK && $status != Z_STREAM_END;
        print $output;
    }
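    To try the incremental approach without touching any server, here is a minimal sketch that builds a gzip buffer in memory and inflates it in small pieces, the way a data callback would receive it. The sample payload, the 16-byte chunk size, and the variable names are assumptions chosen for illustration only.

```perl
use strict;
use warnings;
use Compress::Zlib ();    # used only for memGzip, to build sample input
use Compress::Raw::Zlib qw(WANT_GZIP Z_OK Z_STREAM_END);

# Hypothetical payload standing in for a real HTTP body.
my $plain = "<availability>sample</availability>\n" x 20;
my $gz    = Compress::Zlib::memGzip($plain);

my ($zlib, $err) = Compress::Raw::Zlib::Inflate->new(
    -WindowBits   => WANT_GZIP,
    -ConsumeInput => 1,
);
die "cannot create inflater: $err" unless $zlib;

# Feed the compressed data in 16-byte chunks (arbitrary size).
my $result = '';
my $offset = 0;
while ( $offset < length $gz ) {
    my $chunk = substr( $gz, $offset, 16 );
    $offset += 16;
    my $out = '';
    my $status = $zlib->inflate( $chunk, $out );
    die "inflate failed: $status"
        if $status != Z_OK && $status != Z_STREAM_END;
    $result .= $out;
    last if $status == Z_STREAM_END;    # end of the gzip stream
}

print $result eq $plain ? "round trip ok\n" : "mismatch\n";
```

    If the bytes you feed in are not actually gzip data, inflate should return an error status on the first chunk, which is a quick way to check whether gzip is really what you are looking at.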
    You may also be interested in the Accept-Encoding request header, which you can use to ask an HTTP-compliant server not to gzip the response. I must advise, though, that for a bot such as the one you are writing, transferring uncompressed data is a poor use of your broadband connection: it reduces throughput and increases latency. I also suggest you research the HEAD verb, to further reduce your bot's burden on the server and make more economical use of your link and your processor.
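    As a rough sketch of those two suggestions, the requests can be built with HTTP::Request and inspected before anything is sent. The URL below is a placeholder, not the site from the question, and the code only constructs and prints the requests.

```perl
use strict;
use warnings;
use HTTP::Request;

# Placeholder URL for illustration only.
my $url = 'http://example.com/data.xml';

# GET that asks the server for an uncompressed ("identity") body.
my $get = HTTP::Request->new( GET => $url );
$get->header( 'Accept-Encoding' => 'identity' );

# HEAD asks for the response headers only, with no body.
my $head = HTTP::Request->new( HEAD => $url );

print $get->as_string, "\n", $head->as_string;

# To actually send one of them (requires LWP::UserAgent):
#   my $response = LWP::UserAgent->new->request($get);
```

    A server is free to ignore the header, so check Content-Encoding in the response before assuming the body is plain.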