http://www.perlmonks.org?node_id=293160

waiterm has asked for the wisdom of the Perl Monks concerning the following question:

I'm using a simple get($src) call to retrieve HTML source from the web, but what do I do if JavaScript is used to create the HTML source? It doesn't appear to be picked up when I use get. For example, http://www.helpfulholidays.com/availability_fullpage.asp?pref=T421&year=2003. You'll notice get($src) doesn't retrieve the table in the middle of the page, which is the section I need to parse. How do I go about retrieving this section so that I can make use of it? I'm using ActivePerl 5.8.0 at the moment.

Re: Javascript in Perl and retrieving the HTML source.
by jeffa (Bishop) on Sep 22, 2003 at 15:55 UTC
    Are you sure? I just used LWP::Simple to retrieve the page and that middle table was indeed there. I disabled JavaScript in my browser and the table was rendered ...

    However, there is a solution to these JavaScript dilemmas: the JavaScript module. You can see an example of where I used it at (jeffa) Re: Encrypt web files!.

    jeffa

    L-LL-L--L-LL-L--L-LL-L--
    -R--R-RR-R--R-RR-R--R-RR
    B--B--B--B--B--B--B--B--
    H---H---H---H---H---H---
    (the triplet paradiddle with high-hat)
    
Re: Javascript in Perl and retrieving the HTML source.
by bear0053 (Hermit) on Sep 22, 2003 at 15:58 UTC
    try this:
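    use Win32::OLE;

    # Requires Windows with the Microsoft.XMLHTTP COM object available.
    my $SendObject = Win32::OLE->new('Microsoft.XMLHTTP')
        or die "Cannot create Microsoft.XMLHTTP object";
    # Third argument 0 asks for a synchronous request, so send() waits for the response.
    $SendObject->open("GET", "http://www.helpfulholidays.com/availability_fullpage.asp?pref=T421&year=2003", 0);
    $SendObject->setRequestHeader("Content-type", "text/plain");
    $SendObject->send();
    my $RegResponse = $SendObject->responseText;

    The HTML source will then be in $RegResponse.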
Re: Javascript in Perl and retrieving the HTML source.
by bunnyman (Hermit) on Sep 22, 2003 at 16:08 UTC

    The HTTP server is giving you the raw page before Javascript is processed. This is because the web browser has to process it. If you changed your web browser settings to turn off Javascript, you would have the same problem as you are having with the Perl program.

    What you need to do is process the Javascript commands in your Perl program, just like the web browser does. Check out Javascript for a Perl module that does that.

      I'm having real trouble setting up the Javascript module on my computer, and am not really getting anywhere at all. Could you point me in the right direction with this one as I'm still fairly new to perl. Thanks!

        I have not used the JavaScript module before, so I am not sure how it works. Check out jeffa's reply at 293164. I agree with jeffa that the table is not being generated with JavaScript, so I don't see why you can't just download the page and parse the HTML. The only JavaScript code on that page is there to manipulate the images, and I am not even sure that it does anything at all (it looks like Macromedia Dreamweaver puts MM_ functions into everything it makes).
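        If the table really is in the downloaded HTML, pulling its cells out could look roughly like the sketch below. It is deliberately crude: it uses regexes from core Perl only and will not cope with nested tables or malformed markup, so a real parser module such as HTML::TableExtract is likely the better choice for the live page. The sub name table_rows and the sample HTML are made up for illustration; they are not taken from the site.

```perl
use strict;
use warnings;

# Return the rows found in $html, each row as an array ref of cell texts.
# Crude sketch: nested tables and broken markup are not handled.
sub table_rows {
    my ($html) = @_;
    my @rows;
    while ($html =~ m{<tr\b[^>]*>(.*?)</tr>}gis) {
        my $row = $1;
        my @cells;
        while ($row =~ m{<t[dh]\b[^>]*>(.*?)</t[dh]>}gis) {
            my $cell = $1;
            $cell =~ s/<[^>]+>//g;     # drop any tags inside the cell
            $cell =~ s/&nbsp;/ /g;     # treat non-breaking spaces as spaces
            $cell =~ s/^\s+|\s+$//g;   # trim surrounding whitespace
            push @cells, $cell;
        }
        push @rows, \@cells if @cells;
    }
    return @rows;
}

# Hypothetical sample standing in for the downloaded page.
my $html = <<'HTML';
<table>
  <tr><th>Week</th><th>Status</th></tr>
  <tr><td>27 Sep</td><td><b>Available</b></td></tr>
</table>
HTML

print join(' | ', @$_), "\n" for table_rows($html);
```

        In the real script, $html would be whatever get($src) returned.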

Re: Javascript in Perl and retrieving the HTML source.
by bassplayer (Monsignor) on Sep 22, 2003 at 15:56 UTC
    Please include the code that you are using to retrieve the source code for this page. Including the code you have tried makes it much easier to assist you, and is always a good idea when posting on this site.

    While parts of that table are affected by JavaScript, I do see the HTML for the table when I view the page source.

    bassplayer

      code as follows
      open(HHOL, "C:/tmp/HHOL/HHOL.txt") || die ("unable to open HHOL file");
      while(my @HHOLid = split /=[^\n]*\n/, <HHOL>) {
          foreach $HHOLid (@HHOLid) {
              $src = "http://www.helpfulholidays.com/property.asp?ref=".$HHOLid."&year=".$year;
              sleep (rand 10);
              $_ = get($src);
              print ("$src\n");
          }
      }
      }
      When I used this piece of code the table was excluded.
        Besides having an extra curly brace at the end, this code might not do what you had in mind. If there is indeed JavaScript that needs interpreting, then the JavaScript module that jeffa and sgifford suggested should do the trick. However, in the code above, you are printing the URL, not the source code. Are you sure the table in question is not being retrieved? What is the purpose of the file being opened?
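        For what it's worth, here is a rough sketch of how the loop might look with the stray brace gone and the fetched source printed instead of only the URL. The file format (one "id=something" entry per line) is a guess based on the split in your code, and the sub names property_url, id_from_line and fetch_all are invented for the example. LWP::Simple is loaded inside fetch_all so the two helper subs can be tried without it.

```perl
use strict;
use warnings;

# Build the property URL for one id (base URL taken from the posted code).
sub property_url {
    my ($id, $year) = @_;
    return "http://www.helpfulholidays.com/property.asp?ref=$id&year=$year";
}

# Take the part before the first "=" on a line; the file format is assumed.
sub id_from_line {
    my ($line) = @_;
    chomp $line;
    my ($id) = split /=/, $line, 2;
    return $id;
}

# Fetch every property page listed in $file and print its HTML.
sub fetch_all {
    my ($file, $year) = @_;
    require LWP::Simple;
    open(my $fh, '<', $file) or die "unable to open $file: $!";
    while (my $line = <$fh>) {
        my $id = id_from_line($line);
        next unless defined $id && length $id;
        my $src  = property_url($id, $year);
        my $html = LWP::Simple::get($src);
        if (defined $html) {
            print "$src\n$html\n";      # the page source, not only the URL
        } else {
            warn "could not fetch $src\n";
        }
        sleep(1 + int rand 10);         # pause between requests
    }
    close $fh;
}

# Runs only when called as: perl script.pl FILE YEAR
fetch_all(@ARGV) if @ARGV == 2;
```

        Checking whether get returned undef also tells you if the request itself failed, which is worth ruling out before blaming JavaScript.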

        bassplayer

Re: Javascript in Perl and retrieving the HTML source.
by sgifford (Prior) on Sep 22, 2003 at 15:57 UTC

    The JavaScript module may do what you want. I've never used it, and the version is 0.52 so it may be incomplete, but it says it's just a frontend to Mozilla's JavaScript libraries, so maybe it will work.