PerlMonks  

Javascript in Perl and retrieving the HTML source.

by waiterm (Acolyte)
on Sep 22, 2003 at 15:43 UTC (#293160, perlquestion)

waiterm has asked for the wisdom of the Perl Monks concerning the following question:

I'm using a simple get($src) call to retrieve HTML source from the web, but what do I do if JavaScript is used to create the HTML? It doesn't appear to be picked up when I use get. For example, see http://www.helpfulholidays.com/availability_fullpage.asp?pref=T421&year=2003. You'll notice that get($src) doesn't retrieve the table in the middle of the page, which is the section I need to parse. How do I go about retrieving this section so that I can make use of it? I'm using ActivePerl 5.8.0 at the moment.

Replies are listed 'Best First'.
Re: Javascript in Perl and retrieving the HTML source.
by jeffa (Bishop) on Sep 22, 2003 at 15:55 UTC
    Are you sure? I just used LWP::Simple to retrieve the page and that middle table was indeed there. I disabled JavaScript in my browser and the table was rendered ...
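    A quick way to check this from a script is to fetch the raw response and look for table markup before any JavaScript has run. The sketch below is only an illustration, not the code used above: it swaps in the core HTTP::Tiny module for LWP::Simple so that it has no non-core dependencies, and the helper names fetch_source and count_tables, as well as the two sample snippets, are made up for this example.

```perl
use strict;
use warnings;
use HTTP::Tiny;

# Fetch the raw response body, i.e. what the server sends before any
# JavaScript has run. Dies on an HTTP or network failure.
sub fetch_source {
    my ($url) = @_;
    my $res = HTTP::Tiny->new(timeout => 30)->get($url);
    die "GET $url failed: $res->{status} $res->{reason}\n"
        unless $res->{success};
    return $res->{content};
}

# Count opening <table> tags in a chunk of HTML (case-insensitive).
sub count_tables {
    my ($html) = @_;
    my $count = () = $html =~ /<table\b/gi;
    return $count;
}

# Offline demonstration on two hypothetical snippets.
my $static  = '<html><body><TABLE><tr><td>free</td></tr></TABLE></body></html>';
my $dynamic = '<html><body><script>buildCalendar();</script></body></html>';
printf "static page:  %d table(s)\n", count_tables($static);
printf "dynamic page: %d table(s)\n", count_tables($dynamic);

# Pass a URL on the command line to check a live page.
if (my $url = shift @ARGV) {
    printf "%s: %d table(s)\n", $url, count_tables(fetch_source($url));
}
```

    If the count for the live page is zero, the table is probably being built client-side; if it is non-zero, the markup is already in the response and can be parsed directly.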

    However, there is a solution to these JavaScript dilemmas: the JavaScript module. You can see an example of where I used it at (jeffa) Re: Encrypt web files!.

    jeffa

    L-LL-L--L-LL-L--L-LL-L--
    -R--R-RR-R--R-RR-R--R-RR
    B--B--B--B--B--B--B--B--
    H---H---H---H---H---H---
    (the triplet paradiddle with high-hat)
    
Re: Javascript in Perl and retrieving the HTML source.
by bear0053 (Hermit) on Sep 22, 2003 at 15:58 UTC
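    Try this:

        use Win32::OLE;

        # Fetch the page through the MSXML HTTP component (Windows only).
        my $SendObject = Win32::OLE->new('Microsoft.XMLHTTP')
            or die "Cannot create Microsoft.XMLHTTP object: ", Win32::OLE->LastError;
        # The third argument (0 = false) makes the request synchronous.
        $SendObject->open("GET", "http://www.helpfulholidays.com/availability_fullpage.asp?pref=T421&year=2003", 0);
        $SendObject->setRequestHeader("Content-type", "text/plain");
        $SendObject->send();
        my $RegResponse = $SendObject->responseText;

    The HTML source will be in $RegResponse.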
Re: Javascript in Perl and retrieving the HTML source.
by bunnyman (Hermit) on Sep 22, 2003 at 16:08 UTC

    The HTTP server is giving you the raw page before the JavaScript is processed, because it is the web browser that has to process it. If you changed your web browser settings to turn off JavaScript, you would have the same problem in the browser as you are having with the Perl program.

    What you need to do is process the JavaScript commands in your Perl program, just like the web browser does. Check out the JavaScript module, a Perl module that does that.

      I'm having real trouble setting up the JavaScript module on my computer, and am not really getting anywhere at all. Could you point me in the right direction with this one, as I'm still fairly new to Perl? Thanks!

        I have not used the JavaScript module before, so I am not sure how it works. Check out jeffa's reply at 293164. I agree with jeffa that the table is not being generated with JavaScript, so I don't see why you can't just download the page and parse the HTML. The only JavaScript code on that page is there to manipulate the images, and I am not even sure that it does anything at all (it looks like Macromedia Dreamweaver puts MM_ functions into everything it makes).
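        As a rough illustration of "download the page and parse the HTML", here is a core-only sketch that pulls rows and cells out of table markup with regular expressions. It is deliberately naive: it assumes closed tags and no nested tables, and a real HTML parser such as the CPAN module HTML::TableExtract would be more robust. The function name table_rows and the sample table are invented for this example and do not come from the page in question.

```perl
use strict;
use warnings;

# Return a list of rows, each an array ref of cell texts, for every
# <tr> found in $html. Naive: assumes closed tags and no nested tables.
sub table_rows {
    my ($html) = @_;
    my @rows;
    while ($html =~ m{<tr\b[^>]*>(.*?)</tr>}gis) {
        my $row = $1;
        my @cells;
        while ($row =~ m{<t[dh]\b[^>]*>(.*?)</t[dh]>}gis) {
            my $text = $1;
            $text =~ s/<[^>]+>//g;      # drop inline tags such as <b> or <a>
            $text =~ s/&nbsp;/ /g;      # treat non-breaking spaces as spaces
            $text =~ s/^\s+|\s+$//g;    # trim leading and trailing whitespace
            push @cells, $text;
        }
        push @rows, \@cells if @cells;
    }
    return @rows;
}

# Hypothetical availability table, invented for this example.
my $html = <<'HTML';
<table>
  <tr><th>Week</th><th>Status</th></tr>
  <tr><td>27 Sep</td><td><b>Booked</b></td></tr>
  <tr><td>04 Oct</td><td>&nbsp;Free </td></tr>
</table>
HTML

print join(' | ', @$_), "\n" for table_rows($html);
```

        On a real page, the string passed to table_rows would be the source returned by get($src) rather than the inline sample.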

Re: Javascript in Perl and retrieving the HTML source.
by bassplayer (Monsignor) on Sep 22, 2003 at 15:56 UTC
    Please include the code that you are using to retrieve the source code for this page. Including the code you have tried makes it much easier to assist you, and is always a good idea when posting on this site.

    While parts of that table are affected by JavaScript, I do see the HTML for the table when I view the page source.

    bassplayer

      code as follows
      open(HHOL, "C:/tmp/HHOL/HHOL.txt") || die ("unable to open HHOL file");
      while(my @HHOLid = split /=[^\n]*\n/, <HHOL>) {
          foreach $HHOLid (@HHOLid) {
              $src = "http://www.helpfulholidays.com/property.asp?ref=".$HHOLid."&year=".$year;
              sleep (rand 10);
              $_ = get($src);
              print ("$src\n");
          }
      }
      }
      When I used this piece of code the table was excluded.
        Besides having an extra curly brace at the end, this code might not do what you had in mind. If there is indeed JavaScript that needs interpreting, then the JavaScript module that jeffa and sgifford suggested should do the trick. However, in the code above, you are printing the URL, not the source code. Are you sure the table in question is not being retrieved? What is the purpose of the file being opened?

        bassplayer
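        For comparison, here is one way the loop could be restructured so that it prints the fetched source rather than the URL. This is a sketch under assumptions, not a drop-in fix: it guesses that each line of HHOL.txt looks like "ID=anything" (inferred from the split pattern), it uses the core HTTP::Tiny module in place of LWP::Simple's get, the year 2003 is taken from the URL in the question, and the names read_ids and property_url, as well as the ID B102, are hypothetical.

```perl
use strict;
use warnings;
use HTTP::Tiny;

# Read property IDs from a filehandle whose lines look like "ID=anything".
sub read_ids {
    my ($fh) = @_;
    my @ids;
    while (my $line = <$fh>) {
        chomp $line;
        my ($id) = split /=/, $line, 2;
        push @ids, $id if defined $id && length $id;
    }
    return @ids;
}

# Build the property page URL for one ID and year.
sub property_url {
    my ($id, $year) = @_;
    return "http://www.helpfulholidays.com/property.asp?ref=$id&year=$year";
}

# Offline demonstration: an in-memory file stands in for HHOL.txt.
my $sample = "T421=first\nB102=second\n";
open(my $demo_fh, '<', \$sample) or die "unable to open sample data: $!";
my @urls = map { property_url($_, 2003) } read_ids($demo_fh);
close $demo_fh;
print "$_\n" for @urls;

# Pass the path of the real ID file on the command line to fetch each page.
if (my $path = shift @ARGV) {
    open(my $fh, '<', $path) or die "unable to open $path: $!";
    my $http = HTTP::Tiny->new(timeout => 30);
    for my $id (read_ids($fh)) {
        my $url = property_url($id, 2003);
        sleep int rand 10;                 # pause between requests
        my $res = $http->get($url);
        if ($res->{success}) {
            print $res->{content};         # the page source, not the URL
        }
        else {
            warn "GET $url failed: $res->{status} $res->{reason}\n";
        }
    }
    close $fh;
}
```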

Re: Javascript in Perl and retrieving the HTML source.
by sgifford (Prior) on Sep 22, 2003 at 15:57 UTC

    The JavaScript module may do what you want. I've never used it, and the version is 0.52, so it may be incomplete, but it says it's just a frontend to Mozilla's JavaScript libraries, so maybe it will work.

Node Type: perlquestion [id://293160]
Approved by broquaint