How do I perform web automation with sites that use Javascript?

( #761527=categorized question )
Contributed by whakka on May 02, 2009 at 18:58 UTC


Description:

I need to automate a script for crawling web pages, but these pages use Javascript/AJAX for form processing and the like. LWP and WWW::Mechanize don't handle this case well. What can I do?

Answer: How do I perform web automation with sites that use Javascript?
contributed by jettero

JavaScript::SpiderMonkey seems to be in use by scripts built as recently as Net::Plurk::Dumper. Something to add to the list in any case.

Answer: How do I perform web automation with sites that use Javascript?
contributed by jdporter

Here are some modules which give you a way around the issue:

Other things to try:

  • Disable Javascript in your browser and see if the site still functions as you want. If so, then you don't actually have a problem :)
  • Figure out what the scripts are doing on the wire, and re-implement those transactions in your own program. The Firefox add-on Live HTTP Headers is well suited for this.
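
As a rough illustration of the second suggestion, here is a minimal sketch of replaying a captured AJAX form post with LWP. The endpoint URL, the field names (q, page) and the helper names build_ajax_request and send_ajax_request are all made up for the example; substitute whatever the captured traffic actually shows. The X-Requested-With header is something many AJAX libraries send, so copy it only if your capture includes it.

```perl
use strict;
use warnings;
use HTTP::Request::Common qw(POST);
use LWP::UserAgent;

# Hypothetical endpoint: replace with the URL seen in the captured traffic.
my $endpoint = 'http://www.example.com/ajax/search';

# Build the same form post the page's script would have sent.
sub build_ajax_request {
    my (%fields) = @_;
    my $req = POST $endpoint, [ %fields ];
    # Assumption: the browser sent this header; drop it if yours did not.
    $req->header('X-Requested-With' => 'XMLHttpRequest');
    return $req;
}

# Send the request, keeping cookies in memory so session state carries over.
sub send_ajax_request {
    my ($req) = @_;
    my $ua  = LWP::UserAgent->new(cookie_jar => {});
    my $res = $ua->request($req);
    die 'Request failed: ' . $res->status_line unless $res->is_success;
    return $res->decoded_content;
}

# Field names here are placeholders, not taken from any real site.
my $req = build_ajax_request(q => 'perl', page => 1);

# Print the request to compare it against what the browser sent.
print $req->as_string;

# Not called here because the endpoint above is only a placeholder:
# print send_ajax_request($req);
```

Comparing the printed request with the browser's headers is usually the quickest way to spot a missing cookie or field before pointing the script at the real site.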

This info provided by the OP.

Answer: How do I perform web automation with sites that use Javascript?
contributed by planetscape

Don't forget Limbic~Region's excellent Tutorial, Using WWW::Selenium To Test Or Automate An Ajax Website.

Answer: How do I perform web automation with sites that use Javascript?
contributed by ninuzzo

A recent addition of mine is WWW::HtmlUnit::Spidey. This module uses the Java library HtmlUnit, which is a headless browser with pretty good JavaScript support. Do not worry, you won't have to write any Java code :D

It is well suited to large-scale web scraping, where screen scraping does not scale and can be unstable.

There is a tutorial here that scrapes data from a form that does not work without JavaScript support.

Btw I am just a Perl beginner. Any Perl guru interested in co-developing Spidey?
