How do I perform web automation with sites that use Javascript?
Contributed by whakka on May 02, 2009 at 18:58 UTC (#761527, categorized question)


I need to write a script that crawls web pages automatically, but these pages rely on Javascript/AJAX for form processing and the like. LWP and WWW::Mechanize don't handle this case well, since neither of them executes Javascript. What can I do?

Answer: How do I perform web automation with sites that use Javascript?
contributed by jettero

JavaScript::SpiderMonkey seems to be in use by scripts built as recently as Net::Plurk::Dumper. Something to add to the list in any case.

Answer: How do I perform web automation with sites that use Javascript?
contributed by jdporter

Here are some modules which give you a way around the issue:

Other things to try:

  • Disable Javascript in your browser and see if the site still functions as you want. If so, then you don't actually have a problem :)
  • Figure out what the scripts are doing on the wire, and re-implement those transactions in your own program. The Firefox add-on Live HTTP Headers is well suited for this.

This info provided by the OP.
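
As a rough sketch of the second suggestion above (re-implementing the on-the-wire transactions yourself): once a tool such as Live HTTP Headers has shown you which request the page's script sends, you can replay that request directly. The example below uses the core module HTTP::Tiny rather than LWP, only so that it runs without extra installs; LWP::UserAgent would do the same job. The URL, the form field names, and the helper names `build_ajax_post` and `send_ajax_post` are all made up for illustration, and the `X-Requested-With` header is an assumption about what the server expects. Copy whatever your own capture shows.

```perl
use strict;
use warnings;
use HTTP::Tiny;

# Build the pieces of a form-encoded POST that mimics what the page's
# Javascript would send. Hypothetical helper: $url and the keys in
# $fields should come from your own capture of the real traffic.
sub build_ajax_post {
    my ($url, $fields) = @_;

    # With a hash ref, the pairs are URL-encoded and sorted by key.
    my $body = HTTP::Tiny->new->www_form_urlencode($fields);

    return {
        url     => $url,
        options => {
            headers => {
                'Content-Type'     => 'application/x-www-form-urlencoded',
                # Many AJAX libraries send this header; assumed here.
                'X-Requested-With' => 'XMLHttpRequest',
            },
            content => $body,
        },
    };
}

# Send the prepared request and return the response body,
# dying on any non-2xx status.
sub send_ajax_post {
    my ($req) = @_;
    my $http = HTTP::Tiny->new( agent => 'my-crawler/0.1' );
    my $res  = $http->request( 'POST', $req->{url}, $req->{options} );
    die "Request failed: $res->{status} $res->{reason}\n"
        unless $res->{success};
    return $res->{content};
}

# Usage (not executed here, since the endpoint is hypothetical):
#   my $req = build_ajax_post( 'http://example.com/ajax/search',
#                              { q => 'perl', page => 1 } );
#   print send_ajax_post($req);
```

Building the request separately from sending it makes it easy to compare what you are about to send against the captured traffic before touching the network. If the site replies with JSON or an HTML fragment, you then parse that response instead of the rendered page.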

Answer: How do I perform web automation with sites that use Javascript?
contributed by planetscape

Don't forget Limbic~Region's excellent Tutorial, Using WWW::Selenium To Test Or Automate An Ajax Website.

Answer: How do I perform web automation with sites that use Javascript?
contributed by ninuzzo

A recent addition of mine is WWW::HtmlUnit::Spidey. This module uses the Java library HtmlUnit which is a headless browser with pretty good JavaScript support. Do not worry, you won't have to write any Java code :D

It is good for massive web scraping where screen scraping does not scale and may be unstable.

There is a tutorial here that scrapes data from a form which does not work without JavaScript support.

Btw I am just a Perl beginner. Any Perl guru interested in co-developing Spidey?
