PerlMonks  

How do I perform web automation with sites that use Javascript?

by whakka (Hermit)
on May 02, 2009 at 18:58 UTC


Description:

I need to write a script that automatically crawls some web pages, but these pages use Javascript/AJAX for form processing and the like. LWP and WWW::Mechanize don't execute Javascript, so they don't handle this case well. What can I do?

Answer: How do I perform web automation with sites that use Javascript?
contributed by jettero

JavaScript::SpiderMonkey seems to be in use by scripts built as recently as Net::Plurk::Dumper. Something to add to the list in any case.

Answer: How do I perform web automation with sites that use Javascript?
contributed by jdporter

Here are some modules which give you a way around the issue:

Other things to try:

  • Disable Javascript in your browser and see if the site still functions as you want. If so, then you don't actually have a problem :)
  • Figure out what the scripts are doing on the wire, and re-implement those transactions in your own program. The Firefox add-on Live HTTP Headers is well suited for this.

This information was provided by the OP.
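Here is a minimal sketch of the "re-implement the transactions" approach in Perl. It assumes the request you captured with Live HTTP Headers is a plain form-encoded POST and that the server answers with JSON. It uses only HTTP::Tiny and JSON::PP, which ship with Perl 5.14 and later. The sub names `form_body` and `replay_xhr`, the user-agent string, and the `X-Requested-With` header are illustrative assumptions, not something any particular site is known to require.

```perl
use strict;
use warnings;
use HTTP::Tiny;
use JSON::PP qw(decode_json);

# URL-encode form fields as an application/x-www-form-urlencoded body, so
# you can compare it with the body shown in the captured request.
# Pass an array ref to keep the field order you saw on the wire.
sub form_body {
    my ($fields) = @_;
    return HTTP::Tiny->new->www_form_urlencode($fields);
}

# Replay the POST that the page's JavaScript makes and return the decoded
# reply. ASSUMPTION: the endpoint returns JSON; adjust if it returns HTML.
sub replay_xhr {
    my ( $url, $fields ) = @_;
    my $http = HTTP::Tiny->new( agent => 'Mozilla/5.0 (compatible; my-crawler/0.1)' );
    my $res  = $http->post_form(
        $url, $fields,
        # ASSUMPTION: some AJAX handlers reject requests lacking this header.
        { headers => { 'X-Requested-With' => 'XMLHttpRequest' } },
    );
    die "POST $url failed: $res->{status} $res->{reason}\n"
        unless $res->{success};
    return decode_json( $res->{content} );
}
```

For example, `replay_xhr('http://www.example.com/ajax/search', [ q => 'perl', page => 1 ])` would POST the body `q=perl&page=1`. The URL and field names there are placeholders for whatever the captured request actually contains.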

Answer: How do I perform web automation with sites that use Javascript?
contributed by planetscape

Don't forget Limbic~Region's excellent Tutorial, Using WWW::Selenium To Test Or Automate An Ajax Website.

Answer: How do I perform web automation with sites that use Javascript?
contributed by ninuzzo

A recent addition of mine is WWW::HtmlUnit::Spidey. This module uses the Java library HtmlUnit, a headless browser with pretty good JavaScript support. Don't worry, you won't have to write any Java code :D

It is well suited to large-scale web scraping, where screen scraping through a real browser does not scale and can be unstable.

There is a tutorial here that scrapes data from a form that does not work without JavaScript support.

Btw, I am just a Perl beginner. Is any Perl guru interested in co-developing Spidey?
