crawling past a javascript submit function

by hashED (Novice)
on Sep 08, 2007 at 07:30 UTC ( #637800=perlquestion )

hashED has asked for the wisdom of the Perl Monks concerning the following question:

Dear Masters

I'm building a scraper that at one point encounters a page containing the following JS function:

<script language="JavaScript">
function LinkSubmit(strCatID) {
    var frm1 = document.contract_cat_index1;
    frm1.CSCRCat.value = strCatID;
    frm1.submit();
}

Further down the code, LinkSubmit is referenced in lines similar to the following:

<td align="left" ><a href="javascript:LinkSubmit('1/Janitorial');" TITLE="CLICK to select this category" class="contentLink">Janitorial</a></td>

Does anyone know how I can get my Perl robot to follow the link that LinkSubmit puts together? Sorry if the answer is easy, but I've read two and a half books and all the FAQs I can find, and still haven't seen the solution. Noobotron status acknowledged.

Replies are listed 'Best First'.
Re: crawling past a javascript submit function
by planetscape (Chancellor) on Sep 08, 2007 at 11:35 UTC
      For anyone who ends up here on the same quest that brought me, the easiest way (by a long shot) I found to figure out what the JS was doing was the Live HTTP Headers plugin for Firefox.

      Use it, and with luck, your search might be just about over.

      Hm... you are very wise. Thanks for the help!
Re: crawling past a javascript submit function
by Skeeve (Parson) on Sep 08, 2007 at 07:47 UTC

    The link followed is the URL given in the form tag's action attribute. The strCatID is simply one value supplied in the form: LinkSubmit stores it in the CSCRCat field before submitting.

    The question is: what do you want to achieve? A general robot won't be able to follow those links easily. A specialised one is no big deal to write.
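
A specialised robot first needs the category IDs that the links would hand to LinkSubmit. The following is a minimal sketch of that step, assuming every link looks like the one quoted in the question; the sub name `link_submit_ids` is made up for illustration, and the regex is not a general HTML or JavaScript parser.

```perl
use strict;
use warnings;

# Return the argument of every javascript:LinkSubmit('...') href in $html,
# in document order. Only the single-quoted link form shown in the question
# is recognised.
sub link_submit_ids {
    my ($html) = @_;
    my @ids = $html =~ /href\s*=\s*"javascript:LinkSubmit\('([^']*)'\)/gi;
    return @ids;
}

# Sample row copied from the question.
my $html = <<'HTML';
<td align="left" ><a href="javascript:LinkSubmit('1/Janitorial');" TITLE="CLICK to select this category" class="contentLink">Janitorial</a></td>
HTML

print "$_\n" for link_submit_ids($html);
```

Run against the sample row, this prints `1/Janitorial`, which is the value LinkSubmit would put into CSCRCat.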


    s$$([},&%#}/&/]+}%&{})*;#$&&s&&$^X.($'^"%]=\&(|?*{%
    +.+=%;.#_}\&"^"-+%*).}%:##%}={~=~:.")&e&&s""`$''`"e
      Specialized! Specialized! The form tag is:

      <form name="contract_cat_index1" id="contract_cat_index1" action="/cscr/contract_ads/display/contract_subcat_index.asp?GUID=" method="post">

      When I copy and paste that link, I get "Missing data required to display the requested web page." Of course. Because then I'm not posting. I think I see what to do, and I'm gonna fiddle for a bit... just gotta get over to my other machine.

      I'm still not sure what the parameters for do_POST should be, but I think I can figure it out through trial and error. If you know a better way, please advise.

        You simply have to find all input elements inside the form and supply all the information needed.
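
As a rough sketch of that advice, the POST could look like the code below. It uses the core HTTP::Tiny module rather than the do_POST helper mentioned above, which is a substitution on my part. The form action is copied from the thread; the base URL, any other input fields, and the sub names `category_form_data` and `post_category` are assumptions, since the thread does not show them.

```perl
use strict;
use warnings;
use HTTP::Tiny;

# Form action copied from the thread. It is relative, so the caller has to
# supply the site's base URL, which the thread does not give.
my $ACTION = '/cscr/contract_ads/display/contract_subcat_index.asp?GUID=';

# Build the data LinkSubmit() would submit: whatever other inputs the form
# contains (passed in as a hash ref), plus CSCRCat set to the category ID.
sub category_form_data {
    my ($cat_id, $other_fields) = @_;
    return { %{ $other_fields || {} }, CSCRCat => $cat_id };
}

# POST the form and return the response body, or die with the HTTP status.
sub post_category {
    my ($base_url, $cat_id, $other_fields) = @_;
    my $res = HTTP::Tiny->new->post_form(
        $base_url . $ACTION,
        category_form_data($cat_id, $other_fields),
    );
    die "POST failed: $res->{status} $res->{reason}\n" unless $res->{success};
    return $res->{content};
}

# Usage: perl post_category.pl BASE_URL CATEGORY_ID
print post_category(@ARGV) if @ARGV == 2;
```

If the server still answers "Missing data required to display the requested web page.", the form probably has further inputs whose values need to be passed along in the third argument.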

        OTOH: There are modules that help you. I think WWW::Mechanize is one of those.
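
A sketch of the WWW::Mechanize route, assuming the module is installed from CPAN and that setting CSCRCat is all the form needs. The form name comes from the form tag quoted above; the index page URL is not given in the thread, so it is taken from the command line, and the sub name `fetch_category` is made up for illustration.

```perl
use strict;
use warnings;

# Fetch the index page, fill in the named form the way LinkSubmit() does,
# submit it, and return the resulting page.
sub fetch_category {
    my ($index_url, $cat_id) = @_;
    require WWW::Mechanize;    # CPAN module, loaded only when a fetch is made
    my $mech = WWW::Mechanize->new( autocheck => 1 );   # die on HTTP errors
    $mech->get($index_url);
    $mech->submit_form(
        form_name => 'contract_cat_index1',
        fields    => { CSCRCat => $cat_id },
    );
    return $mech->content;
}

# Usage: perl fetch_category.pl INDEX_URL CATEGORY_ID
print fetch_category(@ARGV) if @ARGV == 2;
```

Because Mechanize parses the form from the fetched page, the other inputs in the form are sent with their existing values, so only CSCRCat has to be set by hand.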


