PerlMonks
Hi monks,
I'm trying to scrape a large website. There are two single-select drop-down lists that refresh the page and populate a third single-select drop-down list. After selecting from this third list, you click on one of 8 links below it. The URL for this in the link tag in the page HTML is "#", and the tag has onClick="tohtm('../.*.php')". In Firefox this opens a new page/tab and brings you to a data table whose contents I need.

I'm using WWW::Mechanize for this. I can log in through the first page of the site and follow a link to get to the page described above. Then I've tried setting the first two selects (after selecting the form they're inside by name), but that doesn't seem to work: the responses I get still have an unpopulated last drop-down.

Luckily, changing those first two selects brings you to a different URL. So I've also tried just fetching that URL with $browser->get() and then selecting and submitting/clicking from the third drop-down menu. Then I've tried following the link to the data with the follow_link function, but this just brings me back to the same page, with a "#" tacked onto the URL. I've also tried getting the URL for the data page directly after selecting from the third drop-down menu, but that gives me a page with an empty data table, even though the table isn't empty when accessed properly through the browser.

Below are some snippets from the HTML of the page I'm working with and the key lines from the code I'm trying to get to work.
<form name="sipp" method="post" target="_blank">

And the links to the data table I need look like this. Note the dots in the HTML tags are just so this shows up looking right here:
<..tr>

Finally, here's some of my code. This comes after I've already logged in and followed a link to the page where the HTML above comes from. Then I've tried both of the following.
The first just brings me back to the same page I started on. The second gets me to the data table page, but with an (incorrectly) empty table.

Am I just being a newbie web programmer idiot? Is this a JavaScript problem? Are these select controls and links all calling JavaScript functions, which Mechanize doesn't interpret? Are there other libraries that would scrape this page successfully? I've also tried the Python version of Mechanize, but had no success there either.

In reply to Mechanize, Forms, Links, problem from Javascript? by goodepic
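WWW::Mechanize does not execute JavaScript, so following an href of "#" can only reload the current page. As a minimal sketch of one way to inspect what the onClick handlers point at, the path passed to tohtm() can be pulled out of the page source with a regular expression. The helper name tohtm_targets and the sample markup (including the .php paths) are made up for illustration; none of this is the poster's code or the real site's HTML.

```perl
use strict;
use warnings;

# Hypothetical helper: return the argument of every tohtm('...') call
# that appears in an onClick attribute of the given HTML string.
sub tohtm_targets {
    my ($html) = @_;
    my @targets;
    while ($html =~ /onclick\s*=\s*"\s*tohtm\(\s*'([^']+)'\s*\)/gi) {
        push @targets, $1;    # the relative path inside the quotes
    }
    return @targets;
}

# Made-up sample markup standing in for the real page.
my $sample = <<'HTML';
<a href="#" onClick="tohtm('../reports/table1.php')">Table 1</a>
<a href="#" onClick="tohtm('../reports/table2.php')">Table 2</a>
HTML

print "$_\n" for tohtm_targets($sample);
```

In a live session the same helper could be fed $browser->content, and each path resolved against the current page with URI->new_abs($path, $browser->uri) before calling $browser->get(). Whether that returns a populated table depends on what tohtm() actually does. If it sets the action of the "sipp" form and submits it (the form uses method="post" and target="_blank"), a plain GET would not carry the selected values, which would be consistent with the empty table described above. That is an assumption about the site's script, not something visible in the excerpt.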