PerlMonks  

screen scraping

by garskoci (Novice)
on Jan 08, 2005 at 18:40 UTC (#420567)
garskoci has asked for the wisdom of the Perl Monks concerning the following question:

Hello Monks!

This is my first time trying to do some screen scraping. In my real life I use Perl to automate administration tasks and parse data, so this is also my first experience with OO. What I basically need to do is log into my Netgear router, navigate to the page that shows the port rules, select or de-select radio buttons to open or close ports, and then submit the form.

I found that I can log in with the LWP module and get the HTML from the correct page showing the port settings. I also found that I cannot log in when using Mechanize. I am having a bit of trouble getting the form name and the names of the fields.

So, can you please tell me if I am on track?
I plan to use LWP to log in and get the resulting HTML that shows the port settings, use Mechanize to select and de-select radio buttons and to submit the form, and then use LWP to get the HTML back and HTML::TokeParser to display the final settings in a report-style output.

Does this sound like a good way of going about this? Any advice would be greatly appreciated. I'm somewhat lost here.

Regards.

Edited by davido: Added formatting tags to match original input intent.
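
For the reporting step of the plan above, here is a minimal sketch of what an HTML::TokeParser pass might look like. It assumes the settings page marks open ports with checkbox inputs like the two quoted later in this thread; the report layout and the variable names are illustrative only, not anything the router or the original poster prescribes.

```perl
use strict;
use warnings;
use HTML::TokeParser;

# Sample markup: the two checkbox cells quoted later in this thread.
my $html = <<'HTML';
<td width="10%" align="center"><input type="checkbox" name="EnBt0" value="checkbox" CHECKED > </td>
<td width="10%" align="center"><input type="checkbox" name="EnBt3" value="checkbox" ></td>
HTML

my $parser = HTML::TokeParser->new(\$html) or die "Cannot parse HTML";

my @report;
while (my $tag = $parser->get_tag('input')) {
    my $attr = $tag->[1];    # hash ref of attributes, keys lowercased
    next unless lc($attr->{type} || '') eq 'checkbox';

    # A boolean CHECKED attribute shows up as a "checked" key.
    my $state = exists $attr->{checked} ? 'on' : 'off';
    push @report, sprintf('%-8s %s', $attr->{name}, $state);
}

print "$_\n" for @report;
```

In a real script the string in `$html` would be replaced by the page content fetched with LWP or Mechanize.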

Re: screen scraping
by borisz (Canon) on Jan 08, 2005 at 18:53 UTC
    Both modules can do what you want. It is easier with WWW::Mechanize.
    Boris
Re: screen scraping
by Limbic~Region (Chancellor) on Jan 08, 2005 at 19:37 UTC
    garskoci,
    "I also found that I cannot log in when using Mechanize."

    What problem(s) are you having? Since WWW::Mechanize subclasses LWP::UserAgent, all the methods like credentials for doing authorization should work. So if that isn't it - what is?

    Cheers - L~R

      Thanks for the response. I tried this bit to log in, with different form names:

          $mech->get($url);
          $mech->form_name('FVS318');
          $mech->field('id', $USER);
          $mech->field('p', $PASS);
          $mech->click('submit');

      But this works, so I can log in using the following:

          my $mech = LWP::UserAgent->new;
          $mech->credentials( '192.168.0.1:80', 'FVS318', 'admin' => 'secret' );

      I can get to the correct page, as I mentioned. So, now I will try to select and de-select the radio buttons. I have one example of selecting the radio buttons. I'll give it a whirl. Again, using the Perl modules and OO is totally new to me. Thank you.
        garskoci,
        Are the user and pass fields form values, as the WWW::Mechanize example indicates, or are they more like HTTP basic auth, as the LWP::UserAgent code shows? I am just guessing, but what does the following code do?

            #!/usr/bin/perl
            use strict;
            use warnings;
            use WWW::Mechanize;

            my $mech = WWW::Mechanize->new( autocheck => 1 );
            $mech->credentials( '192.168.0.1:80', 'FVS318', 'admin' => 'secret' );
            # And then $mech->get() the same as in the LWP::UserAgent code

        FWIW, take a look at WWW::Mechanize::Shell as well.

        Cheers - L~R

        Update: Added a to autocheck
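
One way to tell the two login styles in this exchange apart is to look at the unauthenticated response: HTTP basic auth answers with a 401 status and a `WWW-Authenticate` header naming the realm, while a form login returns an ordinary page containing a password field. The sketch below is only an illustration under that assumption. `auth_style` is a hypothetical helper, and the two responses are built by hand so the example runs without a router.

```perl
use strict;
use warnings;
use HTTP::Response;

# Hypothetical helper: guess how a page expects us to authenticate.
sub auth_style {
    my ($response) = @_;
    my $challenge = $response->header('WWW-Authenticate') || '';

    # Basic auth: 401 plus a challenge that names the realm.
    if ($response->code == 401 && $challenge =~ /^Basic\s+realm="([^"]*)"/i) {
        return "basic:$1";
    }

    # Form login: a normal page that contains a password input.
    return 'form' if $response->content =~ /<input[^>]+type=["']?password/i;
    return 'unknown';
}

# Hand-built stand-ins for what a server might send back.
my $challenge_page = HTTP::Response->new(
    401, 'Unauthorized',
    [ 'WWW-Authenticate' => 'Basic realm="FVS318"' ], '',
);
my $login_page = HTTP::Response->new(
    200, 'OK',
    [ 'Content-Type' => 'text/html' ],
    '<form><input type="password" name="p"></form>',
);

print auth_style($challenge_page), "\n";
print auth_style($login_page), "\n";
```

If the router reports a basic challenge, the realm it names is the value to pass as the second argument to `credentials`, which would be consistent with the `'FVS318'` used above.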

Re: screen scraping
by Anonymous Monk on Jan 09, 2005 at 03:05 UTC
      Hello again. Well, I got a bit further. I'm actually trying to check a checkbox, not a radio button as I previously stated. I can log in and get the HTML from the page. When I try to check the checkbox, I don't receive any errors, but it's not working either. Here is my script.

          #!/usr/bin/perl
          use strict;
          use warnings;
          use WWW::Mechanize;

          my $url  = 'http://192.168.0.1/WanService.html';
          my $mech = WWW::Mechanize->new( autocheck => 1 );
          $mech->credentials( '192.168.0.1:80', 'FVS318', 'admin' => 'secret' );

          # And then $mech->get() the same as in the LWP::UserAgent code
          my $response = $mech->get($url);
          die "Error at $url\n ", $response->status_line, "\n Aborting"
              unless $response->is_success;
          print "SUCCESS data type: ", $response->content_type, "\n\n";
          #print "\n\n";
          #print $response->content;

          $mech->tick('EnBt3', 'checkbox', 'CHECKED');
          $mech->submit();
          print "\n\nDone.\n\n";

      Here is the html that I think I should be working with.
      This one appears to be checked.
      <td width="10%" align="center"><input type="checkbox" name="EnBt0" value="checkbox" CHECKED > </td>
      This one appears not to be checked and is the one that I'm trying to check.
      <td width="10%" align="center"><input type="checkbox" name="EnBt3" value="checkbox" ></td>

      Am I doing something wrong?
      Thanks.
      I just wanted to add the html for the "Apply" button.
      <input type="button" value="Apply " onClick="toApply(document.Wanblock);">&nbsp;&nbsp;

      I know that I'm very close. I tried to use click_button to click the Apply button:

          $mech->click_button('Apply ');

      but it produces the following errors:

          Unknown click_button parameter "Apply" at ./pm_test.pl line 23
          Can't call method "header" on an undefined value at /usr/lib/perl5/site_perl/5.8.3/WWW/Mechanize.pm line 1763.

        Anyone?
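
A possible explanation, offered as a guess rather than a diagnosis: the Apply control is an `<input type="button">` whose work is done by the JavaScript call `toApply(document.Wanblock)`, and WWW::Mechanize does not run JavaScript, so there is no submit button for it to click. The sketch below shows the underlying idea offline with HTML::Form (the module Mechanize uses for its forms): tick the checkbox and build the request directly. The form name `Wanblock` is inferred from the `onClick` handler, and the `method` and `action` values are made up for illustration. It also assumes `toApply()` does nothing beyond submitting the form, which the thread does not confirm.

```perl
use strict;
use warnings;
use HTML::Form;

# Hypothetical form wrapper around the inputs quoted in this thread.
# The name comes from document.Wanblock; method and action are assumptions.
my $html = <<'HTML';
<form name="Wanblock" method="post" action="/apply.cgi">
<input type="checkbox" name="EnBt0" value="checkbox" CHECKED>
<input type="checkbox" name="EnBt3" value="checkbox">
<input type="button" value="Apply " onClick="toApply(document.Wanblock);">
</form>
HTML

# In scalar context parse() returns the first form on the page.
my $form = HTML::Form->parse($html, 'http://192.168.0.1/WanService.html');

# Turn the EnBt3 checkbox on.
$form->find_input('EnBt3', 'checkbox')->check;

# Build the request without clicking anything; type="button" inputs
# contribute no form data.
my $request = $form->make_request;

print $request->method, ' ', $request->uri, "\n";
print $request->content, "\n";
```

With a live Mechanize session the equivalent steps would be `$mech->form_name('Wanblock')`, `$mech->tick('EnBt3', 'checkbox')`, and `$mech->submit()`. If the router's `toApply()` sets hidden fields or changes the action before submitting, those values would have to be set by hand as well.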
Re: screen scraping
by xern (Beadle) on May 28, 2006 at 17:27 UTC
    You may try FEAR::API, another choice for site scraping.
Re: screen scraping
by planetscape (Canon) on May 29, 2006 at 11:45 UTC

    I would try using a module such as HTTP::Recorder or WWW::Mechanize::Shell to record a successful manual form submission. The output of HTTP::Recorder, for instance, can be "dropped" right into your WWW::Mechanize scripts.

    Otherwise... post the actual error messages or whatnot that you are getting...

    HTH,

    planetscape
