PerlMonks  

screen scraping

by garskoci (Novice)
on Jan 08, 2005 at 18:40 UTC (#420567=perlquestion)
garskoci has asked for the wisdom of the Perl Monks concerning the following question:

Hello Monks!

This is my first time trying to do some screen scraping. In real life I use Perl to automate administration tasks and parse data, so this is also my first experience with OO. What I basically need to do is log into my Netgear router, navigate to the page that shows the port rules, select or de-select radio buttons to open or close ports, and then submit the form.

I found that I can log in with the LWP module and get the HTML from the correct page showing the port settings. I also found that I cannot log in when using Mechanize. I am having a bit of trouble getting the form name and the names of the fields.

So, can you please tell me if I am on track?
I plan to use LWP to log in and get the resulting HTML that shows the port settings, use Mechanize to select and de-select radio buttons and to submit the form, and then use LWP to get the HTML back and TokeParser to display the final settings as report-style output.

Does this sound like a good way of going about this? Any advice would be greatly appreciated. I'm somewhat lost here.

Regards.

Edited by davido: Added formatting tags to match original input intent.

Re: screen scraping
by borisz (Canon) on Jan 08, 2005 at 18:53 UTC
    Both modules can do what you want. It is easier with WWW::Mechanize
    Boris
Re: screen scraping
by Limbic~Region (Chancellor) on Jan 08, 2005 at 19:37 UTC
    garskoci,
    "I also found that I cannot log in when using Mechanize."

    What problem(s) are you having? Since WWW::Mechanize subclasses LWP::UserAgent, all the methods like credentials for doing authorization should work. So if that isn't it - what is?

    Cheers - L~R

      Thanks for the response. I tried this bit to log in, with different form names:

      $mech->get($url);
      $mech->form_name('FVS318');
      $mech->field('id', $USER);
      $mech->field('p', $PASS);
      $mech->click('submit');

      But this works, so I can log in using the following:

      my $mech = LWP::UserAgent->new;
      $mech->credentials( '192.168.0.1:80', 'FVS318', 'admin' => 'secret' );

      I can get to the correct page, as I mentioned. So now I will try to select and de-select the radio buttons. I have one example of selecting radio buttons, so I'll give it a whirl. Again, using the Perl modules and OO is totally new to me. Thank you.
        garskoci,
        Are the user and pass fields form values, as the WWW::Mechanize example indicates, or is this HTTP basic auth, as the LWP::UserAgent code suggests? I am just guessing, but what does the following code do?

        #!/usr/bin/perl
        use strict;
        use warnings;
        use WWW::Mechanize;

        my $mech = WWW::Mechanize->new( autocheck => 1 );
        $mech->credentials( '192.168.0.1:80', 'FVS318', 'admin' => 'secret' );
        # And then $mech->get() the same as in the LWP::UserAgent code

        FWIW, take a look at WWW::Mechanize::Shell as well.

        Cheers - L~R

        Update: Added the missing "a" to autocheck
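
A note on the question above: in the LWP::UserAgent call, the arguments to credentials are the host and port, the realm, the user name and the password, so 'FVS318' there is an HTTP basic auth realm rather than a form name. The sketch below, which runs without touching the router, shows what basic auth puts on the wire: an Authorization header rather than form fields. It builds the request by hand with HTTP::Request purely for illustration; the URL and the admin/secret pair are the placeholder values already used in this thread.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTTP::Request;

# Placeholder values taken from the thread; nothing is sent over the network.
my $url = 'http://192.168.0.1/WanService.html';
my $req = HTTP::Request->new( GET => $url );

# Basic auth travels as a header: "Basic " plus base64("user:password").
$req->authorization_basic( 'admin', 'secret' );

print $req->header('Authorization'), "\n";
```

If the router answers with a 401 and a WWW-Authenticate header, that points to basic auth, and credentials() is the right tool; a login page containing a form would instead call for form_name() and field().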

Re: screen scraping
by Anonymous Monk on Jan 09, 2005 at 03:05 UTC
      Hello again. Well, I got a bit further. I'm actually trying to check a checkbox, not a radio button as I previously stated. I can log in and get the HTML from the page. When I try to check the checkbox, I don't receive any errors, but it's not working either. Here is my script.

      #!/usr/bin/perl
      use strict;
      use warnings;
      use WWW::Mechanize;

      my $url  = 'http://192.168.0.1/WanService.html';
      my $mech = WWW::Mechanize->new( autocheck => 1 );
      $mech->credentials( '192.168.0.1:80', 'FVS318', 'admin' => 'secret' );

      # And then $mech->get() the same as in the LWP::UserAgent code
      my $response = $mech->get($url);
      die "Error at $url\n ", $response->status_line, "\n Aborting"
          unless $response->is_success;
      print "SUCCESS data type: ", $response->content_type, "\n\n";
      #print "\n\n";
      #print $response->content;

      $mech->tick('EnBt3', 'checkbox', 'CHECKED');
      $mech->submit();
      print "\n\nDone.\n\n";

      Here is the html that I think I should be working with.
      This one appears to be checked.
      <td width="10%" align="center"><input type="checkbox" name="EnBt0" value="checkbox" CHECKED > </td>
      This one appears not to be checked and is the one that I'm trying to check.
      <td width="10%" align="center"><input type="checkbox" name="EnBt3" value="checkbox" ></td>

      Am I doing something wrong?
      Thanks.
      I just wanted to add the html for the "Apply" button.
      <input type="button" value="Apply " onClick="toApply(document.Wanblock);">&nbsp;&nbsp;

      I know that I'm very close. I tried to use click_button to click the Apply button:

      $mech->click_button('Apply ');

      but it produces the following error:

      Unknown click_button parameter "Apply" at ./pm_test.pl line 23
      Can't call method "header" on an undefined value at /usr/lib/perl5/site_perl/5.8.3/WWW/Mechanize.pm line 1763.
        Anyone?
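
One thing worth checking here: the Apply control is type="button" with an onClick handler, so it is driven by JavaScript, which WWW::Mechanize does not execute. click_button also expects key/value arguments such as value => 'Apply ', which would explain the "Unknown click_button parameter" message. A minimal offline sketch, assuming the form is named Wanblock (as the onClick suggests) and posts to a made-up action URL, shows that the checkbox can be turned on and the form submitted without the button at all. The form tag, its method and its action are assumptions, not taken from the router page.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::Form;

# The two checkboxes and the Apply button come from the post above.
# The <form> tag (name, method, action) is hypothetical.
my $html = <<'HTML';
<form name="Wanblock" method="post" action="/WanService.cgi">
<input type="checkbox" name="EnBt0" value="checkbox" CHECKED>
<input type="checkbox" name="EnBt3" value="checkbox">
<input type="button" value="Apply " onClick="toApply(document.Wanblock);">
</form>
HTML

# Parse the fragment; in scalar context parse() returns the first form.
my $form = HTML::Form->parse( $html, 'http://192.168.0.1/WanService.html' );

# Turn the EnBt3 checkbox on without needing to know its value.
$form->find_input( 'EnBt3', 'checkbox' )->check;

# Build the request the form would send; the JavaScript button plays no part.
my $request = $form->make_request;
print $request->method, ' ', $request->uri, "\n";
print $request->content, "\n";
```

Against the real page, the rough WWW::Mechanize equivalent would be $mech->form_name('Wanblock'), then $mech->tick('EnBt3', 'checkbox'), then $mech->submit(). If toApply() fills in hidden fields or changes the action before submitting, those values would have to be set by hand as well.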
Re: screen scraping
by xern (Beadle) on May 28, 2006 at 17:27 UTC
    You may try FEAR::API, another choice for site scraping.
Re: screen scraping
by planetscape (Canon) on May 29, 2006 at 11:45 UTC

    I would try using a module such as HTTP::Recorder or WWW::Mechanize::Shell to record a successful manual form submission. The output of HTTP::Recorder, for instance, can be "dropped" right into your WWW::Mechanize scripts.

    Otherwise... post the actual error messages or whatnot that you are getting...

    HTH,

    planetscape
