Hi there. I'm not clear on what you want exactly. Do you just want the names of the forms? WWW::Mechanize has a dump_forms method, so you may not need Mechanize::Firefox. Also, if you use the Firefox 'Firebug' extension, you can inspect any HTML element and then use its name or value in the script to mechanize what you want to do. | [reply] |
Ok cool, I'll give that a go. My whole idea though is to try to avoid having to use Firebug. I've been making scripts by hand for a little while and I wanted to take the time to make something a little more intelligent. I do understand I can only cover relatively simple scenarios, but I want to take more of the legwork out of setting up a scraper.
Cheers all the same.
| [reply] |
I would also consider using the Web::Scraper module. I haven't used it much, but it looks pretty good.
| [reply] |
Ah, OK! Fair enough.
What I'm trying to do is make a simplified 'web scraper'.
Frequently within our Org I need to collect data from various systems (say, checking the names on the internal colleague register).
But we have many systems, and some use JavaScript. Because of this I was leaning towards Mech::Firefox, since it handles that for you.
I envisaged putting in the internal web address, then receiving back a set of links, forms, etc. which the user could then select from. I would save the choices in a config file and so spare the user repetitive checks.
So I'm a bit confused by the answer, as it suggests using HTML::Form, but the documentation states it doesn't return this type of object.
I'd like to understand it properly, as I intend to expand the code substantially.
| [reply] |
Hi help_3452, there seem to be several ways of going about what you need, which makes it hard for me to tell you what to do. So I will make a few suggestions based on what I understand:
- I would still consider trying just plain WWW::Mechanize for 'collecting the data'.
- I have bypassed JavaScript before with that module. I would also suggest getting the 'Live HTTP Headers' add-on for Firefox. This will help you bypass some of the JavaScript by showing the HTTP GET/POST requests that occur as you navigate the site.
- I think you're making this harder than it needs to be by allowing the user to select the particulars of the forms. If this feature is a 'must have' for you, I would get the form names, let the user select which forms they intend to submit, have them enter the required input, and pass that to the submit_form method that Mechanize has. Each input could be a 'field' within the 'form', maybe. Here is an example of what I used to log in with Mechanize::Firefox:
$mech->form_name('loginform');
$mech->field('email' => 'me@awesome.com');
$mech->field('password' => 'l337');
$mech->click_button(name => 'login');
The WWW::Mechanize module is basically the same in this regard. For each of those methods I just used Firebug to inspect the relevant HTML element and then coded its name into the script. | [reply] [d/l] |
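On the "get the form names and let the user pick" idea, here is a rough sketch of how the form and field names could be pulled out in code rather than by inspecting the page with Firebug. It assumes plain HTML::Form objects, which is what plain WWW::Mechanize hands back from its forms method. The describe_forms helper, the sample HTML and the intranet.example address are all made up for illustration, so treat this as a starting point and not a tested solution for your systems.

```perl
use strict;
use warnings;
use HTML::Form;

# Hypothetical helper: turn HTML::Form objects into plain data that could
# be shown to the user or written to a config file.
sub describe_forms {
    my @forms = @_;
    my @summary;
    for my $form (@forms) {
        # Keep only inputs that have a name; unnamed ones cannot be set by name.
        my @fields =
            map  { +{ name => $_->name, type => $_->type } }
            grep { defined $_->name } $form->inputs;
        push @summary, {
            name   => $form->attr('name'),   # undef if the form has no name
            method => $form->method,
            action => $form->action . '',    # stringify the URI object
            fields => \@fields,
        };
    }
    return \@summary;
}

# Made-up page standing in for one of the internal systems.
my $html = <<'HTML';
<form name="loginform" method="post" action="/login">
  <input type="text" name="email">
  <input type="password" name="password">
  <input type="submit" name="login" value="Log in">
</form>
HTML

# The second argument is the base URI used to resolve the form action.
my $summary = describe_forms( HTML::Form->parse($html, 'http://intranet.example/') );

for my $form (@$summary) {
    printf "%s (%s %s)\n",
        $form->{name} // '(unnamed)', $form->{method}, $form->{action};
    printf "  %-10s %s\n", $_->{type}, $_->{name} for @{ $form->{fields} };
}
```

With plain WWW::Mechanize you would presumably call describe_forms($mech->forms) after a $mech->get(...) instead of parsing a string. The chosen form name and field values could then go to submit_form via its form_name and fields arguments.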