Problem with WWW::Mechanize 'select' method

by monkfan (Curate)
on Jul 07, 2007 at 08:13 UTC
monkfan has asked for the wisdom of the Perl Monks concerning the following question:

Most Revered Monks,
I was trying this simple Mech script on this website:
# use strict;
use Data::Dumper;
use Carp;
use WWW::Mechanize;

my $address = '';
my $file    = "ROX1.fasta";
my $email   = '';
my $id      = "ROX1 - Scope";
my $species = "S. cerevisiae";

my $mech = WWW::Mechanize->new();
$mech->get($address);

$mech->select( 'species', $species);    # this doesn't work.

$mech->set_fields(
    #'species' => $species,  # this also won't work
    # But the rest of these fields are recognized
    'groupFile'    => $file,
    'emailAddress' => $email,
    'emailSubject' => $id
);

$mech->submit;
my $result = $mech->content();
print "$result\n";

The error I got is this:
Input "species" not found at line 22
The problem is that Mech fails to recognize the field 'species', even though the website's source code (see below for the full source) clearly contains it:
<select name="species" onchange="changeSpecies(this)">
Why did Mech fail to recognize that field? How can I resolve this problem?

If you want to try my script above, the dataset of "ROX1.fasta" can be found here.

The full HTML source code is this:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN">
[HTML source truncated for brevity; it includes JavaScript functions for species selection and a form containing the species select field]
<select name="species" onchange="changeSpecies(this)">
[options list truncated]
</select>
[HTML continues with species options and form fields]


Replies are listed 'Best First'.
Re: Problem with WWW::Mechanize 'select' method
by marto (Archbishop) on Jul 07, 2007 at 14:06 UTC
    When I experience such weirdness I turn to Corion's fantastic WWW::Mechanize::Shell:
    >perl -MWWW::Mechanize::Shell -eshell
    (no url)>get
    Retrieving
    >dump
    POST (multipart/form-data) [paramForm]
      paramFile=none (hidden readonly)
      upstream=none (hidden readonly)
      runBeam=Yes (hidden readonly)
      runAmbiguizer=Yes (hidden readonly)
      runBipartites=Yes (hidden readonly)
      upstreamType=intergenic (radio) [*intergenic/Intergenic|fixed/Fixed]
      group= (textarea)
      groupFile= (file)
      emailAddress= (text)
      emailSubject= (text)
      selectParamsButton=Run SCOPE (submit)
    Gadzooks, no 'species' input in sight, nor any other select option! Comparing the source HTML to the values we can see via WWW::Mechanize::Shell, I would think that since there are no option tags, we can't use the select as a valid input. They seem to be using JavaScript to populate the options for the select boxes. <joke>Insert anti-JavaScript comment here :P </joke>.
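
The point about missing option tags can be checked offline. The sketch below is not the real page: both HTML snippets and the example.invalid URL are hypothetical stand-ins. It uses HTML::Form, the parser WWW::Mechanize relies on for forms, and lists the input names found when a select has no options versus when it has some.

```perl
use strict;
use warnings;
use HTML::Form;

# Hypothetical stand-in: a select whose options would be filled in by JavaScript.
my $empty_select = <<'HTML';
<form name="paramForm" method="post" action="http://example.invalid/run">
  <select name="species" onchange="changeSpecies(this)"></select>
  <input type="text" name="emailAddress">
</form>
HTML

# Hypothetical stand-in: the same form with the options present in the HTML.
my $filled_select = <<'HTML';
<form name="paramForm" method="post" action="http://example.invalid/run">
  <select name="species" onchange="changeSpecies(this)">
    <option>init</option>
    <option>S. cerevisiae</option>
  </select>
  <input type="text" name="emailAddress">
</form>
HTML

# Return the names of all inputs HTML::Form finds in the first form.
sub input_names {
    my ($html) = @_;
    my $form = HTML::Form->parse($html, 'http://example.invalid/');
    return map { $_->name } $form->inputs;
}

my @empty  = input_names($empty_select);
my @filled = input_names($filled_select);

print "Empty select:  @empty\n";
print "Filled select: @filled\n";
```

If the parser behaves as assumed here, 'species' should appear only in the second list, which would match the difference between the remote page and a copy that has the options written out.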

    WWW::Mechanize does not support JavaScript. When using Firefox with NoScript, none of the select boxes have any options, so I strongly suspect that this is the problem you face. Why not try automating a browser that does understand JavaScript (via Mozilla::Mechanize, Win32::IE::Mechanize or the like) to achieve your goal?
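
If driving a real browser is not an option, another approach that may work is to add the missing field to the parsed form by hand with HTML::Form's push_input before submitting. This is only a sketch under assumptions: the HTML and the example.invalid URL are hypothetical stand-ins, and whether the real server accepts a species value sent this way is untested.

```perl
use strict;
use warnings;
use HTML::Form;

# Hypothetical stand-in for the form as the parser sees it: no 'species' input.
my $html = <<'HTML';
<form name="paramForm" method="post" action="http://example.invalid/run">
  <input type="text" name="emailAddress">
  <input type="text" name="emailSubject">
  <input type="submit" name="selectParamsButton" value="Run SCOPE">
</form>
HTML

my $form = HTML::Form->parse($html, 'http://example.invalid/');

# Add the field by hand only if the parser did not find it.
unless ($form->find_input('species')) {
    $form->push_input('text', { name => 'species', value => 'S. cerevisiae' });
}

print 'species is now: ', $form->value('species'), "\n";

# Build (but do not send) the request to inspect what would be submitted.
my $request = $form->click;
print $request->content, "\n";
```

With WWW::Mechanize, the object returned by $mech->current_form is an HTML::Form, so the same push_input call could be made on it before $mech->submit.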

    Hope this helps

    Update: Added the sentence "They seem to be using JavaScript to populate the options for the select boxes." for clarity

Re: Problem with WWW::Mechanize 'select' method
by naikonta (Curate) on Jul 07, 2007 at 12:00 UTC
    I inserted some debugging code between the $mech->get() and $mech->select() lines, as follows:
    my $f = 0;
    for my $form ($mech->forms) {
        printf "Find form #%d: %s\n", ++$f, $form->attr('name');
        my $i = 0;
        for my $input ($form->inputs) {
            printf "Input #%d: %s (type: %s), values: [%s]\n",
                ++$i, $input->name, $input->type,
                join(', ', $input->possible_values);
        }
    }
    I also added a debugging line after the $mech->select() line to print the value set for the field ("S. cerevisiae"):
    print 'Selected species: ', $mech->value('species'), "\n";

    When I ran it, I got this result:

    Find form #1: paramForm
    Input #1: paramFile (type: hidden), values: []
    Input #2: upstream (type: hidden), values: []
    Input #3: runBeam (type: hidden), values: []
    Input #4: runAmbiguizer (type: hidden), values: []
    Input #5: runBipartites (type: hidden), values: []
    Input #6: upstreamType (type: radio), values: [intergenic, fixed]
    Input #7: group (type: textarea), values: []
    Input #8: groupFile (type: file), values: []
    Input #9: emailAddress (type: text), values: []
    Input #10: emailSubject (type: text), values: []
    Input #11: selectParamsButton (type: submit), values: []
    Input "species" not found at line 31
    No such field 'species' at /usr/lib/perl5/site_perl/5.8.8/WWW/Mechaniz line 1324
    Somehow the 'species' field isn't found and an error is issued instead. There are 11 fields. I couldn't figure out why the 'species' field isn't recognized. I thought something might have gone wrong on the way to the remote server, so I downloaded the HTML code and put it on my local web server. What I got was even more surprising, and made me more curious:
    Find form #1: paramForm
    Input #1: paramFile (type: hidden), values: []
    Input #2: upstream (type: hidden), values: []
    Input #3: runBeam (type: hidden), values: []
    Input #4: runAmbiguizer (type: hidden), values: []
    Input #5: runBipartites (type: hidden), values: []
    Input #6: species (type: option), values: [init, A. fumigatus Af293, A. gossypii, A. nidulans, A. thaliana, A. tumefaciens C58, B. subtilis, C. elegans, C. tetani E88, C. trachomatis, Candida albicans, Candida glubrata, Candida lusitaniae, Candida tropicalis, Cryptococcus neoformans, D. melanogaster, D. rerio, E. coli K12-MG1655, Fusarium verticillioides, H. influenzae, H. pylori, H. sapiens, Histoplasma capsulatum, Kluyveromyces lactis, M. jannaschii, M. tuberculosis CDC1551, M. tuberculosis H37Rv, Magnaporthe grisea, N. crassa, Neurospora crassa#2, P. aeruginosa PA01, P. falciparum 3D7, R. norvegicus, Rhizopus oryzae, S. aureus MW2, S. cerevisiae, S. pombe, S. typhimurium LT2 SGSC1412, V. cholerae El Tor N16961, Y. pestis CO92, Yarrowia lipolytica]
    Input #7: upstreamType (type: radio), values: [intergenic, fixed]
    Input #8: group (type: textarea), values: []
    Input #9: groupFile (type: file), values: []
    Input #10: emailAddress (type: text), values: []
    Input #11: emailSubject (type: text), values: []
    Input #12: selectParamsButton (type: submit), values: []
    Selected species: S. cerevisiae

    First, there are 12 fields. Second, the 'species' field is there at #6 (as type 'option') along with the possible values for the select. Third, the selected option is confirmed, and it's "S. cerevisiae".

    Frankly, I haven't found any reason for the difference. I'm not even sure whether this has something to do with the internals of WWW::Mechanize, although I tend to think "no".

    Open source softwares? Share and enjoy. Make profit from them if you can. Yet, share and enjoy!

      <select name="species" onchange="changeSpecies(this)">
      changeSpecies(this) is your problem. This function changes the input; you may need to check the web page source or ask the web admin. Good luck.

