
Problem with WWW::Mechanize 'select' method

by monkfan (Curate)
on Jul 07, 2007 at 08:13 UTC (#625395=perlquestion)
monkfan has asked for the wisdom of the Perl Monks concerning the following question:

Most Revered Monks,
I was trying this simple Mech script on this website:
    use strict;
    use Data::Dumper;
    use Carp;
    use WWW::Mechanize;

    my $address = '';
    my $file    = "ROX1.fasta";
    my $email   = '';
    my $id      = "ROX1 - Scope";
    my $species = "S. cerevisiae";

    my $mech = WWW::Mechanize->new();
    $mech->get($address);
    $mech->select( 'species', $species );    # this doesn't work.
    $mech->set_fields(
        #'species' => $species,    # this also won't work
        # But the rest of these fields are recognized
        'groupFile'    => $file,
        'emailAddress' => $email,
        'emailSubject' => $id
    );
    $mech->submit;

    my $result = $mech->content();
    print "$result\n";
The error I got is this:
Input "species" not found at line 22
The problem is that Mech fails to recognize the 'species' field, although the source code of that website (see below for the full source) clearly contains it:
<select name="species" onchange="changeSpecies(this)">
Why does Mech fail to recognize that field? How can I resolve this problem?

If you want to try my script above, the dataset of "ROX1.fasta" can be found here.

The full html source code is this:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" " +R/html4/strict.dtd"> <html><head> <link rel="SHORTCUT ICON" href=" +cope-icon.png"> <link rel="stylesheet" type="text/css" href="SCOPE_files/scope.css +"> <meta http-equiv="CONTENT-TYPE" content="text/html;charset=utf-8"> +<title>SCOPE</title> <script type="text/javascript" src="SCOPE_files/popup.js"></script> <script type="text/javascript" src="SCOPE_files/species.js"></script> <script type="text/javascript" src="SCOPE_files/validate.js"></script> <script defer="defer" type="text/javascript"> <!-- Begin String.prototype.trim = function() { return this.replace(/^\s+|\s+$/g,""); } String.prototype.ltrim = function() { return this.replace(/^\s+/,""); } String.prototype.rtrim = function() { return this.replace(/\s+$/,""); } function isValidGroupText(str) { var group = str.trim(); if (group.charAt(0) != '>') return true; var lines = group.split(/\n/); var re = /^[acgtrywsmkhbvdn]+$/i; var lastwascomment = false; for (i = 0; i < lines.length; ++i) { if (lines[i] == '') { return false; } else if (!lastwascomment && lines[i].charAt(0) == '>') { lastwascomment = true; var c = lines[i].charAt(1); if (c == ' ' || c == '\t') return false; } else if (re.exec(lines[i]) == null) { return false; } else { lastwascomment = false; } } return !lastwascomment; } function validateAndSubmitForm(theForm) { if(theForm.species.options[theForm.species.selectedIndex].text == +'Select Species') { alert("Please specify a species."); theForm.species.focus(); return false; } else if( trim( == 0 && trim(theForm.gro +upFile.value).length == 0 ){ alert("Please provide a group of genes or a FASTA data.");; return false; } else if ( trim( != 0 && trim( +oupFile.value).length != 0) { alert("Please provide either textual data or input file but no +t both.");; return false; } else if (trim( != 0 && !isValidGroupTex +t({ alert("Invalid FASTA text.");; return false; } else if ((trim(theForm.emailAddress.value).length != 0) && (!isVal 
+idEmail(theForm.emailAddress.value))) { alert("Invalid email address."); theForm.emailAddress.focus(); return false; } return true; } function enableFixed(enable) { var element = document.getElementById ? document.getElementById("Fixed") : document.all.Fixed; document.paramForm.upstreamType[1].disabled = !enable; document.paramForm.upstreamLength.disabled = !enable || !document. +paramForm.upstreamType[1].checked; element.className = enable ? "" : "disabled"; } function enableIntergenic(enable) { var element = document.getElementById ? document.getElementById("Intergenic") : document.all.Intergen +ic; document.paramForm.upstreamType[0].disabled = !enable; element.className = enable ? "" : "disabled"; } function registerUpstreamChange() { var selectList = document.paramForm.species; var selected = selectList.options[selectList.selectedIndex].text; var thisSpecies = mySpecies[selected]; if(selected == 'Select Species') { document.paramForm.paramFile.value = 'none'; document.paramForm.upstream.value = 'none'; return; } if (document.paramForm.upstreamType[0].checked == true) { document.paramForm.paramFile.value = thisSpecies.intergenicPar +amFile; document.paramForm.upstream.value = 'intergenic'; document.paramForm.upstreamLength.disabled = true; } else if (document.paramForm.upstreamType[1].checked == true) { document.paramForm.paramFile.value = thisSpecies.fixedParamFil +e; document.paramForm.upstream.value = document.paramForm.upstrea +mLength.value; document.paramForm.upstreamLength.disabled = false; } } // this function is called when the species select list has a change e +vent. It updates the paramFile hidden form element // to reflect the param file of the new species and updates the upstre +amLength select element to reflect the available // lengths specified for the selected species. 
function changeSpecies(selectList) { var selected = selectList.options[selectList.selectedIndex].text; var newLengths = new Array(0); var commentElement = document.getElementById ? document.getElementById("genomeComment") : document.all.genom +eComment; if(selected == 'Select Species'){ document.paramForm.paramFile.value = 'none'; document.paramForm.upstream.value = 'none'; commentElement.innerHTML = ''; enableFixed(false); enableIntergenic(false); }else{ var thisSpecies = mySpecies[selected]; if (thisSpecies.comment == null) { commentElement.innerHTML = ''; } else { commentElement.innerHTML = "Comment: " + thisSpecies.comme +nt; } if (thisSpecies.intergenicParamFile != null) { enableIntergenic(true); document.paramForm.upstreamType[0].disabled = false; if (thisSpecies.fixedParamFile == null) { enableFixed(false); document.paramForm.upstreamType[0].checked = true; } } if (thisSpecies.fixedParamFile != null) { enableFixed(true); if (thisSpecies.intergenicParamFile == null) { enableIntergenic(false); document.paramForm.upstreamType[1].checked = true; } newLengths = thisSpecies.lengths.split(','); } } //update the gene length select list var selectOptions = document.paramForm.upstreamLength.options; // delete all the old values in the upstreamLength options list. while(selectOptions.length > 0){ selectOptions[0] = null; } // now add new values to the options list. 
for(var i=0; i<newLengths.length; i++){ selectOptions[i] = new Option(newLengths[i], newLengths[i]); } registerUpstreamChange(); } function initSpeciesSelect() { var speciesOptions = document.paramForm.species.options; speciesOptions[0] = new Option('Select Species', 'init'); speciesOptions[0].isSelected = true; var theNames = new Array(0); var i=0; for(var org in mySpecies) theNames[i++] = mySpecies[org].name; theNames.sort(); var i=1; for(var j in theNames){ //var name = mySpecies[org].name; var name = theNames[j]; speciesOptions[i++] = new Option(name, name); } changeSpecies(document.paramForm.species); } //--> </script></head><body dir="ltr" onload="initSpeciesSelect()" lang="en- +US"> <p> </p><table align="center" border="0" cellpadding="0" cellspacing=" +0" width="700"> <tbody><tr> <td rowspan="2" align="left" valign="top" width="100"><img + src="SCOPE_files/scope-big2r.jpg" alt="Logo"></td> <td colspan="4" align="center" valign="top" width="485"><a + href=""><img src="SCOPE_files/scope +-text3.png" alt="SCOPE" border="0"></a></td> <td rowspan="2" align="right" valign="top" width="100"><a +href=""><img src="SCOPE_files/dartmouth.gif" + alt="Dartmouth College" border="0"></a></td> </tr> <tr> <td class="headerlink" align="center" width="125"> <a target="_extrahelp" href="http://genie.dartmouth.ed +u/scope/about.php">About SCOPE...</a> </td> <td class="headerlink" align="center" width="125"><a target="_ +extrahelp" href="">Glossary< +/a></td> <td class="headerlink" align="center" width="125"> <a target="_extrahelp" href="http://genie.dartmouth.ed +u/scope/faq.php">FAQs</a> </td> <td class="headerlink" align="center" width="125"> <a target="_extrahelp" href="http://genie.dartmouth.ed +u/scope/publications.php">Publications</a> </td> </tr> </tbody></table> <form name="paramForm" action="startscope.php" method="post" enctype=" +multipart/form-data" onsubmit="return validateAndSubmitForm(this)"> <input name="paramFile" value="none" type="hidden"> <input 
name="upstream" value="none" type="hidden"> <p> </p><table id="scopemaintable" class="brdr" style="page-break-befo +re: always;" align="center" border="0" cellpadding="3" cellspacing="0 +" width="700"> <col width="220"> <col> <tbody><tr> <td colspan="2" class="inverse"> Welcome to SCOPE (<span class="acr">S</span>uite for <span class="acr">C</span>omputational identification <span class="acr">O</span>f <span class="acr">P</span>romoter <span class="acr">E</span>lements), an ensemble of pro +grams aimed at identifying novel <i>cis</i>-regulatory elements from +groups of upstream sequences.<input name="runBeam" value="Yes" checke +d="checked" type="hidden"><input name="runAmbiguizer" value="Yes" che +cked="checked" type="hidden"><input name="runBipartites" value="Yes" +checked="checked" type="hidden"></td> </tr> <tr> <td colspan="2" class="brdr_b">Species: <select name="species" onchange="changeSpecies(this)"> <option value="init">Select Species</option><option va +lue="A. fumigatus Af293">A. fumigatus Af293</option><option value="A. + gossypii">A. gossypii</option><option value="A. nidulans">A. nidulan +s</option><option value="A. thaliana">A. thaliana</option><option val +ue="A. tumefaciens C58">A. tumefaciens C58</option><option value="B. +subtilis">B. subtilis</option><option value="C. elegans">C. elegans</ +option><option value="C. tetani E88">C. tetani E88</option><option va +lue="C. trachomatis">C. trachomatis</option><option value="Candida al +bicans">Candida albicans</option><option value="Candida glubrata">Can +dida glubrata</option><option value="Candida lusitaniae">Candida lusi +taniae</option><option value="Candida tropicalis">Candida tropicalis< +/option><option value="Cryptococcus neoformans">Cryptococcus neoforma +ns</option><option value="D. melanogaster">D. melanogaster</option><o +ption value="D. rerio">D. rerio</option><option value="E. coli K12-MG +1655">E. 
coli K12-MG1655</option><option value="Fusarium verticillioi +des">Fusarium verticillioides</option><option value="H. influenzae">H +. influenzae</option><option value="H. pylori">H. pylori</option><opt +ion value="H. sapiens">H. sapiens</option><option value="Histoplasma +capsulatum">Histoplasma capsulatum</option><option value="Kluyveromyc +es lactis">Kluyveromyces lactis</option><option value="M. jannaschii" +>M. jannaschii</option><option value="M. tuberculosis CDC1551">M. tub +erculosis CDC1551</option><option value="M. tuberculosis H37Rv">M. tu +berculosis H37Rv</option><option value="Magnaporthe grisea">Magnaport +he grisea</option><option value="N. crassa">N. crassa</option><option + value="Neurospora crassa#2">Neurospora crassa#2</option><option valu +e="P. aeruginosa PA01">P. aeruginosa PA01</option><option value="P. f +alciparum 3D7">P. falciparum 3D7</option><option value="R. norvegicus +">R. norvegicus</option><option value="Rhizopus oryzae">Rhizopus oryz +ae</option><option value="S. aureus MW2">S. aureus MW2</option><optio +n value="S. cerevisiae">S. cerevisiae</option><option value="S. pombe +">S. pombe</option><option value="S. typhimurium LT2 SGSC1412">S. typ +himurium LT2 SGSC1412</option><option value="V. cholerae El Tor N1696 +1">V. cholerae El Tor N16961</option><option value="Y. pestis CO92">Y +. pestis CO92</option><option value="Yarrowia lipolytica">Yarrowia li +polytica</option></select> Upstream sequence: <input disabled="disabled" name="up +streamType" value="intergenic" onclick="registerUpstreamChange()" che +cked="checked" type="radio"><span class="disabled" id="Intergenic">In +tergenic</span><input disabled="disabled" name="upstreamType" value=" +fixed" onclick="registerUpstreamChange()" type="radio"><span class="d +isabled" id="Fixed">Fixed</span> <select disabled="disabled" name="upstreamLength" onch +ange="registerUpstreamChange();"> </select> &nbsp;&nbsp;<a href="http://genie.dartmouth. 
+edu/scope/aboutspecies.php" onclick="popup('aboutspecies.php', 'Blurb', 300, 350); return false">Help</a><div id="gen +omeComment"></div></td> </tr> <tr> <td class="brdr_r" valign="top" width="220"> Enter gene list or FASTA DNA sequences: <a href=" +hp" onclick="popup('aboutgenes.php', 'Blurb', 300, 350); return false">Help</a> </td> <td rowspan="3" class="brdr_b"> <textarea name="group" rows="8" cols="60"></textarea>< +br> <input name="groupFile" size="50" type="file"> </td> </tr> <tr> <td class="brdr_r" align="center" valign="center"> <p>- <i>OR</i> -</p> </td> </tr> <tr> <td class="brdr_br" valign="bottom"> Upload file with that info: </td> </tr> <tr> <td colspan="2" align="left"> <i>If you would like the results (also) returned to yo +u by email, please fill in the fields below.</i> </td> </tr> <tr> <td align="right" width="220">Email address:</td> <td> <input style="background-color: rgb(255, 255, 160);" n +ame="emailAddress" size="59" type="text"> </td> </tr> <tr> <td class="brdr_b" align="right" width="220">Email subject +:</td> <td class="brdr_b"> <input style="background-color: rgb(255, 255, 160);" n +ame="emailSubject" size="59" type="text"> </td> </tr> <tr> <td colspan="2" align="center"> <input name="selectParamsButton" value="Run SCOPE" typ +e="submit"> </td> </tr> </tbody></table> </form> <p class="footer" align="center">Copyright 2004-2007, all rights res +erved, Trustees of Dartmouth College<br> Developed in the lab of Prof. Robert H. Gross under a grant fr +om the National Science Foundation (USA)</p> <p class="footer" align="center">Comments or suggestions? <script src="SCOPE_files/urchin.js" type="text/javascript"> </script> <script type="text/javascript"> _uacct = "UA-843040-2"; urchinTracker(); </script><a class="footer" href="">Ema +il</a> us.</p> </body></html>


Replies are listed 'Best First'.
Re: Problem with WWW::Mechanize 'select' method
by marto (Archbishop) on Jul 07, 2007 at 14:06 UTC
    When I experience such weirdness I turn to Corion's fantastic WWW::Mechanize::Shell:
        >perl -MWWW::Mechanize::Shell -eshell
        (no url)>get
        Retrieving
        >dump
        POST (multipart/form-data) [paramForm]
          paramFile=none (hidden readonly)
          upstream=none (hidden readonly)
          runBeam=Yes (hidden readonly)
          runAmbiguizer=Yes (hidden readonly)
          runBipartites=Yes (hidden readonly)
          upstreamType=intergenic (radio) [*intergenic/Intergenic|fixed/Fixed]
          group= (textarea)
          groupFile= (file)
          emailAddress= (text)
          emailSubject= (text)
          selectParamsButton=Run SCOPE (submit)
    Gadzooks, no 'species' input in sight, nor any other select option! Comparing the HTML source with the values we can see via WWW::Mechanize::Shell, I would think that since there are no option tags, we can't use them as valid input. They seem to be using JavaScript to populate the options for the select boxes. <joke>Insert anti JavaScript comment here :P </joke>.

    WWW::Mechanize does not support JavaScript. When using Firefox with NoScript, none of the select boxes have any options, so I strongly suspect that this is the problem you face. Why not try automating a browser that does understand JavaScript (via Mozilla::Mechanize, Win32::IE::Mechanize or the like) to achieve your goal?

    Hope this helps

    Update: Added the sentence "They seem to be using JavaScript to populate the options for the select boxes." for clarity
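
The diagnosis above can be turned into a guard that checks whether a field exists before trying to set it. What follows is a minimal sketch, assuming HTML::Form (the module WWW::Mechanize relies on for form handling) is installed. The two HTML snippets and the helper name check_field are hypothetical stand-ins for illustration, not the real SCOPE page.

```perl
use strict;
use warnings;
use HTML::Form;

# Hypothetical, trimmed-down stand-ins for two variants of a page:
# one whose markup already contains the options, and one where the
# species control is absent from the markup altogether.
my $with_options = <<'HTML';
<form name="paramForm" action="startscope.php" method="post">
  <select name="species">
    <option value="init">Select Species</option>
    <option value="S. cerevisiae">S. cerevisiae</option>
  </select>
  <input name="emailAddress" type="text">
</form>
HTML

my $without_select = <<'HTML';
<form name="paramForm" action="startscope.php" method="post">
  <input name="emailAddress" type="text">
</form>
HTML

# Return a short report on whether $field exists in the first form of $html.
sub check_field {
    my ($html, $field) = @_;
    my ($form) = HTML::Form->parse($html, 'http://localhost/');
    my $input = $form->find_input($field);
    return "no '$field' input" unless defined $input;
    return sprintf "'%s' found (type %s), values: [%s]",
        $field, $input->type, join(', ', $input->possible_values);
}

print check_field($with_options,   'species'), "\n";
print check_field($without_select, 'species'), "\n";
```

With a live Mech object, the equivalent check would be along the lines of testing `$mech->current_form->find_input('species')` before calling `$mech->select`, so that a missing field is reported by your own code rather than by a croak inside the module.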

Re: Problem with WWW::Mechanize 'select' method
by naikonta (Curate) on Jul 07, 2007 at 12:00 UTC
    I inserted some debugging code between the $mech->get() and $mech->select() lines, as follows:
        my $f = 0;
        for my $form ($mech->forms) {
            printf "Find form #%d: %s\n", ++$f, $form->attr('name');
            my $i = 0;
            for my $input ($form->inputs) {
                printf "Input #%d: %s (type: %s), values: [%s]\n",
                    ++$i, $input->name, $input->type,
                    join(', ', $input->possible_values);
            }
        }
    I also added a debugging line after the $mech->select() line to print the value set for the field ("S. cerevisiae").
    print 'Selected species: ', $mech->value('species'), "\n";

    When I ran it, I got this result:

        Find form #1: paramForm
        Input #1: paramFile (type: hidden), values: []
        Input #2: upstream (type: hidden), values: []
        Input #3: runBeam (type: hidden), values: []
        Input #4: runAmbiguizer (type: hidden), values: []
        Input #5: runBipartites (type: hidden), values: []
        Input #6: upstreamType (type: radio), values: [intergenic, fixed]
        Input #7: group (type: textarea), values: []
        Input #8: groupFile (type: file), values: []
        Input #9: emailAddress (type: text), values: []
        Input #10: emailSubject (type: text), values: []
        Input #11: selectParamsButton (type: submit), values: []
        Input "species" not found at line 31
        No such field 'species' at /usr/lib/perl5/site_perl/5.8.8/WWW/Mechaniz line 1324
    Somehow the 'species' field isn't found and an error is issued instead. There are 11 fields. I couldn't figure out why the 'species' field isn't recognized, and I suspected that something happened on the way to or from the remote server. So I downloaded the HTML code and put it on my local web server. What I got was even more surprising, and made me more curious:
        Find form #1: paramForm
        Input #1: paramFile (type: hidden), values: []
        Input #2: upstream (type: hidden), values: []
        Input #3: runBeam (type: hidden), values: []
        Input #4: runAmbiguizer (type: hidden), values: []
        Input #5: runBipartites (type: hidden), values: []
        Input #6: species (type: option), values: [init, A. fumigatus Af293, A. gossypii, A. nidulans, A. thaliana, A. tumefaciens C58, B. subtilis, C. elegans, C. tetani E88, C. trachomatis, Candida albicans, Candida glubrata, Candida lusitaniae, Candida tropicalis, Cryptococcus neoformans, D. melanogaster, D. rerio, E. coli K12-MG1655, Fusarium verticillioides, H. influenzae, H. pylori, H. sapiens, Histoplasma capsulatum, Kluyveromyces lactis, M. jannaschii, M. tuberculosis CDC1551, M. tuberculosis H37Rv, Magnaporthe grisea, N. crassa, Neurospora crassa#2, P. aeruginosa PA01, P. falciparum 3D7, R. norvegicus, Rhizopus oryzae, S. aureus MW2, S. cerevisiae, S. pombe, S. typhimurium LT2 SGSC1412, V. cholerae El Tor N16961, Y. pestis CO92, Yarrowia lipolytica]
        Input #7: upstreamType (type: radio), values: [intergenic, fixed]
        Input #8: group (type: textarea), values: []
        Input #9: groupFile (type: file), values: []
        Input #10: emailAddress (type: text), values: []
        Input #11: emailSubject (type: text), values: []
        Input #12: selectParamsButton (type: submit), values: []
        Selected species: S. cerevisiae

    First, there are 12 fields. Second, the 'species' field is there at #6 (as type 'option') along with the possible values for the select. Third, the selected option is confirmed, and it's "S. cerevisiae".

    Frankly, I haven't found anything that explains the difference. I'm not even sure whether this has something to do with the internals of WWW::Mechanize, although I tend to think not.
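
One possible explanation, offered as an assumption rather than something confirmed in this thread: the HTML posted in the question looks like a copy saved from a browser (note the SCOPE_files/ paths) after initSpeciesSelect() had already filled in the options, whereas the raw response from the server may contain an empty select. A rough way to check is to count the option tags that appear literally in the raw HTML. The sketch below uses only core Perl; the helper name count_select_options and the two sample strings are hypothetical.

```perl
use strict;
use warnings;

# Count the <option> tags that appear literally inside
# <select name="$name"> ... </select>. This is a rough regex-based check
# for eyeballing raw HTML, not a real HTML parser.
sub count_select_options {
    my ($html, $name) = @_;
    my ($body) = $html =~
        m{<select\b[^>]*\bname="\Q$name\E"[^>]*>(.*?)</select>}is;
    return unless defined $body;    # no such select in the markup
    my $count = () = $body =~ /<option\b/gi;
    return $count;
}

# Hypothetical samples: markup without options, and markup as a browser
# might save it after JavaScript has populated the list.
my $raw   = '<select name="species" onchange="changeSpecies(this)"> </select>';
my $saved = '<select name="species" onchange="changeSpecies(this)">'
          . '<option value="init">Select Species</option>'
          . '<option value="S. cerevisiae">S. cerevisiae</option></select>';

for my $sample ($raw, $saved) {
    my $n = count_select_options($sample, 'species');
    print defined $n ? "species select has $n option(s)\n"
                     : "no species select found\n";
}
```

With a live Mech object, the string to pass in would be `$mech->content` right after `$mech->get(...)`. If the count for the remote page is zero while the locally served copy has dozens, that would be consistent with the options being added by JavaScript.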

    Open source softwares? Share and enjoy. Make profit from them if you can. Yet, share and enjoy!

      <select name="species" onchange="changeSpecies(this)">
      changeSpecies(this) is your problem: this function changes the input. You may need to check the web page source or ask the web admin. Good luck.
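
If the server only looks at the POSTed name/value pairs, one possible workaround is to add the missing field to the parsed form by hand. This is an assumption, not something confirmed in this thread, and it may not be enough on its own: the page's JavaScript also sets the hidden paramFile and upstream fields, whose expected values are not shown here. The sketch below assumes HTML::Form is installed and uses a hypothetical, trimmed-down form in place of the real page.

```perl
use strict;
use warnings;
use HTML::Form;

# Hypothetical stand-in for the form as seen without JavaScript:
# the species control is missing.
my $html = <<'HTML';
<form name="paramForm" action="startscope.php" method="post">
  <input name="emailAddress" type="text">
  <input name="selectParamsButton" value="Run SCOPE" type="submit">
</form>
HTML

my ($form) = HTML::Form->parse($html, 'http://localhost/');

# Add the field only if the parser did not find it.
unless (defined $form->find_input('species')) {
    $form->push_input('text', { name => 'species', value => 'S. cerevisiae' });
}

# List the name/value pairs that the form currently holds.
my @pairs = $form->form;
while (my ($name, $value) = splice(@pairs, 0, 2)) {
    printf "%s=%s\n", $name, defined $value ? $value : '';
}
```

With WWW::Mechanize, the form object to modify would be the one returned by `$mech->current_form`. Automating a JavaScript-capable browser, as suggested earlier in the thread, remains the more reliable route.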

Node Type: perlquestion [id://625395]
Approved by Corion
Front-paged by neversaint