PerlMonks  

stuck with WWW::Mechanize drop down list

by abualiga (Scribe)
on Jun 01, 2012 at 19:23 UTC (#973851)
abualiga has asked for the wisdom of the Perl Monks concerning the following question:

Hi Monks,

I'm a Perl beginner and am learning to use the WWW::Mechanize module to get a list of genome sequencing projects from http://www.ncbi.nlm.nih.gov/Traces/wgs/. After I fill in the form, I cannot figure out how to select 'All' from the 'Show projects on page' listbox.

I tried 'find_all_inputs' with 'select' as criteria for a listbox, but nothing gets assigned to the variable. My other issue is not being able to click on the 'Download as TAB delimited list' link to output results to file.

I can complete this task using the browser, but then I don't learn. Below are the few lines of code I have so far. Your advice will be a great help.

many thanks!

#!/usr/local/bin/perl
use strict;
use warnings;
use autodie qw/ open close /;
use 5.012;
use WWW::Mechanize;

# create WWW::Mechanize object
# autocheck 1 checks each request to ensure it was successful
my $browser = WWW::Mechanize->new( autocheck => [1] );

# retrieve page
$browser->get( 'http://www.ncbi.nlm.nih.gov/Traces/wgs/' );

#select form to fill based on mech-dump output
$browser->form_number(1);

# fill field 'term' with name of species
$browser->field( 'term', 'Escherichia' );

# click apply button
$browser->submit( 'Apply' );
my $url = $browser->uri;

# launch browser to test url
#system( 'firefox', $url );

my @inputs = $browser->find_all_inputs( type => 'select' );
say @inputs;

Re: stuck with WWW::Mechanize drop down list
by spazm (Monk) on Jun 02, 2012 at 00:29 UTC
    The dropdown selector uses javascript to reload the page. It's dorky:
    <select id="_size" name="size" onchange="var s=sURL + '&size=' + this.value; document.location.href=s"><option value="all">All</option><option value="50">50</option><option value="100">100</option></select>
    
    We can simulate this by adding an "&size=all" to the url. We'll do this by setting an extra field entry:
    $browser->field( 'size', 'all' );
    Example:
    #!/usr/bin/env perl
    use strict;
    use warnings;
    use autodie qw/ open close /;
    use 5.012;
    use WWW::Mechanize;

    # create WWW::Mechanize object
    # autocheck 1 checks each request to ensure it was successful
    my $browser = WWW::Mechanize->new( autocheck => 1 );

    # retrieve page
    $browser->get('http://www.ncbi.nlm.nih.gov/Traces/wgs/');

    #select form to fill based on mech-dump output
    $browser->form_number(1);

    # fill field 'term' with name of species
    $browser->field( 'term', 'Escherichia' );
    $browser->field( 'size', 'all' );

    # click apply button
    $browser->submit('Apply');
    my $url = $browser->uri;
    print "url: $url\n";

    # launch browser to test url
    #system( 'firefox', $url );

    print $browser->content();
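
    For comparison, the onchange handler quoted above does nothing more than string concatenation: it takes the current URL, appends '&size=' plus the chosen value, and navigates there. A minimal sketch of the same step in plain Perl follows. The helper name url_with_size and the example.org URL are made up for illustration and are not part of the NCBI page or of WWW::Mechanize.

```perl
use strict;
use warnings;

# Hypothetical helper: mimic the page's onchange handler, which builds
# sURL + '&size=' + this.value and then loads that URL.
sub url_with_size {
    my ( $base_url, $size ) = @_;

    # The handler always appends with '&', so it assumes the URL already
    # has a query string. This sketch falls back to '?' if it does not.
    my $sep = $base_url =~ /\?/ ? '&' : '?';
    return $base_url . $sep . 'size=' . $size;
}

# Placeholder URL, not taken from the live site.
print url_with_size( 'http://example.org/wgs/?term=Escherichia', 'all' ), "\n";
```

    With a Mechanize object in hand, something like $browser->get( url_with_size( $browser->uri, 'all' ) ) after the first submit should have the same effect as setting the 'size' field before submitting.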
      Now that you have the full list, you'd like to follow the "Download as TAB delimited list" link. In your browser, following the link saves a file. In Mechanize, the response is just more content.

      If you want to be clever, you can get the suggested filename from LWP's HTTP::Response object and use it as the name of the file to write.

      # Option 1: print the TAB delimited file to STDOUT
      $browser->follow_link( text_regex => qr/Download as TAB/i );
      print $browser->content();

      # Option 2: save it under the filename suggested by the response
      $browser->follow_link( text_regex => qr/Download as TAB/i );
      if ( my $filename = $browser->res->filename ) {
          die "file already exists [$filename]" if -e $filename;
          print STDERR "Saving downloaded file to [$filename]\n";
          open my $fh, ">", $filename;
          print $fh $browser->content;
          close $fh;
      }
      #!/usr/bin/env perl
      use strict;
      use warnings;
      use autodie qw/ open close /;
      use 5.012;
      use WWW::Mechanize;

      # create WWW::Mechanize object
      # autocheck 1 checks each request to ensure it was successful
      my $browser = WWW::Mechanize->new( autocheck => 1 );

      # retrieve page
      $browser->get('http://www.ncbi.nlm.nih.gov/Traces/wgs/');

      #select form to fill based on mech-dump output
      $browser->form_number(1);

      # fill field 'term' with name of species
      $browser->field( 'term', 'Escherichia' );
      $browser->field( 'size', 'all' );

      # click apply button
      $browser->submit('Apply');
      my $url = $browser->uri;
      print "url: $url\n";

      $browser->follow_link( text_regex => qr/Download as TAB/i );
      #print $browser->content(); # prints TAB delimited file to STDOUT

      # save the download under the filename suggested by the response
      if ( my $filename = $browser->res->filename ) {
          die "file already exists [$filename]" if -e $filename;
          print STDERR "Saving downloaded file to [$filename]\n";
          open my $fh, ">", $filename;
          print $fh $browser->content;
          close $fh;
      }
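
      To see where that filename comes from without touching the network, here is a small offline sketch. It builds an HTTP::Response by hand with a Content-Disposition header and asks it for the suggested filename. The header value and the name wgs_projects.txt are invented for the example; what the NCBI server actually sends may differ.

```perl
use strict;
use warnings;
use HTTP::Response;

# Build a response by hand. The header value and the file name are
# made up for illustration, not captured from the real server.
my $res = HTTP::Response->new(
    200, 'OK',
    [ 'Content-Disposition' => 'attachment; filename="wgs_projects.txt"' ],
    "col1\tcol2\n",
);

# filename() looks at Content-Disposition first and strips any
# directory part from the name it finds.
my $filename = $res->filename;
print "suggested filename: $filename\n";
```

      When a response carries no usable header or URL, filename() can come back undefined, which is why the script above wraps the save step in an if.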

        Spazm, thanks much, especially for the explanations!

        Question. If mech-dump doesn't output content of drop down lists, do I always need to look at the page source and, if so, then add the selection as a 'field' entry?
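
        One way to check what a drop-down offers without reading the page source by hand is to ask the parsed form itself. The sketch below uses HTML::Form, the class WWW::Mechanize relies on for forms, on a cut-down copy of the markup. The select is taken from the snippet quoted earlier in the thread; the surrounding form element and the example.org URLs are assumptions made so the example runs offline.

```perl
use strict;
use warnings;
use HTML::Form;

# A cut-down form for illustration. The <select> mirrors the snippet
# quoted in the thread; the enclosing <form> is assumed.
my $html = <<'HTML';
<form action="http://example.org/wgs/" method="get">
  <input type="text" name="term">
  <select id="_size" name="size">
    <option value="all">All</option>
    <option value="50">50</option>
    <option value="100">100</option>
  </select>
</form>
HTML

my ($form) = HTML::Form->parse( $html, 'http://example.org/' );

# HTML::Form reports <select> elements with the type 'option',
# not 'select', so that is the type to search for.
for my $input ( $form->find_input( undef, 'option' ) ) {
    printf "%s: %s\n", $input->name, join ', ', $input->possible_values;
}
```

        If this holds for the live page too, it would also explain the empty result in the original script: searching with find_all_inputs( type => 'option' ) rather than type => 'select' is the variant worth trying.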
