stuck with WWW::Mechanize drop down list

by abualiga (Scribe)
on Jun 01, 2012 at 19:23 UTC

abualiga has asked for the wisdom of the Perl Monks concerning the following question:

Hi Monks,

I'm a Perl beginner and am learning to use the WWW::Mechanize module to get a list of genome sequencing projects from http://www.ncbi.nlm.nih.gov/Traces/wgs/. After I fill in the form, I cannot figure out how to select 'All' from the 'Show projects on page' listbox.

I tried 'find_all_inputs' with 'select' as criteria for a listbox, but nothing gets assigned to the variable. My other issue is not being able to click on the 'Download as TAB delimited list' link to output results to file.

I can complete this task using the browser, but then I don't learn. Below are the few lines of code I have so far. Your advice will be a great help.

many thanks!

    #!/usr/local/bin/perl
    use strict;
    use warnings;
    use autodie qw/ open close /;
    use 5.012;
    use WWW::Mechanize;

    # create WWW::Mechanize object
    # autocheck 1 checks each request to ensure it was successful
    my $browser = WWW::Mechanize->new( autocheck => 1 );

    # retrieve page
    $browser->get( 'http://www.ncbi.nlm.nih.gov/Traces/wgs/' );

    # select form to fill based on mech-dump output
    $browser->form_number(1);

    # fill field 'term' with name of species
    $browser->field( 'term', 'Escherichia' );

    # submit the form (submit() takes no button argument)
    $browser->submit();

    my $url = $browser->uri;

    # launch browser to test url
    #system( 'firefox', $url );

    my @inputs = $browser->find_all_inputs( type => 'select' );
    say @inputs;

Re: stuck with WWW::Mechanize drop down list
by spazm (Monk) on Jun 02, 2012 at 00:29 UTC
    The dropdown selector uses JavaScript to reload the page. It's dorky:
    <select id="_size" name="size" onchange="var s=sURL + '&size=' + this.value; document.location.href=s"><option value="all">All</option><option value="50">50</option><option value="100">100</option></select>
    
    We can simulate this by adding "&size=all" to the URL. We'll do this by setting an extra form field:
    $browser->field( 'size', 'all' );
    Example:
        #!/usr/bin/env perl
        use strict;
        use warnings;
        use autodie qw/ open close /;
        use 5.012;
        use WWW::Mechanize;

        # create WWW::Mechanize object
        # autocheck 1 checks each request to ensure it was successful
        my $browser = WWW::Mechanize->new( autocheck => 1 );

        # retrieve page
        $browser->get('http://www.ncbi.nlm.nih.gov/Traces/wgs/');

        # select form to fill based on mech-dump output
        $browser->form_number(1);

        # fill field 'term' with name of species
        $browser->field( 'term', 'Escherichia' );

        # ask for all projects on one page (what the dropdown's JavaScript does)
        $browser->field( 'size', 'all' );

        # submit the form (submit() takes no button argument)
        $browser->submit();

        my $url = $browser->uri;
        print "url: $url\n";

        # launch browser to test url
        #system( 'firefox', $url );

        print $browser->content();
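The dropdown's onchange handler only appends a size parameter to the current URL, so the same effect can be sketched without touching the form at all. This is a minimal offline illustration using the URI module; `$result_url` is a hypothetical stand-in for whatever `$browser->uri` returns after the form is submitted, not the site's actual result URL.

```perl
use strict;
use warnings;
use URI;

# Hypothetical result URL; in the real script this would come from $browser->uri.
my $result_url = 'http://www.ncbi.nlm.nih.gov/Traces/wgs/?term=Escherichia';

# Mimic the onchange handler: keep the existing query and add size=all.
my $uri = URI->new($result_url);
$uri->query_form( $uri->query_form, size => 'all' );

print "$uri\n";
```

With a live Mechanize object, the resulting URL could then be fetched with `$browser->get($uri)`.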
      Now that you have the full list, you'd like to follow the "Download as TAB delimited list" link. In your browser, following the link leads to a saved file. With Mechanize, the file simply arrives as the content of the response.

      If you want to be clever, you can get the filename from LWP's HTTP::Response object and use it as the name of the file to write.

          $browser->follow_link( text_regex => qr/Download as TAB/i );
          print $browser->content();    # prints TAB delimited file to STDOUT

          $browser->follow_link( text_regex => qr/Download as TAB/i );
          if ( my $filename = $browser->res->filename ) {
              die "file already exists [$filename]" if -e $filename;
              print STDERR "Saving downloaded file to [$filename]\n";
              open my $fh, ">", $filename;
              print $fh $browser->content;
              close $fh;
          }
          #!/usr/bin/env perl
          use strict;
          use warnings;
          use autodie qw/ open close /;
          use 5.012;
          use WWW::Mechanize;

          # create WWW::Mechanize object
          # autocheck 1 checks each request to ensure it was successful
          my $browser = WWW::Mechanize->new( autocheck => 1 );

          # retrieve page
          $browser->get('http://www.ncbi.nlm.nih.gov/Traces/wgs/');

          # select form to fill based on mech-dump output
          $browser->form_number(1);

          # fill field 'term' with name of species
          $browser->field( 'term', 'Escherichia' );
          $browser->field( 'size', 'all' );

          # submit the form (submit() takes no button argument)
          $browser->submit();

          my $url = $browser->uri;
          print "url: $url\n";

          $browser->follow_link( text_regex => qr/Download as TAB/i );
          #print $browser->content();    # prints TAB delimited file to STDOUT

          # save the download under the filename suggested by the response
          if ( my $filename = $browser->res->filename ) {
              die "file already exists [$filename]" if -e $filename;
              print STDERR "Saving downloaded file to [$filename]\n";
              open my $fh, ">", $filename;
              print $fh $browser->content;
              close $fh;
          }
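To see where that filename comes from without hitting the network, here is a minimal offline sketch. It builds an HTTP::Response by hand as a stand-in for `$browser->res`; the Content-Disposition header, the filename `wgs_list.tsv`, and the two-line body are all made up for illustration, and the file is written to a temporary directory rather than the current one.

```perl
use strict;
use warnings;
use HTTP::Response;
use File::Temp qw(tempdir);
use File::Spec;

# Hypothetical response standing in for $browser->res after follow_link().
my $res = HTTP::Response->new(
    200, 'OK',
    [
        'Content-Type'        => 'text/plain',
        'Content-Disposition' => 'attachment; filename="wgs_list.tsv"',
    ],
    "col1\tcol2\nval1\tval2\n",    # placeholder body, not real NCBI data
);

# filename() prefers the Content-Disposition header when one is present.
my $filename = $res->filename;

# Write into a throwaway directory so nothing in the working directory is touched.
my $dir  = tempdir( CLEANUP => 1 );
my $path = File::Spec->catfile( $dir, $filename );

open my $fh, '>', $path or die "Cannot open $path: $!";
print {$fh} $res->content;
close $fh or die "Cannot close $path: $!";

print "Saved ", -s $path, " bytes as $filename\n";
```

If the server sends no Content-Disposition header, `filename` may fall back to the last path segment of the URL or return nothing, which is why the scripts above wrap the save in an `if`.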

        Spazm, thanks much, especially for the explanations!

        Question: if mech-dump doesn't output the contents of drop-down lists, do I always need to look at the page source and then add the selection as a 'field' entry?
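One way to approach that question without reading the page source is to ask the parsed form itself which choices a list offers. The sketch below is offline and makes some assumptions: the `<form>` wrapper and its action are invented around the `<select>` quoted earlier in the thread (with the onchange attribute dropped), and it uses HTML::Form directly, the module WWW::Mechanize relies on for form handling. HTML::Form reports `<select>` elements with the input type 'option' rather than 'select', which may be why a search for type 'select' comes back empty.

```perl
use strict;
use warnings;
use HTML::Form;

# Hypothetical form wrapped around the <select> quoted in the reply above.
my $html = <<'HTML';
<form action="/Traces/wgs/" method="get">
  <input type="text" name="term">
  <select id="_size" name="size">
    <option value="all">All</option>
    <option value="50">50</option>
    <option value="100">100</option>
  </select>
</form>
HTML

my $form = HTML::Form->parse( $html, 'http://www.ncbi.nlm.nih.gov/' );

# <select> elements show up as inputs of type 'option'.
my @selects = $form->find_input( undef, 'option' );

# Collect the values each list will accept.
my %choices;
for my $input (@selects) {
    $choices{ $input->name } = [ $input->possible_values ];
    printf "%s: %s\n", $input->name, join ', ', $input->possible_values;
}

# Fill in the form and look at the request it would send.
$form->value( term => 'Escherichia' );
$form->value( size => 'all' );
my $request_uri = $form->make_request->uri;
print "$request_uri\n";
```

With a live Mechanize object, the equivalent lookup would presumably be `$browser->find_all_inputs( type => 'option' )`, followed by `possible_values` on each result. That only helps when the list is part of the selected form; otherwise the page source is still the place to look.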
