PerlMonks  

Re^2: stuck with WWW::Mechanize drop down list

by spazm (Monk)
on Jun 02, 2012 at 01:02 UTC (#973903)


in reply to Re: stuck with WWW::Mechanize drop down list
in thread stuck with WWW::Mechanize drop down list

Now that you have the full list, you want to follow the "Download as TAB delimited list" link. In a browser, following that link saves a file. In Mechanize, the response is just more content, which you read with $browser->content.

If you want to be clever, you can ask LWP's HTTP::Response object for the suggested filename ($browser->res->filename) and save the content under that name.

$browser->follow_link( text_regex => qr/Download as TAB/i );
print $browser->content();    # prints TAB delimited file to STDOUT

$browser->follow_link( text_regex => qr/Download as TAB/i );
if ( my $filename = $browser->res->filename ) {
    die "file already exists [$filename]" if -e $filename;
    print STDERR "Saving downloaded file to [$filename]\n";
    # no autodie in this snippet, so check open and close explicitly
    open my $fh, ">", $filename or die "cannot open [$filename]: $!";
    print $fh $browser->content;
    close $fh or die "cannot close [$filename]: $!";
}

#!/usr/bin/env perl
use strict;
use warnings;
use autodie qw/ open close /;
use 5.012;
use WWW::Mechanize;

# create WWW::Mechanize object
# autocheck => 1 makes each request die if it was not successful
my $browser = WWW::Mechanize->new( autocheck => 1 );

# retrieve page
$browser->get('http://www.ncbi.nlm.nih.gov/Traces/wgs/');

# select form to fill based on mech-dump output
$browser->form_number(1);

# fill field 'term' with name of species
$browser->field( 'term', 'Escherichia' );
$browser->field( 'size', 'all' );

# submit the form (submit() takes no arguments and clicks no particular button)
$browser->submit();

my $url = $browser->uri;
print "url: $url\n";

$browser->follow_link( text_regex => qr/Download as TAB/i );
#print $browser->content();    # prints TAB delimited file to STDOUT
if ( my $filename = $browser->res->filename ) {
    die "file already exists [$filename]" if -e $filename;
    print STDERR "Saving downloaded file to [$filename]\n";
    open my $fh, ">", $filename;    # autodie handles failures here
    print $fh $browser->content;
    close $fh;
}


Re^3: stuck with WWW::Mechanize drop down list
by abualiga (Scribe) on Jun 02, 2012 at 03:58 UTC

    Spazm, thanks much, especially for the explanations!

    Question: if mech-dump doesn't show the contents of drop-down lists, do I always need to look at the page source and then add the selection as a 'field' entry?

      I was just about to suggest mech-dump; good that you are already using it!

      Mechanize will only return form elements that are within <form></form> elements.
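
      For a select that does sit inside a form, you can list its options from code instead of reading the page source. This is only a sketch: it parses an inline, made-up HTML snippet with HTML::Form (the class Mechanize uses for its form objects), and the helper name select_options and the field names are assumptions for illustration, not taken from the NCBI page.

```perl
use strict;
use warnings;
use HTML::Form;

# Hypothetical helper: return the option values of the named select
# in the first form found in $html.
sub select_options {
    my ( $html, $name ) = @_;
    my ($form) = HTML::Form->parse( $html, 'http://example.com/' );
    return unless $form;
    my $input = $form->find_input($name) or return;
    return $input->possible_values;
}

# Made-up page fragment, for illustration only.
my $html = <<'HTML';
<form action="/search" method="get">
  <input type="text" name="term">
  <select name="size">
    <option value="50">50</option>
    <option value="all">All</option>
  </select>
</form>
HTML

print "size accepts: ", join( ", ", select_options( $html, 'size' ) ), "\n";
```

      With a live Mechanize object the same calls should work on $browser->current_form, since that is also an HTML::Form object.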

      The "All" dropdown is not inside a set of form tags; it directly triggers JavaScript to reload the page. In cases like this you have to figure out what the script is doing and duplicate it, possibly just by inspecting the request URL submitted by the browser.
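
      One way to duplicate such a request is to rebuild the URL yourself and fetch it directly. The sketch below only builds the URL with the URI module; the helper name build_reload_url and the parameter names term and size are assumptions, so copy the real ones from the request your browser sends.

```perl
use strict;
use warnings;
use URI;

# Hypothetical helper: build the URL that the page's JavaScript would request.
# Keys are sorted so the resulting query string is stable.
sub build_reload_url {
    my ( $base, %params ) = @_;
    my $uri = URI->new($base);
    $uri->query_form( map { $_ => $params{$_} } sort keys %params );
    return $uri->as_string;
}

my $url = build_reload_url(
    'http://www.ncbi.nlm.nih.gov/Traces/wgs/',
    term => 'Escherichia',    # assumed parameter name
    size => 'all',            # assumed parameter name
);
print "$url\n";
```

      You would then pass the result to $browser->get($url) in place of filling in and submitting the form.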

      This is an area where scraping pages becomes tedious and tricky.

        Thanks again!
