Working on the assumption that each reference will turn up only one PDF (which I'm not entirely convinced of), the following code should give you a starting point.
#!/usr/bin/env perl
use strict;
use warnings;
use LWP::UserAgent;
use HTTP::Request;
use URI::Escape;
use File::Basename;
our $VERSION = '0.001';
my $agent_name = join '/' => basename($0), $VERSION;
my $query_base = 'https://duckduckgo.com/html/?q=';
my $pdf_re = qr{href="([^"]+\.pdf)"};
my $ua = LWP::UserAgent->new(agent => $agent_name);
while (<DATA>) {
    chomp;
    my $req = HTTP::Request->new(GET => $query_base . uri_escape($_));
    $req->content_type('text/html');
    my $res = $ua->request($req);
    if ($res->is_success) {
        print "Search successful.\n";
        if ($res->content =~ $pdf_re) {
            my $pdf_url = $1;
            print "PDF found: $pdf_url\n";
            process_pdf_url($pdf_url);
        }
        else {
            print "PDF not found!\n";
        }
    }
    else {
        print $res->status_line, "\n";
    }
}
sub process_pdf_url {
    my $pdf_url = shift;
    print "Stub - download $pdf_url,\n\trename, upload to database, etc.\n";
    return;
}
__DATA__
1. Abilez O, Benharash P, Mehrotra M, Miyamoto E, Gale A, Picquet J, Xu C, Zarins C (2006) A novel culture system shows that stem cells can be grown in 3D and under physiologic pulsatile conditions for tissue engineering of vascular grafts. J Surg Res 132:170-178.
Output:
$ pm_web_search_pdf.pl
Search successful.
PDF found: http://med.stanford.edu/arts/arts_students/CVs/CV_abilez_092007.pdf
Stub - download http://med.stanford.edu/arts/arts_students/CVs/CV_abilez_092007.pdf,
	rename, upload to database, etc.
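If the one-PDF assumption turns out not to hold, the same regex can be applied in list context with the /g modifier to collect every match instead of just the first. The following is only a rough sketch: find_pdf_urls is a hypothetical helper that is not part of the script above, and the HTML is made-up sample data standing in for $res->content.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Same pattern as in the script above.
my $pdf_re = qr{href="([^"]+\.pdf)"};

# Hypothetical helper (not in the original script): return every
# distinct PDF link found in a page, in order of first appearance.
sub find_pdf_urls {
    my ($html) = @_;
    my %seen;
    # In list context, a /g match returns all captures of group 1.
    return grep { !$seen{$_}++ } $html =~ /$pdf_re/g;
}

# Made-up sample HTML, standing in for $res->content.
my $html = <<'HTML';
<a href="http://example.com/a.pdf">A</a>
<a href="http://example.com/page.html">Page</a>
<a href="http://example.com/b.pdf">B</a>
<a href="http://example.com/a.pdf">A again</a>
HTML

my @pdf_urls = find_pdf_urls($html);
print "PDF found: $_\n" for @pdf_urls;
```

In the main loop, the single-match test could then be swapped for a loop that calls process_pdf_url on each returned URL. How to decide which of several PDFs is the right one for a given reference is a separate question that this sketch does not attempt to answer.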