in reply to Download references list in pdf format with script
Working on the assumption that the search for each reference will turn up only one PDF (which I'm not entirely convinced of), the following code should give you a starting point.
    #!/usr/bin/env perl

    use strict;
    use warnings;

    use LWP::UserAgent;
    use URI::Escape;
    use File::Basename;

    our $VERSION = '0.001';

    my $agent_name = join '/' => basename($0), $VERSION;
    my $query_base = 'https://duckduckgo.com/html/?q=';
    my $pdf_re = qr{href="([^"]+\.pdf)"};

    my $ua = LWP::UserAgent->new(agent => $agent_name);

    while (<DATA>) {
        chomp;
        my $req = HTTP::Request->new(GET => $query_base . uri_escape($_));
        $req->content_type('text/html');
        my $res = $ua->request($req);

        if ($res->is_success) {
            print "Search successful.\n";
            if ($res->content =~ $pdf_re) {
                my $pdf_url = $1;
                print "PDF found: $pdf_url\n";
                process_pdf_url($pdf_url);
            }
            else {
                print "PDF not found!\n";
            }
        }
        else {
            print $res->status_line, "\n";
        }
    }

    sub process_pdf_url {
        my $pdf_url = shift;
        print "Stub - download $pdf_url,\n\trename, upload to database, etc.\n";
        return;
    }

    __DATA__
    1. Abilez O, Benharash P, Mehrotra M, Miyamoto E, Gale A, Picquet J, Xu C, Zarins C (2006) A novel culture system shows that stem cells can be grown in 3D and under physiologic pulsatile conditions for tissue engineering of vascular grafts. J Surg Res 132:170-178.
Output:
    $ pm_web_search_pdf.pl
    Search successful.
    PDF found: http://med.stanford.edu/arts/arts_students/CVs/CV_abilez_092007.pdf
    Stub - download http://med.stanford.edu/arts/arts_students/CVs/CV_abilez_092007.pdf,
    	rename, upload to database, etc.
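If a search does return more than one PDF link, or you want to start filling in the process_pdf_url() stub, something along these lines might help. It's only a rough sketch: extract_pdf_urls(), local_name_for() and download_pdf() are names I've made up for illustration, the sample HTML is invented, and download_pdf() assumes $ua is the LWP::UserAgent object created in the script above. Only the download part is covered; renaming and the database upload are left as they were.

```perl
#!/usr/bin/env perl

use strict;
use warnings;

use File::Basename;
use File::Spec;

# Same pattern as in the script above.
my $pdf_re = qr{href="([^"]+\.pdf)"};

# Return every PDF link found in the page, in order of appearance,
# with duplicates removed (a global match in list context returns
# all the captures).
sub extract_pdf_urls {
    my ($html) = @_;
    my %seen;
    return grep { !$seen{$_}++ } $html =~ /$pdf_re/g;
}

# Derive a local filename from the last path segment of the URL,
# ignoring any query string or fragment.
sub local_name_for {
    my ($pdf_url) = @_;
    (my $path = $pdf_url) =~ s/[?#].*//s;
    my $name = basename($path);
    return length($name) ? $name : 'download.pdf';
}

# Hypothetical partial replacement for the process_pdf_url() stub.
# $ua is assumed to be an LWP::UserAgent object (not loaded here).
# Returns the local filename on success, undef otherwise.
sub download_pdf {
    my ($ua, $pdf_url, $dir) = @_;
    my $file = File::Spec->catfile($dir, local_name_for($pdf_url));
    my $res  = $ua->mirror($pdf_url, $file);

    # mirror() answers 304 Not Modified when the local copy is current.
    return ($res->is_success || $res->code == 304) ? $file : undef;
}

# Offline demonstration with made-up HTML; no network access needed.
my $sample_html = join ' ',
    '<a href="http://example.org/a/one.pdf">first</a>',
    '<a href="http://example.org/b/two.pdf">second</a>',
    '<a href="http://example.org/a/one.pdf">first again</a>';

print "$_ -> ", local_name_for($_), "\n" for extract_pdf_urls($sample_html);
```

In the main loop you'd then replace the single match with something like `for my $pdf_url (extract_pdf_urls($res->content)) { ... }` and call `download_pdf($ua, $pdf_url, '.')` inside it. Whether every link found is actually the paper you're after is a separate problem: the example output above turned up a CV rather than the article itself.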
-- Ken