... architecting & implementing help w/ Perl...

rickkar has asked for the wisdom of the Perl Monks concerning the following question:

i really just need some basic software engineering architecture guidance... when i say 'architecting' -- i mean pseudo-code/statements i can look up in the Perl book i have... and hopefully online examples...

i'm using Perl and i'm trying to parse Medline/Pubmed file paths on a Unix system in order to finally pass the PMID from each path to a pmid2doi conversion website < http://www.pmid2doi.org/ > ...

the structure of each link is of the form... /xxxxx/xxxxx/xxxxx/xxxxx/xxxxx/UNC00000000000042/00223468/v45i3/S0022346809003820

--ls

18507872 main.pdf main.raw main.xml

where: 00223468 <-- this is the PMID
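
A minimal sketch of pulling that component out of the path in Perl. It assumes the ID is always the first directory name made of exactly eight digits, as in the example path above; `pmid_from_path` is a hypothetical helper name, not something from a module.

```perl
use strict;
use warnings;

# Hypothetical helper: return the first path component made of exactly
# eight digits, or undef if the path has none.
# Assumption: the ID directory is always 8 digits, as in the example path.
sub pmid_from_path {
    my ($path) = @_;
    for my $part ( split m{/}, $path ) {
        return $part if $part =~ /\A\d{8}\z/;
    }
    return;
}

my $path = '/xxxxx/xxxxx/xxxxx/xxxxx/xxxxx/UNC00000000000042/00223468/v45i3/S0022346809003820';
my $pmid = pmid_from_path($path);
print defined $pmid ? "$pmid\n" : "no ID found\n";
```

Anchoring the match with `\A` and `\z` keeps longer digit runs such as the one inside `UNC00000000000042` from matching by accident.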

so far, in Perl, i've got something that looks like this...

#!/bin/perl
use strict; 
use warnings;
use LWP::Simple;

# this is bash-like implementation of what i'm trying to do
for doi in `find . -name "*.xml" | awk -F\/ '{print $2}' `  #this extracts the PMID
do
        echo  $doi
        wget http://www.pmid2doi.org/
done

the website < http://www.pmid2doi.org/ > requires inputting the PMID in order to get back the DOI...

this is what i need to get running in Perl... and i need a little help in architecting & implementing this...

given your development background, i'm grateful for any insights... or recommended sites for generating regular expressions in Perl...

thanks very much!

Additionally,

I see that http://www.pmid2doi.org/ says the REST API expects the base URL http://www.pmid2doi.org/ plus the PMID value.

So I need to find some example Perl code that gets a REST value from a URL.

In REST I just prepare the URL as specified and then the returned result should be the value I want.
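
A minimal sketch of "prepare the URL, then GET it" in Perl, using the core HTTP::Tiny module. The URL layout is the one quoted in a later reply in this thread (`/rest/json/doi/<PMID>`); whether the service still answers at that address is an assumption, and `doi_url` and `fetch_body` are hypothetical helper names.

```perl
use strict;
use warnings;
use HTTP::Tiny;    # core module since Perl 5.14

# URL layout as quoted in this thread:
#   http://www.pmid2doi.org/rest/json/doi/<PMID>
my $BASE = 'http://www.pmid2doi.org/rest/json/doi';

# Hypothetical helper: build the request URL for one PMID.
sub doi_url {
    my ($pmid) = @_;
    return "$BASE/$pmid";
}

# Hypothetical helper: GET the URL and return the response body,
# or undef (with a warning) if the request failed.
sub fetch_body {
    my ($url) = @_;
    my $res = HTTP::Tiny->new( timeout => 10 )->get($url);
    unless ( $res->{success} ) {
        warn "GET $url failed: $res->{status} $res->{reason}\n";
        return;
    }
    return $res->{content};
}

print doi_url('18507872'), "\n";
```

Calling `fetch_body( doi_url($pmid) )` would then return the raw response text for that PMID, assuming the service is reachable.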


Replies are listed 'Best First'.
Re: ... architecting & implementing help w/ Perl... by aitap (Curate) on Oct 25, 2012 at 17:16 UTC
Use File::Find instead of `find`. Use split to, well, split strings by the "/" character and get the second piece: `my $pmid = (split "/",$string,3)[1]`. Use LWP::Simple to make HTTP GET requests (it has no POST function; LWP::UserAgent does), or search CPAN for REST or SOAP API modules to interact with that site. Sorry if my advice was wrong.
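
The File::Find and split suggestions can be combined roughly as follows. This is only a sketch: `second_components` is a hypothetical helper name, and it assumes the root directory is given without a trailing slash and that the wanted ID is the second piece of each path relative to that root, the same field the shell version picks with `awk '{print $2}'`.

```perl
use strict;
use warnings;
use File::Find;

# Hypothetical helper: walk $root and, for every *.xml file, collect the
# second piece of its path relative to $root.
# Assumption: $root is passed without a trailing slash.
sub second_components {
    my ($root) = @_;
    my %seen;
    find(
        {
            no_chdir => 1,    # keep $_ equal to the full path
            wanted   => sub {
                return unless -f $_ && /\.xml\z/;
                # Path relative to $root, e.g. "./00223468/v45i3/.../main.xml"
                my $rel   = '.' . substr( $File::Find::name, length $root );
                my $piece = ( split m{/}, $rel, 3 )[1];
                $seen{$piece} = 1 if defined $piece;
            },
        },
        $root
    );
    return sort keys %seen;
}
```

Running `print "$_\n" for second_components('.');` from the top of the tree would list each distinct second-level directory name once, however many XML files sit beneath it.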
Re^2: ... architecting & implementing help w/ Perl... by rickkar (Initiate) on Oct 25, 2012 at 18:04 UTC
something like this...?

use File::Find;
my $client = REST::Client->new( $an_url );
File::Find::find( sub {
    return unless m/\.xml$/;
    carp "Could not open $File::Find::name!"
        unless open( my $fh, '<', $File::Find::name );
    my $doi;
    while ( <$fh> ) {
        next unless ( $doi ) = m{[^/]/([^/])};
        $client->GET( join( '/', $base, $doi ));
        do_stuff_with_content( $client->responseContent );
    }
    close $fh;
} => '.' );
Re^3: ... architecting & implementing help w/ Perl... by aitap (Curate) on Oct 25, 2012 at 18:16 UTC
Yes, looks correct. You may want to use some XML parser (XML::Twig, for example) to search for data in the XML files if you are not completely sure that the internal representation of the XML data will not change. You can also use File::Slurp to read files in fewer lines of code. Sorry if my advice was wrong.
Re^3: ... architecting & implementing help w/ Perl... by rickkar (Initiate) on Oct 26, 2012 at 17:04 UTC
i'm able to refine the problem...

Statement of the Problem: parse Medline/Pubmed file paths on a Unix system in order to finally pass the PMID from each path to a pmid2doi conversion website < http://www.pmid2doi.org/ > ... and output companion DOIs…

(1) parse this link and fetch the pmid: "/xxxxx/xxxxx/xxxxx/xxxxx/xxxxx/UNC00000000000042/00223468/v45i3/S0022346809003820";

(2) submit a query to http://www.pmid2doi.org/, fetch the return contents and parse the DOI value.

If you simply point your browser to: http://www.pmid2doi.org/rest/json/doi/18507872 then your browser will display the result in the form: {"pmid":18507872,"doi":"10.1186/gb-2008-9-5-r89"} and then you need to parse this JSON format. Examples of how to do that are at: http://beerpla.net/2008/03/27/parsing-json-in-perl-by-example-southparkstudioscom-south-park-episodes/

#!/usr/local/bin/perl
use strict;
use warnings;
use 5.010;
use LWP::Simple;

# Fetch line from <DATA>
while ( <DATA> ) {
    # PMID is an 8-digit string, surrounded by "/" and "/";
    # skip lines that do not contain one
    my ($pmid) = m{/(\d{8})/} or next;

    # Query pmid in http://www.pmid2doi.org/
    my $ret = get("http://www.pmid2doi.org/rest/json/doi/$pmid");
    unless ( defined $ret ) {
        # LWP::Simple's get() returns undef on failure and does not set $!
        warn "Failed to get doi for '$pmid'\n";
        next;
    }

    # Parse query result, which would be like:
    # {"pmid":18507872,"doi":"10.1186/gb-2008-9-5-r89"}
    if ( $ret =~ /"doi":"(.*?)"}/ ) {
        my $doi = $1;
        # Output
        say $pmid, "\t=>\t", $doi;
    }
    else {
        say "doi not found in '$ret'";
    }
}
exit 0;

__DATA__
/xxxxx/xxxxx/xxxxx/xxxxx/xxxxx/UNC00000000000042/00223468/v45i3/S0022346809003820

i'd appreciate any critiques/insights -- thx!
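
Since the response is JSON, one alternative to matching it with a regular expression is to decode it with the core JSON::PP module. A minimal sketch, assuming the response always has the shape quoted in this thread; `doi_from_json` is a hypothetical helper name.

```perl
use strict;
use warnings;
use JSON::PP;    # core module since Perl 5.14; exports decode_json

# Hypothetical helper: extract the "doi" field from a response body such as
#   {"pmid":18507872,"doi":"10.1186/gb-2008-9-5-r89"}
# Returns undef if the body is not valid JSON or has no "doi" key.
sub doi_from_json {
    my ($body) = @_;
    my $data = eval { decode_json($body) };    # decode_json dies on bad input
    return unless ref $data eq 'HASH';
    return $data->{doi};
}

my $sample = '{"pmid":18507872,"doi":"10.1186/gb-2008-9-5-r89"}';
my $doi    = doi_from_json($sample);
print defined $doi ? "$doi\n" : "doi not found\n";
```

Decoding the whole document means the result does not depend on key order or on `"doi"` being the last field before the closing brace, which the regular expression above quietly relies on.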
Re: ... architecting & implementing help w/ Perl... by Anonymous Monk on Oct 26, 2012 at 12:52 UTC
Also, before you do anything else, search http://search.cpan.org for both terms: Medline and PubMed. In this case, no matter what-in-particular you are doing, you can be certain that you are not the first person to have done it. Always start a project with a careful search of "prior art."
