http://www.perlmonks.org?node_id=1000903

rickkar has asked for the wisdom of the Perl Monks concerning the following question:

i really just need some basic software engineering design guidance... when i say 'architecting' i mean pseudo-code/statements i can look up in the Perl book i have... and hopefully online examples...

i'm using Perl and i'm trying to parse Medline/Pubmed file paths on a Unix system in order to finally pass the PMID from each path to a pmid2doi conversion website < http://www.pmid2doi.org/ > ...

the structure of each link is of the form... /xxxxx/xxxxx/xxxxx/xxxxx/xxxxx/UNC00000000000042/00223468/v45i3/S0022346809003820

--ls

18507872 main.pdf main.raw main.xml

where: 00223468 <-- this is the PMID
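A minimal Perl sketch of pulling that field out of the path. It assumes, as stated above, that the PMID is the directory immediately after the `UNC...` component. The sub name `pmid_from_path` is hypothetical, not something from the original post.

```perl
use strict;
use warnings;

# Hypothetical helper: return the path component that follows the
# "UNC..." directory, which the post above identifies as the PMID.
sub pmid_from_path {
    my ($path) = @_;
    my @parts = split m{/}, $path;
    for my $i ( 0 .. $#parts - 1 ) {
        return $parts[ $i + 1 ] if $parts[$i] =~ /^UNC\d+$/;
    }
    return;    # no UNC component found
}

my $path = '/xxxxx/xxxxx/xxxxx/xxxxx/xxxxx/UNC00000000000042/00223468/v45i3/S0022346809003820';
print pmid_from_path($path), "\n";
```

Anchoring on the `UNC...` directory rather than on a fixed position keeps the sketch working if the number of leading directories changes.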

so far, in Perl, i've got something that looks like this...

#!/bin/perl
use strict;
use warnings;
use LWP::Simple;

# this is bash-like implementation of what i'm trying to do
for doi in `find . -name "*.xml" | awk -F\/ '{print $2}' `   #this extracts the PMID
do
    echo $doi
    wget http://www.pmid2doi.org/
done

the website < http://www.pmid2doi.org/ > requires inputting the PMID in order to get back the DOI...

this is what i need to get running in Perl... and i need a little help in architecting & implementing this...

given your development background, i'm grateful for any insights... or recommended sites for generating regular expressions in Perl...

thanks very much!

Additionally,

I see that http://www.pmid2doi.org/ says the REST API expects the base URL http://www.pmid2doi.org/ plus the PMID value.

So I need to find some example Perl code that gets a REST value from a URL.

In REST I just prepare the URL as specified and then the returned result should be the value I want.
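A hedged sketch of that "prepare the URL, read the result" step. It uses the core module HTTP::Tiny in place of LWP::Simple, and assumes the service answers at the URL form quoted later in this thread (`/rest/json/doi/<PMID>`). The helper names `doi_url` and `fetch_doi_record` are hypothetical.

```perl
use strict;
use warnings;
use HTTP::Tiny;

# URL prefix taken from the example later in this thread.
my $BASE = 'http://www.pmid2doi.org/rest/json/doi';

# Hypothetical helper: build the request URL for one PMID.
sub doi_url {
    my ($pmid) = @_;
    return "$BASE/$pmid";
}

# Hypothetical helper: GET the URL and return the response body,
# or undef if the request did not succeed.
sub fetch_doi_record {
    my ($pmid) = @_;
    my $res = HTTP::Tiny->new( timeout => 10 )->get( doi_url($pmid) );
    return $res->{success} ? $res->{content} : undef;
}

# Only touch the network when PMIDs are given on the command line.
for my $pmid (@ARGV) {
    my $body = fetch_doi_record($pmid);
    print defined($body) ? "$body\n" : "request failed for $pmid\n";
}
```

Run it with one or more PMIDs as arguments; with no arguments it does nothing, so the URL building can be checked without a network connection.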

Replies are listed 'Best First'.
Re: ... architecting & implementing help w/ Perl...
by aitap (Curate) on Oct 25, 2012 at 17:16 UTC
    • Use File::Find instead of `find`
    • Use split to, well, split strings by the "/" character and get the second piece: my $pmid = (split "/",$string,3)[1]
    • Use LWP::Simple to make HTTP GET requests (it has no POST helper; LWP::UserAgent does), or search CPAN for REST or SOAP API modules to interact with that site
    Sorry if my advice was wrong.
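The first two suggestions can be sketched together as follows. This is one possible reading, not the replier's own code: it walks a directory with File::Find and keeps the first directory level of every `*.xml` file, roughly what the `find | awk` pipeline in the question prints. The sub name `ids_under` is hypothetical.

```perl
use strict;
use warnings;
use File::Find;

# Hypothetical helper: for every *.xml file below $top, collect the
# name of the first directory level under $top.
sub ids_under {
    my ($top) = @_;
    my %seen;
    find(
        {
            no_chdir => 1,    # keep $_ equal to the full path
            wanted   => sub {
                return unless -f $_ && /\.xml$/;
                # Path relative to $top, e.g. "18507872/main.xml"
                ( my $rel = $File::Find::name ) =~ s{^\Q$top\E/}{};
                return unless $rel =~ m{/};    # skip files directly in $top
                my ($id) = split m{/}, $rel;
                $seen{$id} = 1;
            },
        },
        $top
    );
    my @ids = sort keys %seen;
    return @ids;
}

# Directories to scan are taken from the command line.
print "$_\n" for map { ids_under($_) } @ARGV;
```

Called as `perl script.pl .` from the directory that holds the numbered subdirectories, it prints each directory name once, even when a directory contains several XML files.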

      something like this...?

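      use Carp;
      use File::Find;
      use REST::Client;

      # URL prefix of the service; the ID is appended to it
      my $base   = 'http://www.pmid2doi.org/rest/json/doi';
      my $client = REST::Client->new();

      File::Find::find( sub {
          return unless m/\.xml$/;
          # find() has already chdir'ed into the file's directory, so open $_
          unless ( open( my $fh, '<', $_ ) ) {
              carp "Could not open $File::Find::name!";
              return;
          }
          my $doi;
          while ( <$fh> ) {
              # take the second "/"-separated field of the line
              next unless ( $doi ) = m{[^/]*/([^/]*)};
              $client->GET( join( '/', $base, $doi ));
              # do_stuff_with_content() is a placeholder for your own handler
              do_stuff_with_content( $client->responseContent );
          }
          close $fh;
      } => '.' );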

        Yes, looks correct.

        You may want to use some XML parser (XML::Twig, for example) to search for data in the XML files if you are not completely sure that internal representation of XML data will not change.

        You can also use File::Slurp to read the files with fewer lines of code.

        Sorry if my advice was wrong.

        i'm able to refine the problem...

        Statement of the Problem: parse Medline/Pubmed file paths on a Unix system in order to finally pass the PMID from each path to a pmid2doi conversion website < http://www.pmid2doi.org/ > ... and output companion DOIs…

        (1) parse this link and fetch the pmid; "/xxxxx/xxxxx/xxxxx/xxxxx/xxxxx/UNC00000000000042/00223468/v45i3/S0022346809003820";

        (2) submit a query to http://www.pmid2doi.org/, fetch the return contents and parse the DOI value.

        If you simply point your browser to: http://www.pmid2doi.org/rest/json/doi/18507872

        then your browser will display the result in the form: {"pmid":18507872,"doi":"10.1186/gb-2008-9-5-r89"}

        and then you need to parse this JSON format.

        Examples of how to do that are at: http://beerpla.net/2008/03/27/parsing-json-in-perl-by-example-southparkstudioscom-south-park-episodes/
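As a small illustration of that parsing step, here is a sketch using the core module JSON::PP on the exact response text quoted above. The choice of JSON::PP is an assumption for the example; any JSON module from CPAN would serve.

```perl
use strict;
use warnings;
use JSON::PP qw(decode_json);

# Response text copied from the example above.
my $json = '{"pmid":18507872,"doi":"10.1186/gb-2008-9-5-r89"}';

# decode_json dies on malformed input, so wrap it in eval.
my $rec = eval { decode_json($json) };
if ( $rec && defined $rec->{doi} ) {
    print "$rec->{pmid}\t=>\t$rec->{doi}\n";
}
else {
    warn "no doi in response\n";
}
```

Decoding into a hash avoids depending on the order of the keys or on the closing brace, which a hand-written regex over the raw text would have to assume.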

        #!/usr/local/bin/perl
        use strict;
        use warnings;
        use 5.010;
        use LWP::Simple;

        # Fetch line from <DATA>
        while ( <DATA> ) {
            # PMID is an 8-digit string, surrounded by "/" and "/";
            # skip lines that do not contain one
            my ($pmid) = m{/(\d{8})/} or next;

            # Query pmid in http://www.pmid2doi.org/
            my $ret = get("http://www.pmid2doi.org/rest/json/doi/$pmid");
            unless (defined $ret) {
                # get() returns undef on failure and does not set $!
                warn "Failed to get doi for '$pmid'\n";
                next;
            }

            # Parse query result, which would be like:
            # {"pmid":18507872,"doi":"10.1186/gb-2008-9-5-r89"}
            if ( $ret =~ /"doi":"(.*?)"}/ ) {
                my $doi = $1;
                # Output
                say $pmid, "\t=>\t", $doi;
            }
            else {
                say "doi not found in '$ret'";
            }
        }
        exit 0;

        __DATA__
        /xxxxx/xxxxx/xxxxx/xxxxx/xxxxx/UNC00000000000042/00223468/v45i3/S0022346809003820

        i'd appreciate any critiques/insights -- thx!

Re: ... architecting & implementing help w/ Perl...
by Anonymous Monk on Oct 26, 2012 at 12:52 UTC
    Also, before you do anything else, search http://search.cpan.org for both terms: Medline and PubMed.

    In this case, no matter what-in-particular you are doing, you can be certain that you are not the first person to have done it. Always start a project with a careful search of "prior art."