PerlMonks  

Re^2: ... architecting & implementing help w/ Perl...

by rickkar (Initiate)
on Oct 25, 2012 at 18:04 UTC


in reply to Re: ... architecting & implementing help w/ Perl...
in thread ... architecting & implementing help w/ Perl...

something like this...?

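use Carp;
use File::Find;
use REST::Client;

# $an_url, $base and do_stuff_with_content() are placeholders to fill in.
my $client = REST::Client->new( host => $an_url );

File::Find::find( sub {
    return unless m/\.xml$/;
    # find() has chdir'd into the file's directory, so open the bare name in $_
    open( my $fh, '<', $_ ) or do {
        carp "Could not open $File::Find::name: $!";
        return;
    };
    my $doi;
    while ( my $line = <$fh> ) {
        next unless ( $doi ) = $line =~ m{[^/]*/([^/]*)};
        $client->GET( join( '/', $base, $doi ) );
        do_stuff_with_content( $client->responseContent );
    }
    close $fh;
} => '.' );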


Re^3: ... architecting & implementing help w/ Perl...
by aitap (Deacon) on Oct 25, 2012 at 18:16 UTC

    Yes, looks correct.

    You may want to use an XML parser (XML::Twig, for example) to extract the data from the XML files, unless you are completely sure that the layout of the XML will not change.

    You can also use File::Slurp to read the files in fewer lines of code.

    Sorry if my advice was wrong.
Re^3: ... architecting & implementing help w/ Perl...
by rickkar (Initiate) on Oct 26, 2012 at 17:04 UTC

    i'm able to refine the problem...

    Statement of the Problem: parse Medline/Pubmed file paths on a Unix system in order to finally pass the PMID from each path to a pmid2doi conversion website < http://www.pmid2doi.org/ > ... and output companion DOIs…

    (1) parse this path and fetch the pmid: "/xxxxx/xxxxx/xxxxx/xxxxx/xxxxx/UNC00000000000042/00223468/v45i3/S0022346809003820";

    (2) submit a query to http://www.pmid2doi.org/, fetch the return contents and parse the DOI value.
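Step (1) can be sketched on its own, assuming (as the post does) that the PMID is the only path component made of exactly eight digits. The helper name pmid_from_path is made up for illustration and is not from the thread.

```perl
use strict;
use warnings;

# pmid_from_path is a hypothetical helper name used only for this sketch.
# It returns the first path component consisting of exactly eight digits,
# or undef when the path contains no such component.
sub pmid_from_path {
    my ($path) = @_;
    # A list-context match yields the capture, or an empty list on failure,
    # so $pmid is left undefined when nothing matches.
    my ($pmid) = $path =~ m{/(\d{8})/};
    return $pmid;
}

# Sample path copied from the post.
my $path = '/xxxxx/xxxxx/xxxxx/xxxxx/xxxxx/UNC00000000000042/00223468/v45i3/S0022346809003820';
my $pmid = pmid_from_path($path);
print defined $pmid ? "$pmid\n" : "no 8-digit component found\n";
```

Capturing in list context avoids relying on a stale $1 when a line does not match.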

    If you simply point your browser to: http://www.pmid2doi.org/rest/json/doi/18507872

    then your browser will display the result in the form: {"pmid":18507872,"doi":"10.1186/gb-2008-9-5-r89"}

    and then you need to parse this JSON format.

    Examples of how to do that are at: http://beerpla.net/2008/03/27/parsing-json-in-perl-by-example-southparkstudioscom-south-park-episodes/
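As an alternative to a regex, the response can be decoded with a JSON module. This is a minimal sketch using JSON::PP (bundled with Perl since 5.14) on the sample response quoted above; the helper name doi_from_json is made up for illustration, and the sketch assumes the service always returns a JSON object with a "doi" key.

```perl
use strict;
use warnings;
use JSON::PP qw(decode_json);

# doi_from_json is a hypothetical helper name used only for this sketch.
# It returns the value of the "doi" key, or undef when the text is not a
# JSON object or has no such key.
sub doi_from_json {
    my ($json_text) = @_;
    # decode_json dies on malformed input, so trap that with eval.
    my $data = eval { decode_json($json_text) };
    return unless ref $data eq 'HASH';
    return $data->{doi};
}

# Sample response copied from the post.
my $ret = '{"pmid":18507872,"doi":"10.1186/gb-2008-9-5-r89"}';
my $doi = doi_from_json($ret);
print defined $doi ? "$doi\n" : "doi not found\n";
```

Decoding the whole object keeps working if the service reorders its keys or adds whitespace, which a pattern anchored on the closing brace would not.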

    #!/usr/local/bin/perl
    use strict;
    use warnings;
    use 5.010;
    use LWP::Simple;

    # Fetch each path from <DATA>
    while ( my $line = <DATA> ) {
        # PMID is an 8-digit string, surrounded by "/" and "/"
        my ($pmid) = $line =~ m{/(\d{8})/}
            or next;    # skip lines without an 8-digit path component

        # Query pmid in http://www.pmid2doi.org/
        my $ret = get("http://www.pmid2doi.org/rest/json/doi/$pmid");
        unless ( defined $ret ) {
            # get() returns undef on failure and does not report why via $!
            warn "Failed to get doi for '$pmid'\n";
            next;
        }

        # Parse query result, which would be like:
        # {"pmid":18507872,"doi":"10.1186/gb-2008-9-5-r89"}
        if ( $ret =~ /"doi":"(.*?)"/ ) {
            my $doi = $1;
            # Output
            say $pmid, "\t=>\t", $doi;
        }
        else {
            say "doi not found in '$ret'";
        }
    }

    exit 0;

    __DATA__
    /xxxxx/xxxxx/xxxxx/xxxxx/xxxxx/UNC00000000000042/00223468/v45i3/S0022346809003820

    i'd appreciate any critiques/insights -- thx!
