PerlMonks  

Re: ... architecting & implementing help w/ Perl...

by aitap (Deacon)
on Oct 25, 2012 at 17:16 UTC (#1000905)


in reply to ... architecting & implementing help w/ Perl...

  • Use File::Find instead of shelling out to `find`.
  • Use split to break the string on the "/" character and take the second piece: my $pmid = (split "/", $string, 3)[1]
  • Use LWP::UserAgent to make HTTP POST requests (LWP::Simple only covers simple GET requests), or search CPAN for REST or SOAP API modules to interact with that site.
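A minimal sketch of the first two points, using only core modules. The helper names `second_piece` and `xml_files_under` are made up for illustration, and the sample string is hypothetical, not taken from the real data.

```perl
use strict;
use warnings;
use File::Find;

# Return the second piece of a "/"-separated string.
# The limit of 3 stops split after the first two separators.
sub second_piece {
    my ($string) = @_;
    return ( split m{/}, $string, 3 )[1];
}

# Collect every *.xml file below $dir without shelling out to `find`.
sub xml_files_under {
    my ($dir) = @_;
    my @files;
    find( sub { push @files, $File::Find::name if -f $_ && /\.xml\z/ }, $dir );
    my @sorted = sort @files;
    return @sorted;
}

print second_piece('a/b/c/d'), "\n";    # prints "b"
```

Note that for a path with a leading "/", the first piece is the empty string, so index 1 is the first directory name rather than the second.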
Sorry if my advice was wrong.


Re^2: ... architecting & implementing help w/ Perl...
by rickkar (Initiate) on Oct 25, 2012 at 18:04 UTC

    something like this...?

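    use Carp;
    use File::Find;
    use REST::Client;

    # $an_url, $base and do_stuff_with_content() are placeholders
    # that still have to be defined elsewhere.
    my $client = REST::Client->new( host => $an_url );

    File::Find::find( sub {
        return unless m/\.xml$/;

        # find() has already chdir'ed into the file's directory, so open
        # $_ (the bare file name); $File::Find::name is only used in messages.
        open( my $fh, '<', $_ ) or do {
            carp "Could not open $File::Find::name!";
            return;
        };

        my $doi;
        while ( <$fh> ) {
            next unless ( $doi ) = m{[^/]*/([^/]*)};
            $client->GET( join( '/', $base, $doi ) );
            do_stuff_with_content( $client->responseContent );
        }
        close $fh;
    } => '.' );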

      Yes, looks correct.

      You may want to use an XML parser (XML::Twig, for example) to extract data from the XML files, unless you are completely sure that the internal layout of the XML will not change.

      You can also use File::Slurp to read the files in fewer lines of code.
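For reference, File::Slurp's `read_file($path)` returns the whole file in one call. Where adding a CPAN dependency is not an option, a core-only equivalent might look like the sketch below (`slurp` is a made-up name, not part of any module):

```perl
use strict;
use warnings;

# Read a whole file into one string using only core Perl.
# Localizing $/ (the input record separator) to undef makes <$fh>
# return the entire file at once instead of one line.
sub slurp {
    my ($path) = @_;
    open my $fh, '<', $path or die "Could not open $path: $!";
    local $/;
    my $content = <$fh>;
    close $fh;
    return $content;
}
```

The returned string can then be matched as a whole or handed to an XML parser, instead of looping over the file line by line.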

      Sorry if my advice was wrong.

      i'm able to refine the problem...

      Statement of the Problem: parse Medline/Pubmed file paths on a Unix system in order to finally pass the PMID from each path to a pmid2doi conversion website < http://www.pmid2doi.org/ > ... and output companion DOIs…

      (1) parse this path and extract the PMID: "/xxxxx/xxxxx/xxxxx/xxxxx/xxxxx/UNC00000000000042/00223468/v45i3/S0022346809003820"

      (2) submit a query to http://www.pmid2doi.org/, fetch the return contents and parse the DOI value.

      If you simply point your browser to: http://www.pmid2doi.org/rest/json/doi/18507872

      then your browser will display the result in the form: {"pmid":18507872,"doi":"10.1186/gb-2008-9-5-r89"}

      and then you need to parse this JSON format.

      Examples of how to do that are at: http://beerpla.net/2008/03/27/parsing-json-in-perl-by-example-southparkstudioscom-south-park-episodes/
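As a rough sketch of that parsing step, the reply shown above can be decoded with JSON::PP, which ships with Perl 5.14 and later. The helper name `doi_from_json` is made up for illustration, and the only reply format assumed is the one quoted above.

```perl
use strict;
use warnings;
use JSON::PP qw(decode_json);

# Pull the DOI out of a pmid2doi-style JSON reply.
# Returns nothing (undef in scalar context) when the text is not
# valid JSON or does not decode to a hash.
sub doi_from_json {
    my ($json_text) = @_;
    my $data = eval { decode_json($json_text) };
    return unless ref $data eq 'HASH';
    return $data->{doi};
}

my $reply = '{"pmid":18507872,"doi":"10.1186/gb-2008-9-5-r89"}';
print doi_from_json($reply), "\n";    # prints "10.1186/gb-2008-9-5-r89"
```

Decoding the reply properly keeps working if the service reorders the keys or adds whitespace, which a hand-written regex may not.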

      #!/usr/local/bin/perl
      use strict;
      use warnings;
      use 5.010;
      use LWP::Simple;

      # Fetch line from <DATA>
      while ( <DATA> ) {
          chomp;

          # PMID is an 8-digit string, surrounded by "/" and "/".
          # Capture in list context; "my $x = $1 if ..." has undefined behaviour.
          my ($pmid) = m{/(\d{8})/};
          unless ( defined $pmid ) {
              warn "No PMID found in '$_'\n";
              next;
          }

          # Query pmid in http://www.pmid2doi.org/
          my $ret = get("http://www.pmid2doi.org/rest/json/doi/$pmid");
          unless ( defined $ret ) {
              # LWP::Simple::get returns undef on failure and does not set $!
              warn "Failed to get doi for '$pmid'\n";
              next;
          }

          # Parse query result, which would be like:
          # {"pmid":18507872,"doi":"10.1186/gb-2008-9-5-r89"}
          if ( $ret =~ /"doi":"(.*?)"}/ ) {
              my $doi = $1;
              # Output
              say $pmid, "\t=>\t", $doi;
          }
          else {
              say "doi not found in '$ret'";
          }
      }

      exit 0;

      __DATA__
      /xxxxx/xxxxx/xxxxx/xxxxx/xxxxx/UNC00000000000042/00223468/v45i3/S0022346809003820

      i'd appreciate any critiques/insights -- thx!
