Beefy Boxes and Bandwidth Generously Provided by pair Networks
No such thing as a small change
 
PerlMonks  

JSON::Path and node iteration

by frazap (Monk)
on Mar 21, 2019 at 14:41 UTC ( [id://1231547] : perlquestion . print w/replies, xml ) Need Help??

frazap has asked for the wisdom of the Perl Monks concerning the following question:

I have the following json data
{"status":"ok","message-type":"work-list","message-version":"1.0.0","m +essage":{"facets":{},"total-results":1,"items":[{"author":[{"given":" +Brahim","family":"Oubaha","sequence":"first","affiliation":[{"name":" +Laboratory of Biology and Biotechnology of Microorganisms Faculty of +Sciences Semlalia Cadi Ayyad University Marrakech Morocco"},{"name":" +Laboratory of Interaction Plant\u2010Microorganisms, Department of Bi +ology University of Fribourg Fribourg Switzerland"}]},{"given":"Ahmed +","family":"Nafis","sequence":"additional","affiliation":[{"name":"La +boratory of Biology and Biotechnology of Microorganisms Faculty of Sc +iences Semlalia Cadi Ayyad University Marrakech Morocco"}]},{"given": +"Mohamed","family":"Baz","sequence":"additional","affiliation":[{"nam +e":"Laboratory of Biology and Biotechnology of Microorganisms Faculty + of Sciences Semlalia Cadi Ayyad University Marrakech Morocco"}]},{"g +iven":"Felix","family":"Mauch","sequence":"additional","affiliation": +[{"name":"Laboratory of Interaction Plant\u2010Microorganisms, Depart +ment of Biology University of Fribourg Fribourg Switzerland"}]},{"ORC +ID":"http:\/\/orcid.org\/0000-0001-7949-5592","authenticated-orcid":f +alse,"given":"Mustapha","family":"Barakate","sequence":"additional"," +affiliation":[{"name":"Laboratory of Biology and Biotechnology of Mic +roorganisms Faculty of Sciences Semlalia Cadi Ayyad University Marrak +ech Morocco"},{"name":"Agrobiosciences & Fertilizers Program Univ +ersity Mohammed IV Polytechnic (UM6P) Benguerir Morocco"}]}],"title": +["The potential of antagonistic moroccan\n Streptomyces\n + isolates for the biological control of damping\u2010off di +sease of pea (\n Pisum sativum\n L.) caused by\ +n Aphanomyces euteiches"]}],"items-per-page":20,"query":{" +start-index":0,"search-terms":null}}}
That comes from the CrossRef metadata repository

I want to extract the names and affiliations with the format name, firstname; affiliation1 - affiliation2 - ...

I have the following code that works
sub unfoldauthors { my ($data_ar) = @_; return unless ($data_ar); $data_ar = $data_ar->[0]; # die Dumper $data_ar; my @paths = qw( $..family $..given $..affiliation ); my @selectors; for my $p (@paths) { push @selectors, JSON::Path->new($p); } my $aff_name = JSON::Path->new('$..name'); my $name; my @lines; my @sep = ( ", ", "; ", " " ); my $col = 0; for my $s (@selectors) { my @text = $s->values($data_ar); #print $col, " ", Dumper(@text), "\n"; my $authors_limit = 50; my $last = $authors_limit - 1; if ( @text > $authors_limit ) { @text = @text[ 0 .. $last ]; $text[$last] .= " ..." if ( $col == 1 ); } my $pos = 0; for my $name (@text) { #print "$col $name\n"; if ( $col == 2 ) { my @aff = $aff_name->values( $text[$pos] ); #print "aff:", Dumper( @aff ), "\n"; $name = join( " - ", @aff ); #print "\$name $name\n"; } $lines[ $pos++ ] .= $name . $sep[$col]; } $col++; } return \@lines; }

I first iterate over the nodes at the "upper level" family, given, affiliation, and for the affiliation array ref, I extract the name values.

My question: could I have done this more directly in one step ?

Thanks

FranÁois

Here is a complete working example

use strict; use warnings; use open qw<:std :encoding(UTF-8) >; use Data::Dumper; use REST::Client::CrossRef; use Log::Any::Adapter( 'File', './log_auth.txt', "log_level" => "info" + ); use JSON::Path; my $cr = REST::Client::CrossRef->new( mailto => 'd...h', spit_raw_data => 0, add_end_flag => 1, json_path => [ ['$.title'], ['$.author'], ], json_path_callback => { '$.author' => \&unfoldauthors }, ); $cr->init_cache( { BasePath => "C:\\Windows\\Temp\\perl", NoUpdate => 15 * 60, verbose => 0 } ); sub unfoldauthors { my ($data_ar) = @_; return unless ($data_ar); $data_ar = $data_ar->[0]; my @paths = qw( $..family $..given $..affiliation ); my @selectors; for my $p (@paths) { push @selectors, JSON::Path->new($p); } my $aff_name = JSON::Path->new('$..name'); my $name; my @lines; my @sep = ( ", ", "; ", " " ); my $col = 0; for my $s (@selectors) { my @text = $s->values($data_ar); my $authors_limit = 50; my $last = $authors_limit - 1; if ( @text > $authors_limit ) { @text = @text[ 0 .. $last ]; $text[$last] .= " ..." if ( $col == 1 ); } my $pos = 0; for my $name (@text) { print "$col $name\n"; if ( $col == 2 ) { my @aff = $aff_name->values( $text[$pos] ); $name = join( " - ", @aff ); } $lines[ $pos++ ] .= $name . $sep[$col]; } $col++; } return \@lines; } $cr->init_cache( { BasePath => "C:\\Windows\\Temp\\perl", NoUpdate => 15 * 60, verbose => 0 } ); my $select = "author,title"; while (<DATA>) { chomp; my $data = $cr->works_from_doi( $_, { 'has-affiliation' => 'true' }, $sel +ect ); next unless ($data); for my $row (@$data) { print "\n" unless ($row); while ( my ( $f, $v ) = each %$row ) { if ( $f eq '$.title' ) { print "***$v\n\n"; } else { print "$v \n"; } } } } __DATA__ 10.1111/jph.12775

Replies are listed 'Best First'.
Re: JSON::Path and node iteration
by haukex (Archbishop) on Mar 21, 2019 at 15:31 UTC

    You don't say what your expected output for that input is, but based on running your code against your sample data, it seems that you want only the first author ($data_ar = $data_ar->[0]), with the return value of unfoldauthors in this format (Perl):

    ["Oubaha, Brahim; Laboratory of Biology and Biotechnology of Microorga +nisms Faculty of Sciences Semlalia Cadi Ayyad University Marrakech Mo +rocco - Laboratory of Interaction Plant\x{2010}Microorganisms, Depart +ment of Biology University of Fribourg Fribourg Switzerland "]

    Your stated requirement seems fairly straightforward, and so I'm not really sure what all the code in for my $s (@selectors) is doing - maybe you could explain that?

    Here, I've written unfoldauthors2 to give you just the first author, and unfoldauthors3 to give you all authors:

    use warnings; use strict; use Data::Dump; use JSON::Path; use JSON; my $data = decode_json(<<'ENDJSON'); {"status":"ok","message-type":"work-list","message-version":"1.0.0", "message":{"facets":{},"total-results":1,"items": [ {"author":[ {"given":"Brahim","family":"Oubaha","sequence":"first","affili +ation":[ {"name":"Laboratory of Biology and Biotechnology of Microo +rganisms Faculty of Sciences Semlalia Cadi Ayyad University Marrakech + Morocco"}, {"name":"Laboratory of Interaction Plant\u2010Microorganis +ms, Department of Biology University of Fribourg Fribourg Switzerland +"} ]}, {"given":"Ahmed","family":"Nafis","sequence":"additional","aff +iliation":[ {"name":"Laboratory of Biology and Biotechnology of Microo +rganisms Faculty of Sciences Semlalia Cadi Ayyad University Marrakech + Morocco"} ]}, {"given":"Mohamed","family":"Baz","sequence":"additional","aff +iliation":[ {"name":"Laboratory of Biology and Biotechnology of Microo +rganisms Faculty of Sciences Semlalia Cadi Ayyad University Marrakech + Morocco"} ]}, {"given":"Felix","family":"Mauch","sequence":"additional","aff +iliation":[ {"name":"Laboratory of Interaction Plant\u2010Microorganis +ms, Department of Biology University of Fribourg Fribourg Switzerland +"}]}, {"given":"Mustapha","family":"Barakate","ORCID":"http:\/\/orci +d.org\/0000-0001-7949-5592","authenticated-orcid":false,"sequence":"a +dditional","affiliation":[ {"name":"Laboratory of Biology and Biotechnology of Microo +rganisms Faculty of Sciences Semlalia Cadi Ayyad University Marrakech + Morocco"}, {"name":"Agrobiosciences &amp; Fertilizers Program Univers +ity Mohammed IV Polytechnic (UM6P) Benguerir Morocco"}]} ],"title":["The potential of antagonistic moroccan\n St +reptomyces\n isolates for the biological control of dampin +g\u2010off disease of pea (\n Pisum sativum\n L +.) caused by\n Aphanomyces euteiches"]} ], "items-per-page":20,"query":{"start-index":0,"search-terms":null}}} ENDJSON dd unfoldauthors($data->{message}{items}[0]{author}); dd unfoldauthors2($data->{message}{items}[0]{author}); dd unfoldauthors3($data->{message}{items}[0]{author}); sub unfoldauthors { my ($data_ar) = @_; return unless ($data_ar); $data_ar = $data_ar->[0]; my @paths = qw( $..family $..given $..affiliation ); my @selectors; for my $p (@paths) { push @selectors, JSON::Path->new($p); } my $aff_name = JSON::Path->new('$..name'); my $name; my @lines; my @sep = ( ", ", "; ", " " ); my $col = 0; for my $s (@selectors) { my @text = $s->values($data_ar); my $authors_limit = 50; my $last = $authors_limit - 1; if ( @text > $authors_limit ) { @text = @text[ 0 .. $last ]; $text[$last] .= " ..." if ( $col == 1 ); } my $pos = 0; for my $name (@text) { #print "$col $name\n"; if ( $col == 2 ) { my @aff = $aff_name->values( $text[$pos] ); $name = join( " - ", @aff ); } $lines[ $pos++ ] .= $name . $sep[$col]; } $col++; } return \@lines; } sub unfoldauthors2 { my ($authors) = @_; return [ $authors->[0]{family}.', '.$authors->[0]{given}.'; '. join(' - ', map {$_->{name}} @{$authors->[0]{affiliation}}) ]; } sub unfoldauthors3 { my ($authors) = @_; return [ map { $_->{family}.', '.$_->{given}.'; '.join(' - ', map {$_->{name}} @{ $_->{affiliation} }) } @$authors ]; } __END__ [ "Oubaha, Brahim; Laboratory of Biology and Biotechnology of Microorg +anisms Faculty of Sciences Semlalia Cadi Ayyad University Marrakech M +orocco - Laboratory of Interaction Plant\x{2010}Microorganisms, Depar +tment of Biology University of Fribourg Fribourg Switzerland ", ] [ "Oubaha, Brahim; Laboratory of Biology and Biotechnology of Microorg +anisms Faculty of Sciences Semlalia Cadi Ayyad University Marrakech M +orocco - Laboratory of Interaction Plant\x{2010}Microorganisms, Depar +tment of Biology University of Fribourg Fribourg Switzerland", ] [ "Oubaha, Brahim; Laboratory of Biology and Biotechnology of Microorg +anisms Faculty of Sciences Semlalia Cadi Ayyad University Marrakech M +orocco - Laboratory of Interaction Plant\x{2010}Microorganisms, Depar +tment of Biology University of Fribourg Fribourg Switzerland", "Nafis, Ahmed; Laboratory of Biology and Biotechnology of Microorgan +isms Faculty of Sciences Semlalia Cadi Ayyad University Marrakech Mor +occo", "Baz, Mohamed; Laboratory of Biology and Biotechnology of Microorgan +isms Faculty of Sciences Semlalia Cadi Ayyad University Marrakech Mor +occo", "Mauch, Felix; Laboratory of Interaction Plant\x{2010}Microorganisms +, Department of Biology University of Fribourg Fribourg Switzerland", "Barakate, Mustapha; Laboratory of Biology and Biotechnology of Micr +oorganisms Faculty of Sciences Semlalia Cadi Ayyad University Marrake +ch Morocco - Agrobiosciences &amp; Fertilizers Program University Moh +ammed IV Polytechnic (UM6P) Benguerir Morocco", ]

      Thanks for helping!

      I want to extract the names and affiliations with the format name, firstname; affiliation1 - affiliation2 - ...

      I wanted all the authors, not the first. The unfoldauthors code take only the first 50, but I should have removed these line. I think that my values are in the first element of the array because of the selector $.author here

      my $cr = REST::Client::CrossRef->new( mailto => 'd...h', spit_raw_data => 0, add_end_flag => 1, json_path => [ ['$.title'], ['$.author'], ], json_path_callback => { '$.author' => \&unfoldauthors }, );

      The unfoldauthors function is a callback pass to the $cr object. The two json_path arg are used to extract data from the whole json hashref. The data extracted with the second argument is given to the callback function.

      This is the output (from the complete example) I'm having now, which is ok, but I'm asking if my callback function could have been made simpler:

      ***The potential of antagonistic moroccan Streptomyces isolates for the biological control of damping‘«…off disea +se of pea ( Pisum sativum L.) caused by Aphanomyces euteiches Oubaha, Brahim; Laboratory of Biology and Biotechnology of Microorgani +sms Faculty of Sciences Semlalia Cadi Ayyad University Marrakech Moro +cco - Laboratory of Interaction Plant‘«…Microorganisms, Department of + Biology University of Fribourg Fribourg Switzerland Nafis, Ahmed; Laboratory of Biology and Biotechnology of Microorganism +s Faculty of Sciences Semlalia Cadi Ayyad University Marrakech Morocc +o Baz, Mohamed; Laboratory of Biology and Biotechnology of Microorganism +s Faculty of Sciences Semlalia Cadi Ayyad University Marrakech Morocc +o Mauch, Felix; Laboratory of Interaction Plant‘«…Microorganisms, Depart +ment of Biology University of Fribourg Fribourg Switzerland Barakate, Mustapha; Laboratory of Biology and Biotechnology of Microor +ganisms Faculty of Sciences Semlalia Cadi Ayyad University Marrakech +Morocco - Agrobiosciences &amp; Fertilizers Program University Mohamm +ed IV Polytechnic (UM6P) Benguerir Morocco
        I think that my values are in the first element of the array because of the selector $.author here

        Ah yes, I didn't play around with JSON::Path because I haven't used that yet (and it seemed like it might be a little overkill in this case). Anyway, I think then my unfoldauthors3 should do what you want? (Update: The same $data_ar = $data_ar->[0]; step will probably be necessary with $authors.)

        Update 2: I took a look at the source of REST::Client::CrossRef, and it seems like that's the code that wraps the argument of the callback in an extra arrayref. In unfoldauthors3, you could probably just say @{$authors->[0]} instead of @$authors.