PerlMonks  

How to improve introspection of an array of hashes

by nysus (Priest)
on Sep 13, 2018 at 07:44 UTC (#1222269=perlquestion)
nysus has asked for the wisdom of the Perl Monks concerning the following question:

I often find myself trying to parse JSON API responses. I typically do a simple data dump of the result to get an understanding of how the data is structured. This is fine for easily digestible records but is frustrating when there is a lot of data accompanying the data structures. It's also difficult if the data is somewhat inconsistent where, for example, the email address record is missing if there is no email address.

So I set out to write a quick and dirty tool for generating a report on the data structure so I can quickly identify all the fields provided by the API response. This is what I have so far:

#! /usr/bin/env perl
use strict;
use warnings;

# test data
my @array = (
    {fname => 'bob', last_name => 'smith' },
    {fname => 'tony', last_name => 'jones', age => 23,
     kids => [
        {first_name => 'cheryl', middle_name => 'karen', age => 23 },
        {name => 'jimmy', age => 17 }
     ]
    },
    {fname => 'janet', last_name => 'marcos',
     occupation => { title => 'trucker', years_on_job => 12} },
    {fname => 'Marge', last_name => 'Keefe',
     kids => [
        {name => 'kate', age => 7, vaccinated => 'yes'},
        {name => 'kim', age => 5}
     ]
    });

my $out .= "var ";
my $out .= "var ";
my %giant_hash;
foreach my $array (@array) {
    %giant_hash = (%giant_hash, %$array);
}
my $s_values = 1;
introspect(\%giant_hash);
my $nest_level = 0;

# recursive function that traverses the data structure
sub introspect {
    my $data = shift;
    my $type = gtype ($data);

    if ($type eq 'ARRAY') {
        $nest_level++;
        $out .= "is an ARRAY with " . glen(@$data) . " elements:";
        my $count = 0;
        foreach my $elem (@$data) {
            $out .= "\n" . ("\t" x $nest_level) . "elem $count ";
            introspect (ref $elem ? $elem : \$elem);
            $count++;
        }
        $nest_level--;
    }

    if ($type eq 'HASH') {
        $nest_level++;
        $out .= "is a HASH with " . scalar (keys %$data) . " keys";
        $out .= "\n" . ("\t" x $nest_level) . "the keys are '" . join ("', '", sort keys %$data) . "'";
        my $last_key;
        foreach my $key (keys %$data) {
            $last_key = $key;
            $out .= "\n" . ("\t" x $nest_level) . "key '$key' ";
            introspect (ref $data->{$key} ? $data->{$key} : \$data->{$key});
        }
        $nest_level--;
    }

    # our base case
    if ($type eq 'SCALAR') {
        $out .= "is a SCALAR";
        if (!$s_values) {
            $out .= " with a value of '$$data'";
        }
    }
}

print $out;
print "\n";

sub glen { return scalar @_; }
sub gtype { ref shift; }

This generates the following report:

var is a HASH with 5 keys
    the keys are 'age', 'fname', 'kids', 'last_name', 'occupation'
    key 'kids' is an ARRAY with 2 elements:
        elem 0 is a HASH with 3 keys
            the keys are 'age', 'name', 'vaccinated'
            key 'vaccinated' is a SCALAR
            key 'age' is a SCALAR
            key 'name' is a SCALAR
        elem 1 is a HASH with 2 keys
            the keys are 'age', 'name'
            key 'age' is a SCALAR
            key 'name' is a SCALAR
    key 'fname' is a SCALAR
    key 'age' is a SCALAR
    key 'occupation' is a HASH with 2 keys
        the keys are 'title', 'years_on_job'
        key 'title' is a SCALAR
        key 'years_on_job' is a SCALAR
    key 'last_name' is a SCALAR

As can be seen, my crude approach of merging each element into a %giant_hash, while great for data if everything within the array hashes is a hash, falls down when arrays are encountered. Arrays within a newly merged data structure clobber the arrays in data structures that were already merged into the giant hash. For example, the unique fields for kids in the "Tony Jones" record don't show up in the report. And this approach also won't work at all if I have an array of arrays.

The second approach, which I gave up on, was a bit too mind bending for me to figure out. Basically, each element of the array of hashes gets traversed individually. The first element traversed would populate a %meta hash which would act as a reference for the other data structures in the outermost array. Each traversal of a subsequent element builds upon %meta and makes it more and more accurate as each leaf in the data structures gets manually merged into %meta. Conceptually, the end result of %meta would look something like this:

fname => scalar,
last_name => scalar,
kids => aoh => { first_name => scalar, middle_name => scalar, age => scalar, vaccinated => scalar},
occupation => hoh => {title => scalar, years_on_job => scalar},
age => scalar

Any tips or hints are appreciated.

$PM = "Perl Monk's";
$MCF = "Most Clueless Friar Abbot Bishop Pontiff Deacon Curate Priest";
$nysus = $PM . ' ' . $MCF;

Replies are listed 'Best First'.
Re: How to improve introspection of an array of hashes
by Eily (Prior) on Sep 13, 2018 at 09:09 UTC

    Nice :).

    First:

    my $out .= "var "; my $out .= "var ";
    What what? And please don't name your hashrefs $array ;-).

    Just so we're clear: your idea will work when the keys uniquely identify the associated values, so something like this would be valid Perl, but invalid input:

    (Shapes => [
        { Type => 'Circle', Diameter => 2, Center => [0,1] },
        { Type => 'Square', Side => 3, Pos => [4,8] },
        [ {x => 1, y => 1}, {x => 3, y => 1}, {x => 4, y => 2}, {x => 2, y => 2} ]
      ]
    );
    The values in the Shapes array can either be hashes describing the shape, or an array of points, and you need to look inside the hashes to know the type of shape and the associated members.
    Edit: actually my solution below accepts mixed ARRAY/HASH. It just melds Circle and Square together in a way that might not make much sense.

    As can be seen, my crude approach of merging each element into a %giant_hash, while great for data if everything within the array hashes is a hash, falls down when arrays are encountered.
    Actually the hash is the issue, in %giant_hash = (%giant_hash, %$array); if there are keys in %$array that are already present in %giant_hash, they will overwrite them. This means that the kids of Marge Keefe will erase the kids of Tony Jones.

    A hash in Perl can only hold a single value (scalar) for each key. That value can be a reference that holds other values, but Perl won't do that on its own when you "merge" hashes, so in %giant_hash you will only have one fname, one last_name, one occupation and one set of kids. So you can't actually merge the hashes that way before iterating over them.
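The overwrite is easy to reproduce in isolation. The following is a minimal sketch with two made-up records (%tony, %marge and %seen are illustrative names, not part of the original script): the list-style merge keeps only the right-hand value for a duplicate key, whereas pushing every value onto a per-key array keeps both.

```perl
use strict;
use warnings;

# Two illustrative records that share the 'fname' and 'kids' keys.
my %tony  = (fname => 'tony',  kids => [ { first_name => 'cheryl' } ]);
my %marge = (fname => 'Marge', kids => [ { name => 'kate' } ]);

# List-style merge: for duplicate keys the right-hand hash wins.
my %merged = (%tony, %marge);
print "fname: $merged{fname}\n";                        # fname: Marge
print "kid keys: ",
    join(', ', sort map { keys %$_ } @{ $merged{kids} }), "\n";   # kid keys: name

# Keeping everything means collecting the values per key explicitly.
my %seen;
for my $record (\%tony, \%marge) {
    push @{ $seen{$_} }, $record->{$_} for keys %$record;
}
print "kids arrays kept: ", scalar @{ $seen{kids} }, "\n";   # kids arrays kept: 2
```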

    The reason you merged the hashes in the first place is that they are of the same type, so contain similar data. That's also true for the kids (basically they have a name, an age, and might be vaccinated), so you should "merge" them too, and show that "kids" may contain an arbitrary number of elements of type "kid". Which would make your output look like:

    var is a HASH with 5 keys
        the keys are 'age', 'fname', 'kids', 'last_name', 'occupation'
        key 'kids' is an ARRAY containing HASHREFs:
            the keys are 'age', 'name', 'vaccinated'
            key 'vaccinated' is a SCALAR
            key 'age' is a SCALAR
            key 'name' is a SCALAR
        key 'fname' is a SCALAR
        key 'age' is a SCALAR
        key 'occupation' is a HASH with 2 keys
            the keys are 'title', 'years_on_job'
            key 'title' is a SCALAR
            key 'years_on_job' is a SCALAR
        key 'last_name' is a SCALAR

    Here is my attempt at solving your problem. There are of course many ways to do it; keeping the list of keys down to the current point rather than a reference to the current level might be a better way to work (you wouldn't have to provide the output hash as a parameter), but I just went where my fingers took me :D

    use v5.14;
    use strict;
    use warnings;
    use Data::Dump qw( pp );
    use YAML;

    sub introspect {
        my ($data, $output) = @_;
        if (ref $data eq 'ARRAY') {
            my $sub_out = ($output->{'ARRAY'} //= {});
            introspect($_, $sub_out) for @{ $data };
        }
        elsif (ref $data eq 'HASH') {
            my $hash_out = $output->{"HASH"} //= {};
            for my $key (keys %$data) {
                my $sub_out = ($hash_out->{"$key"} //= {});
                introspect($_, $sub_out) for $data->{$key};
            }
        }
        elsif (ref $data) {
            $output->{ref($data).'REF'}=1;
        }
        else {
            $output->{SCALAR}=1;
        }
    }

    my @array = (
        {fname => 'bob', last_name => 'smith', foo => [\*main]},
        {fname => 'tony', last_name => 'jones', age => 23,
         kids => [
            {first_name => 'cheryl', middle_name => 'karen', age => 24 },
            {name => 'jimmy', age => 17 }
         ],
        },
        {fname => 'janet', last_name => 'marcos', foo => {},
         occupation => { title => 'trucker', years_on_job => 12} },
        {fname => 'Marge', last_name => 'Keefe',
         kids => [
            {name => 'kate', age => 7, vaccinated => 'yes'},
            {name => 'kim', age => 5}
         ]
        });

    my %out;
    introspect(\@array, \%out);
    say pp \%out;
    say YAML::Dump(\%out);
    {
      ARRAY => {
        HASH => {
          age => { SCALAR => 1 },
          fname => { SCALAR => 1 },
          foo => { ARRAY => { GLOBREF => 1 }, HASH => {} },
          kids => {
            ARRAY => {
              HASH => {
                age => { SCALAR => 1 },
                first_name => { SCALAR => 1 },
                middle_name => { SCALAR => 1 },
                name => { SCALAR => 1 },
                vaccinated => { SCALAR => 1 },
              },
            },
          },
          last_name => { SCALAR => 1 },
          occupation => {
            HASH => { title => { SCALAR => 1 }, years_on_job => { SCALAR => 1 } },
          },
        },
      },
    }
    ---
    ARRAY:
      HASH:
        age:
          SCALAR: 1
        fname:
          SCALAR: 1
        foo:
          ARRAY:
            GLOBREF: 1
          HASH: {}
        kids:
          ARRAY:
            HASH:
              age:
                SCALAR: 1
              first_name:
                SCALAR: 1
              middle_name:
                SCALAR: 1
              name:
                SCALAR: 1
              vaccinated:
                SCALAR: 1
        last_name:
          SCALAR: 1
        occupation:
          HASH:
            title:
              SCALAR: 1
            years_on_job:
              SCALAR: 1

    Edit: you can add this case to handle things like \\\\\{};

    elsif (ref $data eq 'REF') {
        introspect($$data, ($output->{'REF'} //= {}));
    }
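If a flat list of fields is easier to scan than the nested dump, a summary of this shape can be turned into one line per leaf. This is only a sketch: paths() is a hypothetical helper, the 'var' prefix and the '[]' notation for "any array element" are arbitrary choices, and %summary is a hand-written subset of the structure shown above.

```perl
use strict;
use warnings;

# Hypothetical helper: flatten a { ARRAY => ..., HASH => ..., SCALAR => 1 }
# summary into "path: TYPE" lines.
sub paths {
    my ($summary, $prefix) = @_;
    my @lines;
    for my $type (sort keys %$summary) {
        if ($type eq 'HASH') {
            my @keys = sort keys %{ $summary->{HASH} };
            # A hash for which no key was ever seen still gets a line.
            push @lines, "$prefix: HASH (no keys seen)" unless @keys;
            push @lines, paths($summary->{HASH}{$_}, "$prefix.$_") for @keys;
        }
        elsif ($type eq 'ARRAY') {
            push @lines, paths($summary->{ARRAY}, $prefix . '[]');
        }
        else {
            push @lines, "$prefix: $type";   # leaf such as SCALAR or GLOBREF
        }
    }
    return @lines;
}

# Hand-written subset of the summary, for illustration only.
my %summary = (
    ARRAY => { HASH => {
        fname => { SCALAR => 1 },
        kids  => { ARRAY => { HASH => {
            age  => { SCALAR => 1 },
            name => { SCALAR => 1 },
        } } },
    } },
);

print "$_\n" for paths(\%summary, 'var');
# var[].fname: SCALAR
# var[].kids[].age: SCALAR
# var[].kids[].name: SCALAR
```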

      Perfect! Very elegant. I will study this closely. And nice use of Data::Dump and YAML to do the work of formatting the output.

      Do you think this might be useful as a CPAN module? I searched CPAN but didn't find anything that did anything quite like this.

      $PM = "Perl Monk's";
      $MCF = "Most Clueless Friar Abbot Bishop Pontiff Deacon Curate Priest";
      $nysus = $PM . ' ' . $MCF;

        YAML is often my go-to module when I want formatted data but I'm too lazy to do it myself :). I was hoping for more compact output from YAML than from Data::Dump, though. But the latter prints { SCALAR => 1 } inline where YAML puts it on a separate line.

        Do you think this might be useful as a cpan module?
        Maybe? It needs some tinkering though (or a rewrite). Like a wrapper to hide the %output hash. And proper handling of objects: right now inspecting bless {}, 'Pony' would be reported as 'PonyREF', and bless {}, 'ARRAY' would try to dereference the hashref as an arrayref. Oops.
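One possible way to address both points is sketched below, assuming Scalar::Util (a core module) is acceptable: reftype() reports the underlying type regardless of blessing, and blessed() returns the class name. The names summarize() and _walk() and the "OBJECT(Class)" bucket are hypothetical choices for this sketch, not part of the code above.

```perl
use v5.10;
use strict;
use warnings;
use Scalar::Util qw( blessed reftype );

# Hypothetical wrapper: the caller never sees the output hash parameter.
sub summarize {
    my ($data) = @_;
    my %output;
    _walk($data, \%output);
    return \%output;
}

sub _walk {
    my ($data, $output) = @_;
    my $type = reftype($data) // '';   # underlying type, '' for non-references

    # Objects are recorded under their class, then walked by underlying type.
    my $class = blessed($data);
    $output = ($output->{"OBJECT($class)"} //= {}) if defined $class;

    if ($type eq 'ARRAY') {
        my $sub_out = ($output->{ARRAY} //= {});
        _walk($_, $sub_out) for @$data;
    }
    elsif ($type eq 'HASH') {
        my $hash_out = ($output->{HASH} //= {});
        _walk($data->{$_}, ($hash_out->{$_} //= {})) for keys %$data;
    }
    elsif ($type eq 'REF') {
        _walk($$data, ($output->{REF} //= {}));
    }
    elsif ($type) {
        $output->{ $type . 'REF' } = 1;   # e.g. GLOBREF, CODEREF, SCALARREF
    }
    else {
        $output->{SCALAR} = 1;
    }
}

# A hashref blessed into a class named 'ARRAY' is still walked as a hash.
my $summary = summarize(bless { name => 'x' }, 'ARRAY');
say join ', ', sort keys %$summary;                      # OBJECT(ARRAY)
say join ', ', sort keys %{ $summary->{'OBJECT(ARRAY)'} };   # HASH
```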

        I'd still be curious to see what others might have to say about the subject. I wouldn't be surprised if there is already a data traversing module that, rather than doing what you want already, lets you do it in two to three lines.

Re: How to improve introspection of an array of hashes ( rehohy )
by Anonymous Monk on Sep 14, 2018 at 03:11 UTC
