http://www.perlmonks.org?node_id=1222269

nysus has asked for the wisdom of the Perl Monks concerning the following question:

I often find myself trying to parse JSON API responses. I typically do a simple data dump of the result to understand how the data is structured. This is fine for easily digestible records but frustrating when a lot of data accompanies the data structures. It's also difficult when the data is inconsistent: for example, the email address field is missing entirely when a record has no email address.

So I set out to write a quick and dirty tool for generating a report on the data structure so I can quickly identify all the fields provided by the API response. This is what I have so far:

#! /usr/bin/env perl
use strict;
use warnings;

# test data
my @array = (
    { fname => 'bob', last_name => 'smith' },
    { fname => 'tony', last_name => 'jones', age => 23,
      kids => [
          { first_name => 'cheryl', middle_name => 'karen', age => 23 },
          { name => 'jimmy', age => 17 },
      ] },
    { fname => 'janet', last_name => 'marcos',
      occupation => { title => 'trucker', years_on_job => 12 } },
    { fname => 'Marge', last_name => 'Keefe',
      kids => [
          { name => 'kate', age => 7, vaccinated => 'yes' },
          { name => 'kim', age => 5 },
      ] },
);

my $out = "var ";

# shallow merge of every record into one hash
my %giant_hash;
foreach my $array (@array) {
    %giant_hash = (%giant_hash, %$array);
}

my $s_values   = 1;    # set to 0 to include scalar values in the report
my $nest_level = 0;
introspect(\%giant_hash);

# recursive function that traverses the data structure
sub introspect {
    my $data = shift;
    my $type = gtype($data);

    if ($type eq 'ARRAY') {
        $nest_level++;
        $out .= "is an ARRAY with " . glen(@$data) . " elements:";
        my $count = 0;
        foreach my $elem (@$data) {
            $out .= "\n" . ("\t" x $nest_level) . "elem $count ";
            introspect(ref $elem ? $elem : \$elem);
            $count++;
        }
        $nest_level--;
    }

    if ($type eq 'HASH') {
        $nest_level++;
        $out .= "is a HASH with " . scalar(keys %$data) . " keys";
        $out .= "\n" . ("\t" x $nest_level) . "the keys are '"
              . join("', '", sort keys %$data) . "'";
        my $last_key;
        foreach my $key (keys %$data) {
            $last_key = $key;
            $out .= "\n" . ("\t" x $nest_level) . "key '$key' ";
            introspect(ref $data->{$key} ? $data->{$key} : \$data->{$key});
        }
        $nest_level--;
    }

    # our base case
    if ($type eq 'SCALAR') {
        $out .= "is a SCALAR";
        if (!$s_values) {
            $out .= " with a value of '$$data'";
        }
    }
}

print $out;
print "\n";

sub glen  { return scalar @_; }
sub gtype { ref shift; }

This generates the following report:

var is a HASH with 5 keys
    the keys are 'age', 'fname', 'kids', 'last_name', 'occupation'
    key 'kids' is an ARRAY with 2 elements:
        elem 0 is a HASH with 3 keys
            the keys are 'age', 'name', 'vaccinated'
            key 'vaccinated' is a SCALAR
            key 'age' is a SCALAR
            key 'name' is a SCALAR
        elem 1 is a HASH with 2 keys
            the keys are 'age', 'name'
            key 'age' is a SCALAR
            key 'name' is a SCALAR
    key 'fname' is a SCALAR
    key 'age' is a SCALAR
    key 'occupation' is a HASH with 2 keys
        the keys are 'title', 'years_on_job'
        key 'title' is a SCALAR
        key 'years_on_job' is a SCALAR
    key 'last_name' is a SCALAR

As can be seen, my crude approach of merging each element into %giant_hash, while fine when everything within the array's hashes is itself a hash, falls down when arrays are encountered. Arrays within a newly merged data structure clobber the arrays from data structures that were already merged into the giant hash. For example, the unique fields for kids in the "Tony Jones" record don't show up in the report. This approach also won't work at all if I have an array of arrays.
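The clobbering can be reproduced in isolation. The following is a minimal sketch with two cut-down records (the variable names @records and @kid_fields are only for illustration); it shows that the list-style merge replaces the earlier 'kids' array reference wholesale rather than combining the two:

```perl
use strict;
use warnings;

# Two records that both carry a 'kids' array, with different fields inside.
my @records = (
    { fname => 'tony',  kids => [ { first_name => 'cheryl', middle_name => 'karen' } ] },
    { fname => 'Marge', kids => [ { name => 'kate', vaccinated => 'yes' } ] },
);

# Shallow merge, as in the script above: for a duplicated key the later
# value wins, so the later 'kids' array reference replaces the earlier one.
my %giant_hash;
%giant_hash = ( %giant_hash, %$_ ) for @records;

# Only the fields from the last record's kids are left.
my @kid_fields = sort keys %{ $giant_hash{kids}[0] };
print "kid fields seen: @kid_fields\n";    # prints "kid fields seen: name vaccinated"
```

The same replacement would presumably hit any nested value that appears under the same key in more than one record, since only the top level of each record is merged.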

The second approach, which I gave up on, was a bit too mind-bending for me to figure out. Basically, each element of the array of hashes gets traversed individually. The first element traversed would populate a %meta hash, which would act as a reference for the other data structures in the outermost array. Each traversal of a subsequent element builds upon %meta and makes it more and more accurate as each leaf in the data structure gets manually merged into %meta. Conceptually, the end result of %meta would look something like this:

fname => scalar,
last_name => scalar,
kids => aoh => { first_name => scalar, middle_name => scalar, age => scalar, vaccinated => scalar},
occupation => hoh => {title => scalar, years_on_job => scalar},
age => scalar
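One way this second approach might be sketched is a recursive merge that folds the shape of each record into a single schema structure. This is only an illustration under my own assumptions, not a finished tool: the function names merge_shape and report, and the node layout with 'types', 'keys' and 'elems' entries, are hypothetical and differ from the conceptual %meta layout above.

```perl
use strict;
use warnings;

# Merge the shape of $data into the schema node $meta and return the node.
# A node records the types seen at this position ('types'), the union of
# hash keys ('keys'), and one combined schema for array elements ('elems').
sub merge_shape {
    my ($meta, $data) = @_;
    $meta //= {};
    my $type = ref($data) || 'SCALAR';    # plain values count as SCALAR
    $meta->{types}{$type}++;

    if ($type eq 'HASH') {
        for my $key (keys %$data) {
            $meta->{keys}{$key} = merge_shape($meta->{keys}{$key}, $data->{$key});
        }
    }
    elsif ($type eq 'ARRAY') {
        # Every element folds into the same node, so a field present in
        # only some elements is still collected.
        $meta->{elems} = merge_shape($meta->{elems}, $_) for @$data;
    }
    return $meta;
}

# Render a schema node as an indented report string.
sub report {
    my ($meta, $indent) = @_;
    $indent //= 0;
    my $pad = "\t" x ($indent + 1);
    my $out = join('|', sort keys %{ $meta->{types} }) . "\n";
    for my $key (sort keys %{ $meta->{keys} || {} }) {
        $out .= $pad . "$key => " . report($meta->{keys}{$key}, $indent + 1);
    }
    $out .= $pad . "[] => " . report($meta->{elems}, $indent + 1) if $meta->{elems};
    return $out;
}

# Same test data as in the script above.
my @array = (
    { fname => 'bob', last_name => 'smith' },
    { fname => 'tony', last_name => 'jones', age => 23,
      kids => [ { first_name => 'cheryl', middle_name => 'karen', age => 23 },
                { name => 'jimmy', age => 17 } ] },
    { fname => 'janet', last_name => 'marcos',
      occupation => { title => 'trucker', years_on_job => 12 } },
    { fname => 'Marge', last_name => 'Keefe',
      kids => [ { name => 'kate', age => 7, vaccinated => 'yes' },
                { name => 'kim', age => 5 } ] },
);

my $meta;
$meta = merge_shape($meta, $_) for @array;
print report($meta);
```

Because each record is merged node by node instead of being overwritten at the top level, the kids fields from the "Tony Jones" record survive alongside those from the "Marge Keefe" record. Recording a count per type also means a position that holds a scalar in one record and a hash in another would show up as 'HASH|SCALAR' in the report, and an array of arrays is handled by the same recursion.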

Any tips or hints are appreciated.

$PM = "Perl Monk's";
$MCF = "Most Clueless Friar Abbot Bishop Pontiff Deacon Curate Priest";
$nysus = $PM . ' ' . $MCF;