http://www.perlmonks.org?node_id=642285

logan has asked for the wisdom of the Perl Monks concerning the following question:

Greetings Monks. I'm trying to parse a chunk of XML, and the problem has become significantly more complex than I'm used to. I am requesting an XML page that describes one or more ads. Any number of ads can be returned, including any number of ads of a specific type. I am OK when there is only one ad of a given type, but multiple ads of a given type are problematic. Here is an example of the XML returned from a request for one Preroll, three Midrolls, and one Postroll:
<AdXML>
  <Preroll>
    <Creative>Preroll_30sec</Creative>
    <CompanionId>N/A</CompanionId>
    <Impression>TBD</Impression>
    <Completion>http://192.168.0.1:80/foo/bar</Completion>
    <TrackingId>null:414</TrackingId>
    <Length>4</Length>
  </Preroll>
  <Postroll>
    <Creative>Postroll_60sec</Creative>
    <CompanionId>N/A</CompanionId>
    <Impression>TBD</Impression>
    <Completion>http://192.168.0.1:80/foo/bar</Completion>
    <TrackingId>null:418</TrackingId>
    <Length>6</Length>
  </Postroll>
  <Midroll>
    <Creative>Midroll_45sec_3</Creative>
    <CompanionId>N/A</CompanionId>
    <Impression>TBD</Impression>
    <Completion>http://192.168.0.1:80/foo/bar</Completion>
    <TrackingId>null:417</TrackingId>
    <Length>5</Length>
  </Midroll>
  <Midroll>
    <Creative>Midroll_45sec_1</Creative>
    <CompanionId>N/A</CompanionId>
    <Impression>TBD</Impression>
    <Completion>http://192.168.0.1:80/foo/bar</Completion>
    <TrackingId>null:415</TrackingId>
    <Length>5</Length>
  </Midroll>
  <Midroll>
    <Creative>Midroll_45sec_2</Creative>
    <CompanionId>N/A</CompanionId>
    <Impression>TBD</Impression>
    <Completion>http://192.168.0.1:80/foo/bar</Completion>
    <TrackingId>null:416</TrackingId>
    <Length>5</Length>
  </Midroll>
</AdXML>
Using XML::Simple, I can load all of this into a nested hash reference. Run through Data::Dumper, it looks like this:
Response Dump: $VAR1 = {
    'Preroll' => {
        'Length' => '4',
        'TrackingId' => 'null:414',
        'CompanionId' => 'N/A',
        'Creative' => 'Preroll_30sec',
        'Impression' => 'TBD',
        'Completion' => 'http://192.168.0.1:80/foo/bar'
    },
    'Midroll' => [
        {
            'Length' => '5',
            'TrackingId' => 'null:415',
            'CompanionId' => 'N/A',
            'Creative' => 'Midroll_45sec_1',
            'Impression' => 'TBD',
            'Completion' => 'http://192.168.0.1:80/foo/bar'
        },
        {
            'Length' => '5',
            'TrackingId' => 'null:417',
            'CompanionId' => 'N/A',
            'Creative' => 'Midroll_45sec_3',
            'Impression' => 'TBD',
            'Completion' => 'http://192.168.0.1:80/foo/bar'
        },
        {
            'Length' => '5',
            'TrackingId' => 'null:416',
            'CompanionId' => 'N/A',
            'Creative' => 'Midroll_45sec_2',
            'Impression' => 'TBD',
            'Completion' => 'http://192.168.0.1:80/foo/bar'
        }
    ],
    'Postroll' => {
        'Length' => '6',
        'TrackingId' => 'null:418',
        'CompanionId' => 'N/A',
        'Creative' => 'Postroll_60sec',
        'Impression' => 'TBD',
        'Completion' => 'http://192.168.0.1:80/foo/bar'
    }
};
What I need to do is walk through the structure so I can compare the values of the end parameters (Length, Creative, etc.) for each ad with the expected values. The problems are:
  1. I won't know in advance what order the XML elements will be in. It may be Preroll, Midroll, Postroll, or it may be Midroll, Postroll, Preroll. There is no way of knowing in advance.
  2. If there is only one ad returned for a specific ad type, '*roll' will be a hash reference. If there are multiple ads returned, '*roll' will be a reference to an anonymous array of hashes. It is possible to know in advance which ad type will have multiple ads returned and how many there should be.
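A minimal sketch of one way to handle both points, assuming the parsed structure looks like the dump above: a small helper (the name ads_of_type is hypothetical) uses Perl's built-in ref() to turn either shape into a plain list of ad hashes, so the calling code never has to care whether XML::Simple produced a hash reference or an array reference. Looping over a fixed, sorted list of type names makes the walk independent of the element order in the XML, and since the number of ads per type is known in advance, the counts can be checked first. The expected counts are taken from the example request (one Preroll, three Midrolls, one Postroll).

```perl
use strict;
use warnings;

# Hypothetical helper: always returns a list of ad hashes for one ad type,
# whether XML::Simple produced a single hash ref or an array ref of hash refs.
sub ads_of_type {
    my ($response, $type) = @_;
    my $value = $response->{$type};
    return ()       unless defined $value;       # type absent from the response
    return @$value  if ref $value eq 'ARRAY';    # several ads of this type
    return ($value) if ref $value eq 'HASH';     # exactly one ad of this type
    die "Unexpected structure for $type: " . (ref($value) || 'plain scalar') . "\n";
}

# Hand-written copy of the dumped structure, trimmed to one field per ad.
my $response = {
    Midroll  => [
        { Creative => 'Midroll_45sec_1' },
        { Creative => 'Midroll_45sec_3' },
        { Creative => 'Midroll_45sec_2' },
    ],
    Postroll => { Creative => 'Postroll_60sec' },
    Preroll  => { Creative => 'Preroll_30sec' },
};

# Expected number of ads per type for this particular request.
my %expected_count = (Preroll => 1, Midroll => 3, Postroll => 1);

my @count_errors;
for my $type (sort keys %expected_count) {
    my $got = () = ads_of_type($response, $type);    # count the returned ads
    push @count_errors, "$type: expected $expected_count{$type} ad(s), got $got"
        if $got != $expected_count{$type};
}
print @count_errors ? join("\n", @count_errors) . "\n" : "All ad counts match\n";
```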
What I need is an algorithm that will walk the master hash reference and be smart enough to recognize whether it's encountered a simple hash or an array of hashes and act accordingly. I've tried this:
my ($self, $response) = @_;
foreach my $asset_type ( keys %{$response} ) {
    $logger->debug("Starting asset_type $asset_type");
    foreach my $asset_param ( keys %{$response->{$asset_type}} ) {
        $logger->debug("Top of middle FOR loop asset_param = $asset_param: $response->{$asset_type}->{$asset_param}");
        if ( exists ($response->{$asset_type}->{$asset_param}) ) {
            $logger->debug("\t$asset_type asset_param $asset_param exists: ($asset_param) = $response->{$asset_type}->{$asset_param}"); ## LINE 779
        } else {
            $logger->debug("\t$asset_type asset_param $asset_param is an array reference");
            my $i = 0;
            while ($response->{$asset_type}[$i]) {
                foreach my $subkey ( keys %{$response->{$asset_type}[$i]}) {
                    $logger->debug("\t\tTesting $asset_type asset number $i (subkey $subkey) = ($response->{$asset_type}[$i]->{$subkey})");
                }
                $i++;
            }
            $logger->debug("Broke innermost WHILE loop asset_param = $asset_param");
        }
        $logger->debug("Bottom of middle FOR loop asset_param = $asset_param");
    }
    $logger->debug("Broke middle FOR asset param loop");
}
The output is:
- Starting asset_type Preroll
- Top of middle FOR loop asset_param = Length: 4
-     Preroll asset_param Length exists: (Length) = 4
- Bottom of middle FOR loop asset_param = Length
- Top of middle FOR loop asset_param = TrackingId: null:414
-     Preroll asset_param TrackingId exists: (TrackingId) = null:414
- Bottom of middle FOR loop asset_param = TrackingId
- Top of middle FOR loop asset_param = CompanionId: N/A
-     Preroll asset_param CompanionId exists: (CompanionId) = N/A
- Bottom of middle FOR loop asset_param = CompanionId
- Top of middle FOR loop asset_param = Creative: KohlFauPreroll_30sec
-     Preroll asset_param Creative exists: (Creative) = KohlFauPreroll_30sec
- Bottom of middle FOR loop asset_param = Creative
- Top of middle FOR loop asset_param = Impression: TBD
-     Preroll asset_param Impression exists: (Impression) = TBD
- Bottom of middle FOR loop asset_param = Impression
- Top of middle FOR loop asset_param = Completion: http://172.24.16.84:8380/baapi/hics
-     Preroll asset_param Completion exists: (Completion) = http://172.24.16.84:8380/baapi/hics
- Bottom of middle FOR loop asset_param = Completion
- Broke middle FOR asset param loop
- Starting asset_type Midroll
- Top of middle FOR loop asset_param = Length:
-     Midroll asset_param Length is an array reference
-         Testing Midroll asset number 0 (subkey Length) = (5)
-         Testing Midroll asset number 0 (subkey TrackingId) = (null:417)
-         Testing Midroll asset number 0 (subkey CompanionId) = (N/A)
-         Testing Midroll asset number 0 (subkey Creative) = (KohlFauMidroll_45sec_3)
-         Testing Midroll asset number 0 (subkey Impression) = (TBD)
-         Testing Midroll asset number 0 (subkey Completion) = (http://172.24.16.84:8380/baapi/hics)
-         Testing Midroll asset number 1 (subkey Length) = (5)
-         Testing Midroll asset number 1 (subkey TrackingId) = (null:415)
-         Testing Midroll asset number 1 (subkey CompanionId) = (N/A)
-         Testing Midroll asset number 1 (subkey Creative) = (KohlFauMidroll_45sec_1)
-         Testing Midroll asset number 1 (subkey Impression) = (TBD)
-         Testing Midroll asset number 1 (subkey Completion) = (http://172.24.16.84:8380/baapi/hics)
-         Testing Midroll asset number 2 (subkey Length) = (5)
-         Testing Midroll asset number 2 (subkey TrackingId) = (null:416)
-         Testing Midroll asset number 2 (subkey CompanionId) = (N/A)
-         Testing Midroll asset number 2 (subkey Creative) = (KohlFauMidroll_45sec_2)
-         Testing Midroll asset number 2 (subkey Impression) = (TBD)
-         Testing Midroll asset number 2 (subkey Completion) = (http://172.24.16.84:8380/baapi/hics)
- Broke innermost WHILE loop asset_param = Length
- Bottom of middle FOR loop asset_param = Length
At that point the program dies with the error "Bad index while coercing array into hash at OO_HttpInterfaceTest.pm line 779." Line 779 in this case is:

    $logger->debug("\t$asset_type asset_param $asset_param exists: ($asset_param) = $response->{$asset_type}->{$asset_param}");

If I remove the logging statement, the code chokes on line 780 with the same error, which leads me to suspect that the actual problem is the expression $response->{$asset_type}->{$asset_param}.

The break happens after the Midroll section is evaluated. The value for 'Completion' is displayed, at which point the loop should exit. What seems to happen is that the code tests whether $response->{$asset_type}->{$asset_param} exists, finds that it doesn't, and exits rather than going to the else branch. I have no idea why it only does this when transitioning from walking the anonymous array back to a normal hash.
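The error message is most likely the key here. In the perl 5.8 series, keys %{$response->{Midroll}} does not fail even though Midroll holds an array reference, because of the old pseudo-hash feature (deprecated in 5.8, removed in 5.10): an array whose first element is a hash reference can be used as a hash, with that first hash mapping key names to array indices. The lookup for Length therefore becomes array index 5, which does not exist (hence the empty value and the trip into the else branch), and the lookup for TrackingId turns 'null:417' into index 0, which is not a legal pseudo-hash index; that is what "Bad index while coercing array into hash" complains about. On perl 5.10 and later the same code dies with "Not a HASH reference" instead. A sketch of the same walk that checks ref() before dereferencing; walk_response is a hypothetical name, and it returns strings rather than calling $logger so that it stays self-contained:

```perl
use strict;
use warnings;

# Same walk as the original loop, but ref() is checked *before* dereferencing,
# so a lone hash ref and an array ref of hash refs are both handled explicitly.
# Hypothetical name; returns "Type[index].Param=value" strings instead of logging.
sub walk_response {
    my ($response) = @_;
    my @lines;
    for my $asset_type (sort keys %$response) {
        my $value = $response->{$asset_type};
        # Treat a single ad as a one-element list so both shapes share one loop.
        my @ads = ref $value eq 'ARRAY' ? @$value : ($value);
        for my $i (0 .. $#ads) {
            for my $asset_param (sort keys %{ $ads[$i] }) {
                push @lines, sprintf '%s[%d].%s=%s',
                    $asset_type, $i, $asset_param, $ads[$i]{$asset_param};
            }
        }
    }
    return @lines;
}

# Small sample in the same shape as the dump (two fields per ad).
my $response = {
    Preroll => { Length => '4', TrackingId => 'null:414' },
    Midroll => [
        { Length => '5', TrackingId => 'null:417' },
        { Length => '5', TrackingId => 'null:415' },
    ],
};

print "$_\n" for walk_response($response);
```

Because the array/hash decision is made once per ad type, the inner loops only ever call keys on a real hash reference, so the pseudo-hash behaviour never comes into play.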

I've been on this for most of the day. Please help! And if there's some vastly easier/less complex way to do this, I'm all ears.
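On the "easier way" question: a common simplification is to make the structure uniform before walking it. XML::Simple can do this at parse time through its ForceArray option (ForceArray => ['Preroll', 'Midroll', 'Postroll'] should make those elements array references even when only one is present), or the data can be normalized after parsing, as this sketch does. It compares each returned ad with a table of expected values keyed by Creative name, because the order of ads within a type is not guaranteed either (the dump and the log list the three Midrolls in different orders). The %expected table is hypothetical, filled in from the sample XML above, and compare_ads is likewise a made-up name.

```perl
use strict;
use warnings;

# Hypothetical expected values: ad type => Creative name => parameters to check.
my %expected = (
    Preroll  => { Preroll_30sec => { Length => 4, TrackingId => 'null:414' } },
    Midroll  => {
        Midroll_45sec_1 => { Length => 5, TrackingId => 'null:415' },
        Midroll_45sec_2 => { Length => 5, TrackingId => 'null:416' },
        Midroll_45sec_3 => { Length => 5, TrackingId => 'null:417' },
    },
    Postroll => { Postroll_60sec => { Length => 6, TrackingId => 'null:418' } },
);

# Returns a list of human-readable mismatches (empty list means everything matched).
sub compare_ads {
    my ($response, $expected) = @_;
    my @problems;

    # Ad types present in the response but not expected at all.
    for my $type (sort keys %$response) {
        push @problems, "unexpected ad type '$type'" unless exists $expected->{$type};
    }

    for my $type (sort keys %$expected) {
        my $value = $response->{$type};
        # Normalize: missing => no ads, array ref => many ads, hash ref => one ad.
        my @ads = !defined $value       ? ()
                : ref $value eq 'ARRAY' ? @$value
                :                         ($value);

        my %want = %{ $expected->{$type} };    # copy, so matched ads can be deleted
        for my $ad (@ads) {
            my $creative = defined $ad->{Creative} ? $ad->{Creative} : '(no Creative)';
            my $want_ad  = delete $want{$creative};
            if (!$want_ad) {
                push @problems, "$type: unexpected ad '$creative'";
                next;
            }
            for my $param (sort keys %$want_ad) {
                my $got = defined $ad->{$param} ? $ad->{$param} : '(missing)';
                push @problems, "$type/$creative: $param is '$got', expected '$want_ad->{$param}'"
                    if $got ne $want_ad->{$param};
            }
        }
        # Whatever is left in %want was expected but never returned.
        push @problems, "$type: missing ad '$_'" for sort keys %want;
    }
    return @problems;
}

# Hand-written copy of the structure from the dump (fields trimmed for brevity).
my $response = {
    Preroll  => { Creative => 'Preroll_30sec',  Length => '4', TrackingId => 'null:414' },
    Midroll  => [
        { Creative => 'Midroll_45sec_3', Length => '5', TrackingId => 'null:417' },
        { Creative => 'Midroll_45sec_1', Length => '5', TrackingId => 'null:415' },
        { Creative => 'Midroll_45sec_2', Length => '5', TrackingId => 'null:416' },
    ],
    Postroll => { Creative => 'Postroll_60sec', Length => '6', TrackingId => 'null:418' },
};

my @problems = compare_ads($response, \%expected);
print @problems ? join("\n", @problems) . "\n" : "All ads match\n";
```

Keying the expectations by Creative rather than by position means a reordered response still passes, while a wrong value, an extra ad, or a missing ad each produce their own message.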

Thanks,

-Logan
"What do I want? I'm an American. I want more."