http://www.perlmonks.org?node_id=553516

Hammy has asked for the wisdom of the Perl Monks concerning the following question:

Hello, I have an XML stream that I am trying to parse and I am having a heck of a time properly handling the resulting array. Any suggestions would be greatly appreciated. Below is the XML:
<DOCUMENT NAME="SucceedWebGuideMaster" LANGUAGE="Default" FORMAT="2.0">
  <Response>
    <Participant UID="JANEDOE" PID="8675309" FNAME="Jane" PROGRAM_COUNT=3 DOB="03/26/76" />
    <Program NAME="Balance" INSTRCD="blna">
      <Status>
        <StatusCD>2</StatusCD>
        <StatusText>Newsletter 1</StatusText>
        <StatusDesc>nl1 has been published</StatusDesc>
        <StatusDate>04/20/05</StatusDate>
        <StatusEndDate> </StatusEndDate>
      </Status>
      <Status>
        <StatusCD>1</StatusCD>
        <StatusText>Plan</StatusText>
        <StatusDesc>mas has been published</StatusDesc>
        <StatusDate>04/20/05</StatusDate>
        <StatusEndDate>04/26/06</StatusEndDate>
      </Status>
    </Program>
    <Program NAME="Breathe" INSTRCD="blna">
      <Status>
        <StatusCD>2</StatusCD>
        <StatusText>Newsletter 1</StatusText>
        <StatusDesc>nl1 has been published</StatusDesc>
        <StatusDate>04/20/05</StatusDate>
        <StatusEndDate> </StatusEndDate>
      </Status>
      <Status>
        <StatusCD>1</StatusCD>
        <StatusText>Plan</StatusText>
        <StatusDesc>mas has been published</StatusDesc>
        <StatusDate>04/20/05</StatusDate>
        <StatusEndDate>04/26/06</StatusEndDate>
      </Status>
    </Program>
  </Response>
</DOCUMENT>
What I want to do is find out how many programs and how many statuses and walk through them. I used XMLin to break down this stream. I used the forcearray option to make sure I do not have to check for multiples (always an array). I put the following code together, but it does not work as I would have thought it would. What could I be missing?
use XML::Simple;
use Data::Dumper;

$XMLref = XMLin($xml_from_hm, suppressempty => '', forcearray=>1);

if (ref ($XMLref->{Response}->[0]->{Program}->[0])) {
    @program_array = $XMLref->{Response}->[0]->{Program};
    # Walk through the programs
    foreach $programs (@program_array) {
        @status_array = $programs->{status};
        $one_program_name = $programs->{NAME}->[0];
        # Walk through the statuses within each program
        foreach $status (@status_array) {
            $hold_desc = $status->{StatusDesc}->[0];
            $hold_start = $status->{StatusDate}->[0];
            $hold_end = $status->{StatusEndDate}->[0];
            # a lot of other processing here
        }
    }
}

2006-06-05 Retitled by planetscape, as per Monastery guidelines

Original title: 'array confusion'

Re: Problem handling array from XML stream
by bobf (Monsignor) on Jun 05, 2006 at 03:27 UTC

    First things first - I get a not well-formed error when I try to run your code. The fix is to quote the value of PROGRAM_COUNT in the Participant tag (PROGRAM_COUNT="3"). (Actually, that's the second error I got. The first thing I did was add use strict; use warnings; and then declare all of the variables.)

    On to business. From what I can tell, you're getting confused about how to use references.

    For example, in the following line you are assigning an array reference to an array, which is probably not what you intended to do.

    my @program_array = $XMLref->{Response}->[0]->{Program};
    Either expand (dereference) the array ref to an array when you assign it, or simply assign the reference.
    my @program_array = @{ $XMLref->{Response}[0]{Program} };
    my $program_aref  = $XMLref->{Response}[0]{Program};
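
    To make the difference concrete, here is a minimal sketch that uses only core Perl. The data structure is a hand-built, hypothetical stand-in for part of what XMLin returns with forcearray => 1 (it is not the real parser output), and the names @wrong and @right are just for illustration.

```perl
use strict;
use warnings;

# Hypothetical stand-in for part of the XMLin result (forcearray => 1),
# trimmed to two programs. Not produced by XML::Simple here.
my $XMLref = {
    Response => [
        {
            Program => [
                { NAME => 'Balance' },
                { NAME => 'Breathe' },
            ],
        },
    ],
};

# Assigning the reference itself: the array gets ONE element, an array ref.
my @wrong = $XMLref->{Response}[0]{Program};

# Dereferencing first: the array gets the two program hashes.
my @right = @{ $XMLref->{Response}[0]{Program} };

printf "without deref: %d element(s), first is a %s ref\n",
    scalar(@wrong), ref( $wrong[0] );
printf "with deref:    %d element(s), first is a %s ref\n",
    scalar(@right), ref( $right[0] );
```

    The first line reports 1 element holding an ARRAY ref, the second reports 2 elements holding HASH refs. That single ARRAY element is why the original foreach loop ran once and then failed to find any keys.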

    Secondly, you've got a typo in $programs->{status};. This key is capitalized in the XML ('Status').

    Update: Thirdly, see Thelonius' reply to this node.

    Here is a corrected version of the code:

    if( ref( $XMLref->{Response}[0]{Program}[0] ) ) {
        my $program_arrayref = $XMLref->{Response}[0]{Program};
        # Walk through the programs
        foreach my $programs_href ( @{ $program_arrayref } ) {
            my $status_aref = $programs_href->{Status};
            my $one_program_name = $programs_href->{NAME};
            # Walk through the statuses within each program
            foreach my $status_href ( @{ $status_aref } ) {
                my $hold_desc = $status_href->{StatusDesc}[0];
                my $hold_start = $status_href->{StatusDate}[0];
                my $hold_end = $status_href->{StatusEndDate}[0];
                # a lot of other processing here
                print join( ':', $one_program_name, $hold_desc, $hold_start, $hold_end ), "\n";
            }
        }
    }

    This prints:

    Balance:nl1 has been published:04/20/05:
    Balance:mas has been published:04/20/05:04/26/06
    Breathe:nl1 has been published:04/20/05:
    Breathe:mas has been published:04/20/05:04/26/06
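
    The question also asked how many programs and statuses there are. One way to get the counts is to evaluate the dereferenced arrays in scalar context. This is only a sketch: the structure below is a hand-built, hypothetical stand-in for the XMLin result (trimmed to the keys needed for counting), and the variable names are mine.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the XMLin result (forcearray => 1); the empty
# hashes stand for the individual Status records.
my $XMLref = {
    Response => [
        {
            Program => [
                { NAME => 'Balance', Status => [ {}, {} ] },
                { NAME => 'Breathe', Status => [ {}, {} ] },
            ],
        },
    ],
};

# Fall back to an empty array ref so a missing key does not die under strict.
my $programs      = $XMLref->{Response}[0]{Program} || [];
my $program_count = scalar @{$programs};
print "programs: $program_count\n";

my $status_total = 0;
foreach my $program ( @{$programs} ) {
    # An array in scalar context yields its element count.
    my $status_count = scalar @{ $program->{Status} || [] };
    $status_total += $status_count;
    print "$program->{NAME}: $status_count status(es)\n";
}
print "statuses in total: $status_total\n";
```

    With the sample data this reports 2 programs, 2 statuses for each of them, and 4 statuses in total.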

    The following links may help:

    HTH

    Update: Thanks to Thelonius for pointing out a correction to the code that I made but neglected to mention. I guess I was fixing faster than I was documenting. :-)

      In the interests of TIMTOWTDI, I wrote up an XPath version of your script:
      use XML::LibXML;

      my $xml = XML::LibXML->new->parse_file('example.xml');

      for my $status ($xml->findnodes('//Status')) {
          print join( ':',
              $status->findvalue('parent::Program/@NAME'),
              $status->findvalue('StatusDesc'),
              $status->findvalue('StatusDate'),
              $status->findvalue('StatusEndDate')
          ), "\n";
      }