http://www.perlmonks.org?node_id=1023199

jmensel has asked for the wisdom of the Perl Monks concerning the following question:

Dear Monks,

I recently started dealing with a bunch of deeply nested XML documents that I'm having to parse and generate reports from. The code works and does the job, but I think that I'm going about it the wrong way - the code I've created is an ugly, inflexible mess of for loops and crazy long references. My guess is that there's a better way. I would deeply appreciate any pointers in a better direction.

Here are the details:

I've used XML::Simple to convert the XML into data structures like so:

my $data = $xml->XMLin("$fail", ForceArray => 1);

I'm then recursing through the data structure like so. It's super ugly:

my $numclients = @{$data->{items}[0]->{client}};
for (my $i=0; $i<$numclients; $i++) {
    my $clientname = $data->{items}[0]->{client}[$i]->{name}[0];
    my $numsites = @{$data->{items}[0]->{client}[$i]->{site}};
    for (my $s=0; $s<$numsites; $s++) {
        my $sitename = $data->{items}[0]->{client}[$i]->{site}[$s]->{name}[0];
        my $numservs = @{$data->{items}[0]->{client}[$i]->{site}[$s]->{servers}};
        my %servers;
        my @fails;
        my $report;
        for (my $sv=0; $sv<$numservs; $sv++) {
            if ($data->{items}[0]->{client}[$i]->{site}[$s]->{servers}[$sv]->{server}[0]->{name}[0]) {
                my $server = $data->{items}[0]->{client}[$i]->{site}[$s]->{servers}[$sv]->{server}[0]->{name}[0];
                eval {
                    my $numfails = @{$data->{items}[0]->{client}[$i]->{site}[$s]->{servers}[$sv]->{server}[0]->{failed_checks}};
                    for (my $fc=0; $fc<$numfails; $fc++) {
                        my $numchecks = @{$data->{items}[0]->{client}[$i]->{site}[$s]->{servers}[$sv]->{server}[0]->{failed_checks}[$fc]->{check}};
                        for (my $check=0; $check<$numchecks; $check++) {
                            if ($data->{items}[0]->{client}[$i]->{site}[$s]->{servers}[$sv]->{server}[0]->{failed_checks}[$fc]->{check}[$check]->{dsc_247}[0] == 1) {
                                my $description = $data->{items}[0]->{client}[$i]->{site}[$s]->{servers}[$sv]->{server}[0]->{failed_checks}[$fc]->{check}[$check]->{description}[0];
                                if ($description !~ /Performance Monitoring/ ) {
                                    push (@fails, $description);
                                    $report = 1;
                                }
                            }
                        }
                    }
                }; # End eval
                eval {
                    if ( @{$data->{items}[0]->{client}[$i]->{site}[$s]->{servers}[$sv]->{server}[0]->{overdue}[0]->{description}[0]} ) {
                        my $overdue_ref = @{$data->{items}[0]->{client}[$i]->{site}[$s]->{servers}[$sv]->{server}[0]->{overdue}[0]->{description}[0]};
                        push (@fails, $overdue_ref);
                        $report=1;
                    }
                }; # end eval

Re: A sane method for parsing deeply nested XML
by choroba (Cardinal) on Mar 13, 2013 at 14:09 UTC
    Use intermediate variables:
    for (my $sv=0; $sv<$numservs; $sv++) {
        my $server = $data->{items}[0]->{client}[$i]->{site}[$s]->{servers}[$sv]->{server}[0];
        if ($server->{name}[0]) {
            my $server_name = $server->{name}[0];
            eval {
                my $numfails = @{$server->{failed_checks}};
                for (my $fc=0; $fc<$numfails; $fc++) {
                    my $numchecks = @{$server->{failed_checks}[$fc]->{check}};
                    for (my $check=0; $check<$numchecks; $check++) {
                        if ($server->{failed_checks}[$fc]->{check}[$check]->{dsc_247}[0] == 1) {
                            my $description = $server->{failed_checks}[$fc]->{check}[$check]->{description}[0];
                            if ($description !~ /Performance Monitoring/ ) {
                                push (@fails, $description);
                                $report = 1;
                            }
                        }
                    }
                }
    لսႽ† ᥲᥒ⚪⟊Ⴙᘓᖇ Ꮅᘓᖇ⎱ Ⴙᥲ𝇋ƙᘓᖇ
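
    Taking the same idea one step further, each level can get its own loop variable by iterating over the array references directly, which removes the index counters as well. The following is only a sketch: the sample hash is invented to mirror the XMLin(..., ForceArray => 1) layout in the question, and the "|| []" guards are an assumed stand-in for the eval blocks that protect against missing branches. Unlike the original, it collects everything into one @fails list instead of one per site.

```perl
use strict;
use warnings;

# Hypothetical sample shaped like the structure in the question;
# all names and values are invented for illustration.
my $data = {
    items => [ {
        client => [ {
            name => ['Acme'],
            site => [ {
                name    => ['HQ'],
                servers => [ {
                    server => [ {
                        name          => ['web01'],
                        failed_checks => [ {
                            check => [
                                { dsc_247 => [1], description => ['Disk space low'] },
                                { dsc_247 => [1], description => ['Performance Monitoring Check'] },
                                { dsc_247 => [0], description => ['Backup check'] },
                            ],
                        } ],
                    } ],
                } ],
            } ],
        } ],
    } ],
};

my @fails;
for my $client ( @{ $data->{items}[0]{client} || [] } ) {
    for my $site ( @{ $client->{site} || [] } ) {
        for my $entry ( @{ $site->{servers} || [] } ) {
            my $server = $entry->{server}[0];
            next unless $server && $server->{name}[0];

            # A missing failed_checks or check key yields an empty loop
            for my $group ( @{ $server->{failed_checks} || [] } ) {
                for my $check ( @{ $group->{check} || [] } ) {
                    next unless ( $check->{dsc_247}[0] // 0 ) == 1;
                    my $description = $check->{description}[0] // '';
                    next if $description =~ /Performance Monitoring/;
                    push @fails, $description;
                }
            }
        }
    }
}

print "$_\n" for @fails;
```

    With the sample data above, only the "Disk space low" check passes both filters.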

      *Smacks forehead* Duh. Thank you.

Re: A sane method for parsing deeply nested XML
by sundialsvc4 (Abbot) on Mar 13, 2013 at 14:40 UTC

    Another, more general solution is to use XPath expressions, as implemented by packages such as XML::LibXML. (That module wraps libxml2, a widely used C library for handling XML.)

    The advantage of this approach is that the expression describes what you want to retrieve, and it’s up to the library to give you the result. The structure of your application logic no longer has to match the structure of the XML, much like the difference between a hand-crafted recursive descent compiler and a grammar system like YACC. If your XML is complicated, deeply nested, or variable in shape, then that’s what is “sane” to me.
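
    To make that concrete, here is a rough sketch of the report logic written with XML::LibXML. No sample input was posted, so the document below is invented: the element names are taken from the hash keys in the question, while the <report> root element and all values are assumptions. The XPath predicates carry the dsc_247 and "Performance Monitoring" filters that the original code applied inside its innermost loop.

```perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical input; the root element and values are made up.
my $xml = <<'XML';
<report>
  <items>
    <client>
      <name>Acme</name>
      <site>
        <name>HQ</name>
        <servers>
          <server>
            <name>web01</name>
            <failed_checks>
              <check><dsc_247>1</dsc_247><description>Disk space low</description></check>
              <check><dsc_247>1</dsc_247><description>Performance Monitoring Check</description></check>
              <check><dsc_247>0</dsc_247><description>Backup check</description></check>
            </failed_checks>
            <overdue><description>Server overdue</description></overdue>
          </server>
        </servers>
      </site>
    </client>
  </items>
</report>
XML

my $doc = XML::LibXML->load_xml( string => $xml );

my @fails;

# Visit every server that has a name, wherever its client and site are
for my $server ( $doc->findnodes('//client/site/servers/server[name]') ) {
    my $name = $server->findvalue('name');

    # Failed checks flagged dsc_247 = 1, minus performance monitoring ones
    my $checks = 'failed_checks/check[dsc_247 = 1]'
               . '[not(contains(description, "Performance Monitoring"))]'
               . '/description';
    for my $node ( $server->findnodes($checks) ) {
        push @fails, "$name: " . $node->textContent;
    }

    # Overdue descriptions; servers without any simply match nothing
    for my $node ( $server->findnodes('overdue/description') ) {
        push @fails, "$name: " . $node->textContent;
    }
}

print "$_\n" for @fails;
```

    A path that matches nothing returns an empty list rather than dying, so the eval blocks are not needed here.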

Re: A sane method for parsing deeply nested XML
by aitap (Curate) on Mar 13, 2013 at 14:50 UTC

    I remember trying to parse some not-well-documented xml.zip format myself, writing similar code using XML::Simple. Unfortunately, I didn't have a chance to rewrite it using XML::Twig. Fortunately, now it's not used anywhere.

    What about writing something like this?

    use XML::Twig;
    my @fails;
    my $parser = XML::Twig::->new(twig_roots => {
        '/items/client/site/servers/server/failed_checks/check/description' => sub { push @fails, $_->text },
        '/items/client/site/servers/server/overdue/description' => sub { push @fails, $_->text },
    })->parse_file($fail);
    my $report = @fails ? 1 : 0;
    The code is fully untested, but I could try fixing it if you provide a bit of example input data.

    Another option could be using Data::Diver.
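
    For what it's worth, a small sketch of what that could look like, assuming the XML::Simple structure from the question. The $server hash here is invented sample data. Dive walks a list of keys and indices and, as far as I understand the module, returns nothing when a step is missing, without dying or autovivifying, so it could stand in for the eval blocks.

```perl
use strict;
use warnings;
use Data::Diver qw(Dive);

# Hypothetical fragment shaped like one {server}[0] entry from the question
my $server = {
    name    => ['web01'],
    overdue => [ { description => ['Server overdue'] } ],
};

# Present path: returns the value at the end of the chain
my $overdue = Dive( $server, qw(overdue 0 description 0) );

# Absent path: returns undef in scalar context instead of dying
my $missing = Dive( $server, qw(failed_checks 0 check 0 description 0) );

print "overdue: $overdue\n" if defined $overdue;
print "no failed checks recorded\n" unless defined $missing;
```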

    Sorry if my advice was wrong.
Re: A sane method for parsing deeply nested XML
by runrig (Abbot) on Mar 13, 2013 at 15:27 UTC
    I don't care for the deeply nested for loops, so I'd go with some callback based library, like XML::Rules. But since I have no example data, and no time, I give you no code. Sorry.

      Many thanks to you all...this was most helpful.