A sane method for parsing deeply nested XML

by jmensel (Initiate)
on Mar 13, 2013 at 13:56 UTC
jmensel has asked for the wisdom of the Perl Monks concerning the following question:

Dear Monks,

I recently started dealing with a bunch of deeply nested XML documents that I have to parse and generate reports from. The code works and does the job, but I think I'm going about it the wrong way: the code I've created is an ugly, inflexible mess of for loops and crazy long references. My guess is that there's a better way, and I would deeply appreciate any pointers in a better direction.

Here are the details:

I've used XML::Simple to convert the XML into data structures like so:

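my $data = $xml->XMLin($fail, ForceArray => 1);   # $xml is an XML::Simple object; ForceArray wraps every nested element in an array ref, hence all the [0] subscripts below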

I'm then walking through the data structure with nested for loops, like so. It's super ugly:

my $numclients = @{$data->{items}[0]->{client}};
for (my $i=0; $i<$numclients; $i++) {
    my $clientname = $data->{items}[0]->{client}[$i]->{name}[0];
    my $numsites = @{$data->{items}[0]->{client}[$i]->{site}};
    for (my $s=0; $s<$numsites; $s++) {
        my $sitename = $data->{items}[0]->{client}[$i]->{site}[$s]->{name}[0];
        my $numservs = @{$data->{items}[0]->{client}[$i]->{site}[$s]->{servers}};
        my %servers;
        my @fails;
        my $report;
        for (my $sv=0; $sv<$numservs; $sv++) {
            if ($data->{items}[0]->{client}[$i]->{site}[$s]->{servers}[$sv]->{server}[0]->{name}[0]) {
                my $server = $data->{items}[0]->{client}[$i]->{site}[$s]->{servers}[$sv]->{server}[0]->{name}[0];
                eval {
                    my $numfails = @{$data->{items}[0]->{client}[$i]->{site}[$s]->{servers}[$sv]->{server}[0]->{failed_checks}};
                    for (my $fc=0; $fc<$numfails; $fc++) {
                        my $numchecks = @{$data->{items}[0]->{client}[$i]->{site}[$s]->{servers}[$sv]->{server}[0]->{failed_checks}[$fc]->{check}};
                        for (my $check=0; $check<$numchecks; $check++) {
                            if ($data->{items}[0]->{client}[$i]->{site}[$s]->{servers}[$sv]->{server}[0]->{failed_checks}[$fc]->{check}[$check]->{dsc_247}[0] == 1) {
                                my $description = $data->{items}[0]->{client}[$i]->{site}[$s]->{servers}[$sv]->{server}[0]->{failed_checks}[$fc]->{check}[$check]->{description}[0];
                                if ($description !~ /Performance Monitoring/ ) {
                                    push (@fails, $description);
                                    $report = 1;
                                }
                            }
                        }
                    }
                }; # End eval
                eval {
                    # description[0] is a plain string, not an array ref, so test and push it directly
                    if (my $overdue = $data->{items}[0]->{client}[$i]->{site}[$s]->{servers}[$sv]->{server}[0]->{overdue}[0]->{description}[0]) {
                        push (@fails, $overdue);
                        $report=1;
                    }
                }; # end eval

Re: A sane method for parsing deeply nested XML
by choroba (Chancellor) on Mar 13, 2013 at 14:09 UTC
    Use intermediate variables:
    for (my $sv=0; $sv<$numservs; $sv++) {
        my $server = $data->{items}[0]->{client}[$i]->{site}[$s]->{servers}[$sv]->{server}[0];
        if ($server->{name}[0]) {
            my $server_name = $server->{name}[0];
            eval {
                my $numfails = @{$server->{failed_checks}};
                for (my $fc=0; $fc<$numfails; $fc++) {
                    my $numchecks = @{$server->{failed_checks}[$fc]->{check}};
                    for (my $check=0; $check<$numchecks; $check++) {
                        if ($server->{failed_checks}[$fc]->{check}[$check]->{dsc_247}[0] == 1) {
                            my $description = $server->{failed_checks}[$fc]->{check}[$check]->{description}[0];
                            if ($description !~ /Performance Monitoring/ ) {
                                push (@fails, $description);
                                $report = 1;
                            }
                        }
                    }
                }
    لսႽ† ᥲᥒ⚪⟊Ⴙᘓᖇ Ꮅᘓᖇ⎱ Ⴙᥲ𝇋ƙᘓᖇ

      *Smacks forehead* Duh. Thank you.
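Taking choroba's intermediate-variable idea one step further: a minimal sketch that iterates over the array references with foreach instead of counting indexes, and uses || [] so that missing elements become empty lists rather than needing eval. The $data below is a made-up sample shaped like XML::Simple's ForceArray output, as implied by the paths in the question; collecting failures per "client/site" key in %report is a hypothetical choice, not the original report format:

```perl
use strict;
use warnings;

# Hypothetical sample data shaped like XML::Simple's ForceArray => 1 output,
# inferred from the paths in the question; names and values are made up.
my $data = {
    items => [ {
        client => [ {
            name => ['Acme'],
            site => [ {
                name    => ['Head office'],
                servers => [
                    { server => [ {
                        name          => ['srv1'],
                        failed_checks => [ { check => [
                            { dsc_247 => [1], description => ['Disk space low'] },
                            { dsc_247 => [1], description => ['Performance Monitoring: CPU'] },
                            { dsc_247 => [0], description => ['Backup warning'] },
                        ] } ],
                        overdue => [ { description => ['Agent overdue'] } ],
                    } ] },
                    { server => [ { name => ['srv2'] } ] },
                ],
            } ],
        } ],
    } ],
};

my %report;    # hypothetical: "client/site" => [ failure descriptions ]
for my $client (@{ $data->{items}[0]{client} }) {
    my $clientname = $client->{name}[0];
    for my $site (@{ $client->{site} || [] }) {
        my $sitename = $site->{name}[0];
        my @fails;
        for my $servers (@{ $site->{servers} || [] }) {
            my $server = $servers->{server}[0] or next;
            next unless $server->{name}[0];
            # Missing elements become empty lists instead of needing eval {}
            for my $fc (@{ $server->{failed_checks} || [] }) {
                for my $check (@{ $fc->{check} || [] }) {
                    next unless ($check->{dsc_247}[0] // 0) == 1;
                    my $description = $check->{description}[0] // '';
                    push @fails, $description
                        if $description !~ /Performance Monitoring/;
                }
            }
            if (my $overdue = $server->{overdue}[0]{description}[0]) {
                push @fails, $overdue;
            }
        }
        $report{"$clientname/$sitename"} = \@fails if @fails;
    }
}
```

Each loop variable now names the level it stands for ($client, $site, $server, $check), so no line has to repeat the full path from $data.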

Re: A sane method for parsing deeply nested XML
by sundialsvc4 (Abbot) on Mar 13, 2013 at 14:40 UTC

    Another, more general solution is to use XPath expressions, as implemented by packages such as XML::LibXML. (XML::LibXML is a Perl binding to libxml2, the widely used XML library written in C.)

    The advantage of this approach is that the expression describes what you want to retrieve, and it’s up to the library to give you the result. The structure of your application logic no longer has to match the structure of the XML, sort of like the difference between a hand-crafted recursive-descent parser and a grammar-based parser generator like YACC. If your XML is complicated, deeply nested, variable and so forth, then that’s what is “sane” to me.
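    A minimal sketch of the XPath approach with XML::LibXML (load_xml, findnodes, textContent), assuming the XML layout implied by the question's code. XML::Simple hides the real root element, so the <report> root and all sample values here are made up. The dsc_247 and "Performance Monitoring" tests from the original loops move into XPath predicates:

```perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical sample document: element names follow the question's code,
# the <report> root element and all values are invented.
my $xml = <<'XML';
<report>
  <items>
    <client>
      <name>Acme</name>
      <site>
        <name>Head office</name>
        <servers>
          <server>
            <name>srv1</name>
            <failed_checks>
              <check><dsc_247>1</dsc_247><description>Disk space low</description></check>
              <check><dsc_247>1</dsc_247><description>Performance Monitoring: CPU</description></check>
              <check><dsc_247>0</dsc_247><description>Backup warning</description></check>
            </failed_checks>
            <overdue><description>Agent overdue</description></overdue>
          </server>
        </servers>
      </site>
    </client>
  </items>
</report>
XML

my $doc = XML::LibXML->load_xml(string => $xml);

my @fails;
# Only servers that have a non-empty <name>, as in the original "if"
for my $server ($doc->findnodes('//client/site/servers/server[name != ""]')) {
    # The filtering from the nested loops lives in the XPath predicates
    push @fails, map { $_->textContent } $server->findnodes(
        'failed_checks/check[dsc_247 = 1]'
      . '[not(contains(description, "Performance Monitoring"))]/description'
    );
    push @fails, map { $_->textContent } $server->findnodes('overdue/description');
}
my $report = @fails ? 1 : 0;
```

    The expressions are evaluated relative to each $server node, so the per-server loop is the only explicit iteration left.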

Re: A sane method for parsing deeply nested XML
by aitap (Deacon) on Mar 13, 2013 at 14:50 UTC

    I remember trying to parse a poorly documented format myself, writing similar code using XML::Simple. Unfortunately, I didn't have a chance to rewrite it using XML::Twig. Fortunately, it's no longer used anywhere.

    What about writing something like this?

    use XML::Twig;

    my @fails;
    my $parser = XML::Twig::->new(twig_roots => {
        '/items/client/site/servers/server/failed_checks/check/description'
            => sub { push @fails, $_->text },
        '/items/client/site/servers/server/overdue/description'
            => sub { push @fails, $_->text },
    })->parse_file($fail);
    my $report = @fails ? 1 : 0;
    The code is fully untested, but I could try fixing it if you provide a bit of example input data.
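    The handlers above push every description, whereas the loops in the question keep only checks whose dsc_247 is 1 and skip "Performance Monitoring" entries. Also, XML::Simple drops the root element, so <items> is probably not the root and an absolute '/items/...' path may never match. A sketch of one way to handle both with XML::Twig, assuming the same element names: hook whole <check> and <overdue> elements by tag name with twig_handlers and read their children with first_child_text. The sample document, including its <report> root, is invented:

```perl
use strict;
use warnings;
use XML::Twig;

# Hypothetical sample input; element names follow the question's code,
# the root element and the values are made up.
my $xml = <<'XML';
<report><items><client><name>Acme</name><site><name>Head office</name>
<servers><server><name>srv1</name>
<failed_checks>
<check><dsc_247>1</dsc_247><description>Disk space low</description></check>
<check><dsc_247>1</dsc_247><description>Performance Monitoring: CPU</description></check>
<check><dsc_247>0</dsc_247><description>Backup warning</description></check>
</failed_checks>
<overdue><description>Agent overdue</description></overdue>
</server></servers></site></client></items></report>
XML

my @fails;
XML::Twig->new(
    twig_handlers => {
        # Called once per <check>, after the whole element has been parsed
        check => sub {
            my ($twig, $check) = @_;
            my $description = $check->first_child_text('description');
            push @fails, $description
                if $check->first_child_text('dsc_247') eq '1'
                && $description !~ /Performance Monitoring/;
        },
        overdue => sub {
            my ($twig, $overdue) = @_;
            my $description = $overdue->first_child_text('description');
            push @fails, $description if length $description;
        },
    },
)->parse($xml);
my $report = @fails ? 1 : 0;
```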

    Another option could be using Data::Diver.
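    A minimal sketch of the Data::Diver suggestion, assuming its exported Dive() function and a made-up structure with the same shape as the question's $data. Dive() takes the data and a list of keys, where numeric keys index into arrays:

```perl
use strict;
use warnings;
use Data::Diver qw(Dive);

# Hypothetical ForceArray-style structure (shape from the question, values made up)
my $data = {
    items => [ { client => [ {
        name => ['Acme'],
        site => [ { name => ['Head office'], servers => [
            { server => [ { name => ['srv1'] } ] },
        ] } ],
    } ] } ],
};

# Numeric keys index into arrays; a path that does not exist yields undef
# instead of dying, so no eval {} is needed around optional elements.
my $server_name = Dive($data, qw(items 0 client 0 site 0 servers 0 server 0 name 0));
my $missing     = Dive($data, qw(items 0 client 0 site 0 servers 0 server 0 overdue 0 description 0));
```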

    Sorry if my advice was wrong.
Re: A sane method for parsing deeply nested XML
by runrig (Abbot) on Mar 13, 2013 at 15:27 UTC
    I don't care for the deeply nested for loops, so I'd go with a callback-based library, like XML::Rules. But since I have no example data, and no time, I give you no code. Sorry.
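    To show the callback idea itself (this is a hand-rolled sketch in plain Perl, not XML::Rules, whose API is not used here): a hypothetical walk() function visits the XML::Simple ForceArray structure and calls a registered sub for each element by name, so the reporting logic no longer mirrors the nesting. The sample data is made up:

```perl
use strict;
use warnings;

# Hand-rolled illustration of the callback style (not XML::Rules): walk an
# XML::Simple ForceArray => 1 structure and, for every element whose name
# has a registered callback, call it with that element's hash.
sub walk {
    my ($node, $callbacks) = @_;
    for my $tag (sort keys %$node) {            # sorted for a stable order
        next unless ref $node->{$tag} eq 'ARRAY';
        for my $child (@{ $node->{$tag} }) {
            next unless ref $child eq 'HASH';   # skip text-only elements
            $callbacks->{$tag}->($child) if $callbacks->{$tag};
            walk($child, $callbacks);
        }
    }
}

# Hypothetical sample data with the shape implied by the question
my $data = { items => [ { client => [ { name => ['Acme'], site => [ {
    name    => ['Head office'],
    servers => [ { server => [ {
        name          => ['srv1'],
        failed_checks => [ { check => [
            { dsc_247 => [1], description => ['Disk space low'] },
            { dsc_247 => [1], description => ['Performance Monitoring: CPU'] },
            { dsc_247 => [0], description => ['Backup warning'] },
        ] } ],
        overdue => [ { description => ['Agent overdue'] } ],
    } ] } ],
} ] } ] } ] };

my @fails;
walk($data, {
    check => sub {
        my ($check) = @_;
        my $description = $check->{description}[0] // '';
        push @fails, $description
            if ($check->{dsc_247}[0] // 0) == 1
            && $description !~ /Performance Monitoring/;
    },
    overdue => sub {
        my ($overdue) = @_;
        push @fails, $overdue->{description}[0] if $overdue->{description};
    },
});
```

    Adding another report item then means registering one more callback rather than writing another set of nested loops.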

      Many thanks to you all; this was most helpful.

Node Type: perlquestion [id://1023199]
Approved by marto