Re: Walking thru XML

in reply to Walking thru XML

The two classic XML-handling strategies are tree-based and stream-based (also called event-based). Tree-based modules include XML::Simple and the heavier, more full-featured XML::DOM. The best-known stream-based API is SAX, which was originally defined for Java, though unsurprisingly XML::SAX exists for Perl. The tree-based strategy loads the whole XML document into memory, which allows for some neat tricks. The stream-based strategy deals with elements as they are encountered: SAX turns the parts of an XML file into events (e.g. "here's a start element", "here are some characters", and so forth). Your question makes it sound as if you want a stream-based API, and you say you want to process the file "line-by-line," but your example suggests otherwise.

Your goal seems to be to take the individual <config> elements and turn them into hashes or objects. That's not a "line-by-line" strategy, that's "little trees" or, as one might call them, twigs ... (blatant plug for XML::Twig here). Your example suggests a half-way strategy: you want to grab each config element together with its subelements and process those chunks one at a time. You could load everything into one master tree, then "walk" through the tree selecting each config element in turn. If you have a lot of elements to process, though, that could get expensive memory-wise. If memory isn't a problem, feel free to stick with XML::Simple.

Now, with respect to your actual goal here, XML::Simple can do a perfectly fine job, although I find it a little hard to use (probably because I haven't fully internalized how it turns elements and their attributes into data structures. Forgive me, grantm; I know this behavior is configurable =). With a little study and care, you could certainly make better use of it than the example that follows.

I do know enough to point out that you're using it incorrectly, though. $XMLConfig is a reference to a complex data structure. Assuming you have some element wrapping a bunch of config elements similar to the one you posted above, it will be a reference to a hash with a key called config, whose value is a reference to an array of other things, which are in turn quite complex themselves. Each of those "other things" (the elements of the array reference) corresponds to a config element and its contents in your file. So the basic outer processing loop would look like this:

foreach my $config ( @{ $XMLConfig->{config} } ) { 

   my $logprefix = $config->{logprefix};
   #etc ...
}

Finishing that up is left as an exercise for the reader =) If you want to get a better handle on what the data structure looks like at any point, use Data::Dumper to print out the structure for you.
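To make the Data::Dumper suggestion concrete, here is a minimal sketch. The sample document, the wrapping "configs" element, and the attribute values are all made up for illustration, and the ForceArray and KeyAttr options are just one reasonable choice rather than the settings your script necessarily needs:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::Simple;
use Data::Dumper;

# Hypothetical sample data; only the element names come from the discussion above.
my $xml = <<'XML';
<configs>
  <config>
    <logprefix>db1</logprefix>
    <item name="users" type="table"/>
  </config>
  <config>
    <logprefix>db2</logprefix>
    <item name="orders" type="table"/>
  </config>
</configs>
XML

# ForceArray keeps "config" and "item" as array references even when only
# one of them is present; KeyAttr => [] stops XML::Simple from folding the
# items into a hash keyed on their "name" attribute.
my $XMLConfig = XMLin( $xml, ForceArray => [ 'config', 'item' ], KeyAttr => [] );

# Print the whole structure so you can see what you are walking through.
$Data::Dumper::Sortkeys = 1;
print Dumper($XMLConfig);

# The same outer loop as above, collecting one value per config element.
my @prefixes = map { $_->{logprefix} } @{ $XMLConfig->{config} };
print "prefixes: @prefixes\n";
```

Running the dump once with and once without the ForceArray option is a quick way to see how much the shape of the structure depends on XML::Simple's settings.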

As an aside, I know your code is skeletal, but system() doesn't capture a command's output; it returns the command's exit status. You could use backticks or qx// instead, but let me suggest that you redirect mysqldump's output to a file and then deal appropriately with the file.
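Here is a small sketch of the difference. The command is a stand-in (it just runs perl itself to print two lines) so that the example works without mysqldump installed; substitute your real command and arguments. It uses a list-form pipe open rather than backticks, which also captures the output but avoids shell quoting:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Stand-in for the real command: run this same perl to print two lines.
my @cmd = ( $^X, '-e', 'print "line one\nline two\n"' );

# system() lets the command write straight to our STDOUT and hands back
# only the exit status (0 on success), never the output itself.
my $status = system(@cmd);
print "system returned: $status\n";

# To capture the output, read from the command through a pipe instead.
open( my $pipe, '-|', @cmd ) or die "Cannot run command: $!";
my @lines = <$pipe>;
close($pipe) or die "Command failed with status $?";

print "captured ", scalar(@lines), " lines\n";
```

For a large dump, reading the pipe line by line (or writing to a file first, as suggested above) avoids holding the whole output in memory.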

Finally, let me give you a start on how you might use XML::Twig for this job. The basic framework might look like this:

#!/usr/bin/perl

use strict;
use warnings;
use XML::Twig;

# create a new Twig object that will call the "config"
# subroutine once it's seen a complete "config" element

my $twig = XML::Twig->new(
    twig_handlers => {
        'config' => \&config,
    },
);

$twig->parsefile("configs.xml");

sub config {
    my ( $t, $config ) = @_;    # $config is a complete config element

    # text of the first "logprefix" child ('' if there is none)
    my $logprefix = $config->first_child_text('logprefix');

    my @items = $config->children('item');
    foreach my $item (@items) {
        my $name = $item->att('name');
        my $type = $item->att('type');
        # and so forth
    }

    # free the memory used by everything parsed so far
    $t->purge;
}

YMMV, of course, but I find the twiggish way of doing it easier to understand. HTH!

If not P, what? Q maybe?
"Sidney Morgenbesser"
