|Perl: the Markov chain saw|
Re: Walking thru XMLby arturo (Vicar)
|on Nov 20, 2003 at 23:48 UTC||Need Help??|
The two classic XML-handling strategies are "Tree-based", such as XML::Simple and, in a more heavyweight and full-featured fashion, XML::DOM, and "Stream" or "event-based", such as SAX, which is sort of defined for Java primarily, although it's not surprising that XML::SAX exists for Perl. The tree-based strategy loads a whole XML document into memory, which allows for some neat tricks. The stream-based strategy deals with elements as they are encountered -- SAX turns various parts of an XML file into events (e.g. "here's a start element", "here are some characters", and so forth). Your question makes it sound as if what you want is a stream-based API, and you say you want to process the file "line-by-line," but your example suggests otherwise.
Your goal seems to be to take the individual <config> elements and turn them into hashes or objects. That's not a "line-by-line" strategy, that's "little trees" or, as one might call them, twigs ... (blatant plug for XML::Twig here). Your example suggests a half-way strategy: you want to grab each config element and its subelements and deal with that chunk, processing them one at a time. You could load up everything into one master tree, then "walk" through the tree selecting each config element in turn. If you have a lot of things to process,though, that could get expensive memory-wise. If it's not a problem then feel free to stick with XML::Simple.
Now, with respect to your actual goal here, XML::Simple can do a perfectly fine job, although I find it a little bit hard to use (probably because I haven't fully internalized how it turns elements and their attributes into data structures -- forgive me, grantm -- I know this behavior is configurable =). With a little study and care, you could certainly make better use of it than what follows as an example.
I do know enough to point out that you're using it incorrectly, though. $XMLConfig is a reference to a complex data structure, which (assuming you have some element wrapping a bunch of config elements similar to the one you have posted above), will be a reference to a hash that has a key called config, whose value is a reference to an array of other things, which are in turn quite complex themselves ... each of those "other things' ( the elements of the array reference) correponds to a config element and its contents in your file. So the basic outer processing loop would look like this:
Finishing that up is left as an exercise for the reader =) If you want to get a better handle on what the data structure looks like at any point, use Data::Dumper to print out the structure for you.
As an aside, I know your code is skeletal, but you can't capture the output of system commands; you could use backticks or qx//, but let me suggest that you pipe mysqldump's output to a file and then deal appropriately with the file).
Finally, let me give you a start on how you might use XML::Twig for this job. The basic framework might look like this:
YMMV, of course, but I find the twiggish way of doing it easier to understand. HTH!
If not P, what? Q maybe?