Beefy Boxes and Bandwidth Generously Provided by pair Networks
Clear questions and runnable code
get the best and fastest answer

Walking thru XML

by Anonymous Monk
on Nov 20, 2003 at 22:02 UTC ( #308725=perlquestion: print w/ replies, xml ) Need Help??
Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Is there a way to walk through an XML file? I want to parse an XML file line-by-line, taking action depending on the tag found. I'm making a simple backup script that will back up filesystems as well as {My | Postgre}SQL databases. The config file would look like this:
<config> <logprefix>/var/log/sbackup</logprefix> <storage>/home/backups</storage> <item name = "SomeDatabase" type="MySQLDB"> <dbName>SomeDatabase</dbName> <dbUser>SomeUser</dbuser> <dbPass>password</dbPass> </item> <item name = "SomeFilesFiles" type="Filesystem"> <path>/home/globalherald</path> </item> </config>
I tried using XML::Simple, dumping the contents into a hash structure, and walking through that, but I've done something wrong. Is what I'm trying to do possible? Here's the script:
#!/usr/bin/perl # Backing up the server. Looks for a file # called sbackup.xml in /etc which specifies, which databases # and which directories to back up. # # TODO: # Add tar/database dump logs to logs # Add postgresql dumper # Add tar/database dump options to XML use strict; use warnings; use XML::Simple; use IO::File; use vars qw($XMLConfig $logprefix $fh); sub LogMsg { my $consecho = 1; my $deadly = shift; my $message = shift; my $timestamp = localtime(time); open LogFile, ">>$logprefix/sbackup.log" || die "Can't open logfil +e!"; print LogFile "$timestamp: $message\n"; if ($consecho == 1) { print "$timestamp: $message\n"; } return; } sub ParseConfig { $fh = new IO::File('/etc/sbackup.xml') or LogMsg(1, "Can't open sb +ackup.xml!"); $XMLConfig = XMLin($fh); return; } sub BackupMysqlDatabase { my $item = shift; my $dbname = shift; my $dbuser = shift; my $dbpass = shift; my $command = "mysqldump --quick --add-locks --add-drop-table -a - +e -F -K -u $dbuser -p $dbpass $dbname"; #my $result = system($command); print $command; my $result = 0; if ($result == 0) { LogMsg(0, "Backup of $item, database $dbname successful!"); } else { LogMsg(0, "Backup of $item, database $item failed!"); } return; } sub BackupFS { my $item = shift; my $frompath = shift; my $storage = shift; my $command = "tar cf $storage/$item.tar $frompath"; #my $result = system($command); print $command; my $result = 0; if ($result == 0) { LogMsg(0, "Backup of $item, pathname $frompath successful!"); } else { LogMsg(0, "Backup of $item, pathname $frompath failed!"); } return; } my $localstorage; my %currentFS = ("item", "", "source", "") ; my %currentDB = ("item", "", "dbname", "", "dbuser", "", "dbpass", "" +); my $currentitem; ParseConfig(); foreach my $element ($XMLConfig) { print "The element is $element!\n"; my %somehash = {$element}; my @hashkeys = keys %somehash; print "Keys are: @hashkeys\n"; if ($element eq "Logprefix") { $logprefix = $XMLConfig->logprefix; } if ($element eq "Storage") { $localstorage = $XMLConfig->storage; } if ($element eq "Item") { if ($element->{type} eq "MySQLDB") { foreach my $element2 ($element) { if ($element2 eq "DbName") { $currentDB{item} = $element->{item}; $currentDB{dbname} = $element2->{DbName}; } if ($element2 eq "DbUser") { $currentDB{dbuser} = $element2->{DbUser}; } if ($element2 eq "DbPass") { $currentDB{dbpass} = $element2->{DbPass}; } } BackupMySQLDatabase($currentDB{item}, $currentDB{dbname}, +$currentDB{dbuser}, $currentDB{dbpass}); } if ($element->{type} eq "Filesystem") { foreach my $element2 ($element) { if ($element2 eq "Path") { $currentFS{item} = $element->{Item}; $currentFS{path} = $element2->{Path}; } } BackupFS($currentFS{item}, $currentFS{path}, $localstorage +); } } } close ($fh);
Thanks everybody!

Replies are listed 'Best First'.
Re: Walking thru XML
by arturo (Vicar) on Nov 20, 2003 at 23:48 UTC

    The two classic XML-handling strategies are "Tree-based", such as XML::Simple and, in a more heavyweight and full-featured fashion, XML::DOM, and "Stream" or "event-based", such as SAX, which is sort of defined for Java primarily, although it's not surprising that XML::SAX exists for Perl. The tree-based strategy loads a whole XML document into memory, which allows for some neat tricks. The stream-based strategy deals with elements as they are encountered -- SAX turns various parts of an XML file into events (e.g. "here's a start element", "here are some characters", and so forth). Your question makes it sound as if what you want is a stream-based API, and you say you want to process the file "line-by-line," but your example suggests otherwise.

    Your goal seems to be to take the individual &lt;config&gt; elements and turn them into hashes or objects. That's not a "line-by-line" strategy, that's "little trees" or, as one might call them, twigs ... (blatant plug for XML::Twig here). Your example suggests a half-way strategy: you want to grab each config element and its subelements and deal with that chunk, processing them one at a time. You could load up everything into one master tree, then "walk" through the tree selecting each config element in turn. If you have a lot of things to process,though, that could get expensive memory-wise. If it's not a problem then feel free to stick with XML::Simple.

    Now, with respect to your actual goal here, XML::Simple can do a perfectly fine job, although I find it a little bit hard to use (probably because I haven't fully internalized how it turns elements and their attributes into data structures -- forgive me, grantm -- I know this behavior is configurable =). With a little study and care, you could certainly make better use of it than what follows as an example.

    I do know enough to point out that you're using it incorrectly, though. $XMLConfig is a reference to a complex data structure, which (assuming you have some element wrapping a bunch of config elements similar to the one you have posted above), will be a reference to a hash that has a key called config, whose value is a reference to an array of other things, which are in turn quite complex themselves ... each of those "other things' ( the elements of the array reference) correponds to a config element and its contents in your file. So the basic outer processing loop would look like this:

    foreach my $config ( @{ $XMLConfig->{config} } ) { my $logprefix = $config->{logprefix}; #etc ... }
    Finishing that up is left as an exercise for the reader =) If you want to get a better handle on what the data structure looks like at any point, use Data::Dumper to print out the structure for you.

    As an aside, I know your code is skeletal, but you can't capture the output of system commands; you could use backticks or qx//, but let me suggest that you pipe mysqldump's output to a file and then deal appropriately with the file).

    Finally, let me give you a start on how you might use XML::Twig for this job. The basic framework might look like this:

    #!/usr/bin/perl use strict; use XML::Twig; # create a new Twig object that will call the "config" # subroutine once it's seen a complete "config" element my $twig = XML::Twig->new( twig_handlers => { 'config' => \&config }); $twig->parsefile("configs.xml"); sub config { my ($t, $config ) = @_; # $config is a config element my $logprefix = $config->child("logprefix")->text; my @items = $config->children("item"); foreach my $item ( @items ) { my $name = $item->att('name'); my $type = $item->att('type'); # and so forth } }

    YMMV, of course, but I find the twiggish way of doing it easier to understand. HTH!

    If not P, what? Q maybe?
    "Sidney Morgenbesser"

Re: Walking thru XML
by princepawn (Parson) on Nov 20, 2003 at 22:21 UTC

    I used XML::TreeBuilder's traverse method to do that.

    Actually to be honest, I used HTML::TreeBuilder, but the API for XML and HTML treebuilder are very much the same. And I would certain reach for XML::TreeBuilder if I had an XML task to do. PApp::SQL and CGI::Application rock the house

Re: Walking thru XML
by mirod (Canon) on Nov 21, 2003 at 11:37 UTC

    A few comments:

    • XML does not have the concept of lines, it has the concept of element, so try not to use a line metaphor when you think of XML data, a tree works better.
    • then I will have to concur with arturo (and not only with his usage of XML::Twig ;--) : if you want to use XML::Simple, load the config data and dump it using Data::Dumper or YAML, it will make it much easier for you to figure out what's going on (alternatively you can use the debugger by running perl -d and use x $XMLConfig),
    • finally, is there any specific reason why you want to use XML here? I find YAML to be easier to read and to edit than XML. After fixing the typo in your XML, perl -MYAML -MXML::Simple -e 'print Dump XMLin "config.xml"' will give you this:
      --- #YAML:1.0 item: SomeDatabase: dbName: SomeDatabase dbPass: password dbuser: SomeUser type: MySQLDB SomeFilesFiles: path: '/home/globalherald' type: Filesystem logprefix: '/var/log/sbackup' storage: '/home/backups'
Re: Walking thru XML
by Jaap (Curate) on Nov 20, 2003 at 22:22 UTC

Log In?

What's my password?
Create A New User
Node Status?
node history
Node Type: perlquestion [id://308725]
Approved by bart
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others cooling their heels in the Monastery: (13)
As of 2016-06-28 21:33 GMT
Find Nodes?
    Voting Booth?
    My preferred method of making French fries (chips) is in a ...

    Results (363 votes). Check out past polls.