PerlMonks  

Re: RegEx Against Arbitrary XML Tags

by GrandFather (Cardinal)
on Oct 19, 2011 at 21:50 UTC


in reply to RegEx Against Arbitrary XML Tags

You almost certainly don't want to parse XML using hand-rolled code. Instead, use one of the many XML parsing modules (XML::Twig is highly recommended). Parsing XML robustly is hard, and you will spend much more time trying to get it right than you would learning to use a module that does the heavy lifting for you. Consider:

use warnings;
use strict;
use XML::Twig;

my $xml = <<XML;
<ROOT hostname="bumblebee" tstamp="2011/09/21 22:24:05">
  <APPLICATION>
    <PORT>7777</PORT>
    <APP_HOME>/extra/localcw/opt/APP/sun4</APP_HOME>
    <VERSION>V36.11.01</VERSION>
    <PERF_HOME>/usr/localcw/opt/APP/Solaris-2-9-sparc-64</PERF_HOME>
    <PERF_VERSION>glanceSunOS 5.9 (Solaris 9) (sparc, 64 Bit) 7.3.00.6059 Jul 19 2006</PERF_VERSION>
    <STAR_VERSION>3.0</STAR_VERSION>
    <DEFAULT_ACCT>root</DEFAULT_ACCT>
    <HISTORY_RETENTION>90</HISTORY_RETENTION>
    <LAST_FILE_DOWN>StAR-201105090928.tar</LAST_FILE_DOWN>
    <LAST_STATUS>No download file found</LAST_STATUS>
    <ACL>
      <ACCOUNT id="f9a64ef61c">
        <MD5>f9a64ef61c</MD5>
        <USERNAME>*</USERNAME>
        <HOST>flower</HOST>
        <PERMISSION>P</PERMISSION>
      </ACCOUNT>
    </ACL>
  </APPLICATION>
</ROOT>
XML

my $twig = XML::Twig->new(
    twig_roots => {'APPLICATION' => \&doStuff, 'ACL' => \&doStuff}
);

$twig->parse($xml);

sub doStuff {
    my ($t, $elt) = @_;
    print "Found ", $elt->tag(), "\n";
    $t->purge;    # frees the memory
}

Prints:

Found ACL
Found APPLICATION
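
If what you are really after is the values inside tags whose names you don't know in advance, the same module can hand them to you without any regex. The following is only a rough sketch under a few assumptions: the helper name leaf_values and the cut-down sample document are made up for illustration and are not part of the code above, and it uses twig_handlers rather than twig_roots so that the whole tree is available when the handler runs.

```perl
use strict;
use warnings;
use XML::Twig;

# Hypothetical helper (illustration only): returns "TAG = text" strings
# for every child of an APPLICATION element that has no child elements.
sub leaf_values {
    my ($xml) = @_;
    my @found;

    my $twig = XML::Twig->new(
        twig_handlers => {
            APPLICATION => sub {
                my ($t, $elt) = @_;
                for my $child ($elt->children('#ELT')) {
                    # Skip containers such as ACL that hold further elements
                    next if $child->first_child('#ELT');
                    push @found, $child->tag . ' = ' . $child->text;
                }
            },
        },
    );
    $twig->parse($xml);

    return @found;
}

# Cut-down sample, assumed to have the same shape as the data above
my $sample = <<'XML';
<ROOT hostname="bumblebee">
  <APPLICATION>
    <PORT>7777</PORT>
    <VERSION>V36.11.01</VERSION>
    <ACL>
      <ACCOUNT id="f9a64ef61c">
        <HOST>flower</HOST>
      </ACCOUNT>
    </ACL>
  </APPLICATION>
</ROOT>
XML

print "$_\n" for leaf_values($sample);
```

The handler never mentions PORT or VERSION by name, so new tags in the input are picked up without touching the code. Nested containers such as ACL are skipped here; you could recurse into them instead if you need their contents.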
True laziness is hard work

