RegEx Against Arbitrary XML Tags

by GrandFather (Sage)
on Oct 19, 2011 at 21:50 UTC

in reply to RegEx Against Arbitrary XML Tags

You almost certainly don't want to parse XML with hand-rolled code. Instead, use one of the many XML parsing modules (XML::Twig is highly recommended). Parsing XML robustly is hard, and you will spend far more time trying to get it right than you would learning to use a module that does the heavy lifting for you. Consider:

use warnings;
use strict;
use XML::Twig;

my $xml = <<XML;
<ROOT hostname="bumblebee" tstamp="2011/09/21 22:24:05">
  <APPLICATION>
    <PORT>7777</PORT>
    <APP_HOME>/extra/localcw/opt/APP/sun4</APP_HOME>
    <VERSION>V36.11.01</VERSION>
    <PERF_HOME>/usr/localcw/opt/APP/Solaris-2-9-sparc-64</PERF_HOME>
    <PERF_VERSION>glanceSunOS 5.9 (Solaris 9) (sparc, 64 Bit) Jul 19 2006</PERF_VERSION>
    <STAR_VERSION>3.0</STAR_VERSION>
    <DEFAULT_ACCT>root</DEFAULT_ACCT>
    <HISTORY_RETENTION>90</HISTORY_RETENTION>
    <LAST_FILE_DOWN>StAR-201105090928.tar</LAST_FILE_DOWN>
    <LAST_STATUS>No download file found</LAST_STATUS>
    <ACL>
      <ACCOUNT id="f9a64ef61c">
        <MD5>f9a64ef61c</MD5>
        <USERNAME>*</USERNAME>
        <HOST>flower</HOST>
        <PERMISSION>P</PERMISSION>
      </ACCOUNT>
    </ACL>
  </APPLICATION>
</ROOT>
XML

my $twig = XML::Twig->new(
    twig_roots => {'APPLICATION' => \&doStuff, 'ACL' => \&doStuff}
);

$twig->parse($xml);

sub doStuff {
    my ($t, $elt) = @_;

    print "Found ", $elt->tag(), "\n";
    $t->purge;    # frees the memory
}
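To make the "robustly parsing XML is hard" point concrete, here is a minimal sketch of how a naive regex can go wrong. The input is a hypothetical variation on the sample above (a commented-out PORT element and a CDATA section are my additions for illustration, not part of the original data), and the pattern is only one plausible hand-rolled attempt.

```perl
use strict;
use warnings;

# Hypothetical input: the first PORT is commented out, the second wraps
# its value in a CDATA section. Both are legal XML.
my $xml = <<'XML';
<APPLICATION>
  <!-- <PORT>1234</PORT> -->
  <PORT><![CDATA[7777]]></PORT>
</APPLICATION>
XML

# Naive hand-rolled extraction: grab whatever sits between <PORT> and </PORT>.
my @ports = $xml =~ m{<PORT>(.*?)</PORT>}g;

# The regex reports the commented-out value and returns the raw CDATA
# markup instead of the text it contains.
print "Regex saw: $_\n" for @ports;
```

An XML parser would skip the comment and hand back just the text 7777 for the one real PORT element. Patching the regex to cope with comments, CDATA, attributes, entities and nesting is exactly the work a module has already done.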


