Re: RegEx Against Arbitrary XML Tags

by GrandFather (Sage)
on Oct 19, 2011 at 21:50 UTC

in reply to RegEx Against Arbitrary XML Tags

Almost certainly you don't want to parse XML with hand-rolled code. Instead, use one of the many XML parsing modules (XML::Twig is highly recommended). Parsing XML robustly is hard, and you will spend far more time trying to get it right than you will spend learning to use a module that does the heavy lifting for you. Consider:

    use warnings;
    use strict;
    use XML::Twig;

    my $xml = <<XML;
    <ROOT hostname="bumblebee" tstamp="2011/09/21 22:24:05">
      <APPLICATION>
        <PORT>7777</PORT>
        <APP_HOME>/extra/localcw/opt/APP/sun4</APP_HOME>
        <VERSION>V36.11.01</VERSION>
        <PERF_HOME>/usr/localcw/opt/APP/Solaris-2-9-sparc-64</PERF_HOME>
        <PERF_VERSION>glanceSunOS 5.9 (Solaris 9) (sparc, 64 Bit) Jul 19 2006</PERF_VERSION>
        <STAR_VERSION>3.0</STAR_VERSION>
        <DEFAULT_ACCT>root</DEFAULT_ACCT>
        <HISTORY_RETENTION>90</HISTORY_RETENTION>
        <LAST_FILE_DOWN>StAR-201105090928.tar</LAST_FILE_DOWN>
        <LAST_STATUS>No download file found</LAST_STATUS>
        <ACL>
          <ACCOUNT id="f9a64ef61c">
            <MD5>f9a64ef61c</MD5>
            <USERNAME>*</USERNAME>
            <HOST>flower</HOST>
            <PERMISSION>P</PERMISSION>
          </ACCOUNT>
        </ACL>
      </APPLICATION>
    </ROOT>
    XML

    my $twig = XML::Twig->new(
        twig_roots => {'APPLICATION' => \&doStuff, 'ACL' => \&doStuff}
    );
    $twig->parse($xml);

    sub doStuff {
        my ($t, $elt) = @_;

        print "Found ", $elt->tag(), "\n";
        $t->purge;    # frees the memory
    }
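The sample above only reports which elements it found. If you also need the values inside them, something along these lines may help. It is a minimal sketch that assumes the same document layout (trimmed to the fields it reads) and uses XML::Twig's twig_handlers option together with the att, first_child_text and text element methods. The %app and @accounts variables are hypothetical names chosen for illustration.

```perl
use warnings;
use strict;
use XML::Twig;

# Trimmed copy of the sample document (assumption: only the fields
# this sketch reads are kept). Single-quoted heredoc: no interpolation.
my $xml = <<'XML';
<ROOT hostname="bumblebee">
  <APPLICATION>
    <PORT>7777</PORT>
    <VERSION>V36.11.01</VERSION>
    <ACL>
      <ACCOUNT id="f9a64ef61c">
        <USERNAME>*</USERNAME>
        <HOST>flower</HOST>
        <PERMISSION>P</PERMISSION>
      </ACCOUNT>
    </ACL>
  </APPLICATION>
</ROOT>
XML

my %app;         # hypothetical: simple APPLICATION fields keyed by tag name
my @accounts;    # hypothetical: one hash ref per ACCOUNT element

my $twig = XML::Twig->new(
    twig_handlers => {
        # Handlers receive ($twig, $element) once the element's end tag is seen.
        PORT    => sub { $app{PORT}    = $_[1]->text },
        VERSION => sub { $app{VERSION} = $_[1]->text },
        ACCOUNT => sub {
            my ($t, $elt) = @_;
            push @accounts, {
                id         => $elt->att('id'),
                username   => $elt->first_child_text('USERNAME'),
                host       => $elt->first_child_text('HOST'),
                permission => $elt->first_child_text('PERMISSION'),
            };
            $t->purge;    # values are copied out, so the parsed part can go
        },
    },
);
$twig->parse($xml);

print "port $app{PORT}, version $app{VERSION}\n";
for my $acct (@accounts) {
    print "account $acct->{id}: $acct->{username}\@$acct->{host} ($acct->{permission})\n";
}
```

Unlike twig_roots, twig_handlers keeps building the whole tree, so on large files the purge call is what keeps memory use bounded.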


True laziness is hard work
