
Using the XML::Parser Module

by raina (Initiate)
on Nov 02, 2005 at 10:20 UTC
raina has asked for the wisdom of the Perl Monks concerning the following question:

I need some good documentation, sample code, or examples for reading and modifying specific values within tags. I used to use regexes for this, but I want to switch to the XML parser module now. Something that explains the differences between the two would be great.

Replies are listed 'Best First'.
Re: Using the XML::Parser Module
by marto (Bishop) on Nov 02, 2005 at 10:32 UTC
    Hi raina,

    Perhaps you would be better off looking at the XML::Twig module.
    The XML::Twig website has good documentation and plenty of examples.

    Hope this helps.

Re: Using the XML::Parser Module
by GrandFather (Sage) on Nov 02, 2005 at 10:32 UTC

    Rather than using the raw parser, you may get more mileage out of XML::Twig. The documentation that comes with the module is a pretty good starting point. You could also try Super Searching here, where you will find that Twig is mentioned often.

    If you put some code together and run into trouble there are many people here who would help you out.

    Perl is Huffman encoded by design.
Re: Using the XML::Parser Module
by Aristotle (Chancellor) on Nov 02, 2005 at 14:03 UTC

    I’d advise against XML::Parser at this point for two reasons – it’s a wrapper around the rather old (if trusty) expat library, and its API is rather hard to program for – because back when expat was written, XML was still in a bit of a flux.

    For processing XML documents, you want to learn about XPath. A pithy description of what it is might be “a pattern match language for trees.” It lets you specify which portion of a document you’re interested in very concisely. Knowing XPath is the difference between XML being a chore or a charm.

    XML::Twig does make things much easier, but when I last dealt with it, it did not offer real XPath support and worked pretty heavily on the Perl side of things. That means large documents are slow to process and can consume a lot of memory. The memory hunger can be controlled if you pay careful attention and your use case lends itself to processing the document chunk-wise, but that takes effort.
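
For readers who have not seen the chunk-wise style mentioned above, here is a minimal sketch of it with XML::Twig. The document and its element names (`log`, `entry`, `level`) are invented for illustration; the idea is that a handler runs as soon as each element is fully parsed, and `purge` then discards what has already been handled.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::Twig;

# Hypothetical sample data; for a real file, parsefile($path) would replace parse($xml).
my $xml = <<'XML';
<log>
  <entry level="info">started</entry>
  <entry level="error">disk full</entry>
  <entry level="info">stopped</entry>
</log>
XML

my @errors;
my $twig = XML::Twig->new(
    twig_handlers => {
        # Called each time an <entry> element has been completely parsed.
        entry => sub {
            my ($t, $entry) = @_;
            push @errors, $entry->text if $entry->att('level') eq 'error';
            # Release everything parsed so far so memory use stays roughly flat.
            $t->purge;
        },
    },
);
$twig->parse($xml);

print "$_\n" for @errors;
```

The effort referred to above is mostly in choosing handlers so that nothing you still need has already been purged.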

    I’d instead suggest XML::LibXML. It’s a wrapper around the newer, more compliant libxml2 library which offers the nicer sorts of APIs that were designed after XML was finished – its XPath support is excellent. And since its internal data structures all reside on the C side, it can handle much larger documents than the (more) pure-Perl modules without any effort on the programmer’s part. It’s also much faster than such modules for the same reason.
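
As a rough illustration of the original question (reading and modifying specific values), a sketch with XML::LibXML and XPath might look like the following. The sample document and the names `config`, `server`, `port`, and `env` are assumptions made up for this example, not anything from the poster's data.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical sample document; element and attribute names are invented.
my $xml = <<'XML';
<config>
  <server name="alpha"><port>8080</port></server>
  <server name="beta"><port>8081</port></server>
</config>
XML

my $doc = XML::LibXML->load_xml(string => $xml);

# XPath selects only the <port> inside the server named "beta".
for my $port ($doc->findnodes('/config/server[@name="beta"]/port')) {
    # Replace the element's text content.
    $port->removeChildNodes;
    $port->appendText('9090');
}

# Set an attribute on every <server> element, wherever it appears.
$_->setAttribute(env => 'test') for $doc->findnodes('//server');

print $doc->toString;
```

Compared with a regex, the XPath expression keeps matching the right element even if attribute order, quoting, or whitespace in the source changes.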

    I use it for all of my XML needs these days and am an absolutely satisfied customer.

    Makeshifts last the longest.

      I would agree with you that XML::LibXML is also a good choice. In my (oddly enough limited ;--) experience, it feels a little "lower-level" than XML::Twig, mostly because it forces you to use the DOM to process the data, while XML::Twig has (lots of!) higher-level methods. I agree that it implements quite a few standards very well, and it probably lends itself to more rigorous code than XML::Twig.

      One word to correct you on one point: XML::Twig did not offer real XPath support, but it does now if you use XML::Twig::XPath, which simply reuses the XML::XPath engine.
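
A small sketch of what that looks like in practice, assuming XML::Twig::XPath and one of its XPath engine modules are installed. The catalog document and the `(premium)` suffix are invented for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::Twig::XPath;

# Hypothetical sample data.
my $xml = <<'XML';
<catalog>
  <book><title>Cheap Reads</title><price>9.50</price></book>
  <book><title>Pricey Tome</title><price>45.00</price></book>
</catalog>
XML

my $twig = XML::Twig::XPath->new;
$twig->parse($xml);

# A predicate with a numeric comparison, which needs a full XPath engine.
my @titles = $twig->findnodes('/catalog/book[price > 20]/title');

# The nodes returned are still twig elements, so the usual Twig methods apply.
$_->set_text($_->text . ' (premium)') for @titles;

my $out = $twig->sprint;
print $out;
```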

        I always think of you and cringe a little when I recommend against your module. You did a really admirable job of building something actually usable onto XML::Parser, and for a long time XML::Twig was indeed my favourite. It’s just that expat and XML::Parser really needed that work to be turned into something sane, whereas XML::LibXML is sane to begin with. Sorry. :-) :-(

        Re: XPath support: thanks for the pointer; noted.

        Makeshifts last the longest.

Re: Using the XML::Parser Module
by leriksen (Curate) on Nov 02, 2005 at 22:46 UTC
    Another option is XML::GDOME, which, apart from a slightly outdated install process, seems to work well and has XPath support out of the box. Again, it is a case of a thin Perl wrapper around a speedy C implementation. The docs are complete, but they don't hold anyone's hand: if you know XML reasonably well, you should be fine; if you're not familiar with the XML concepts, finding what you need can be difficult.

    #!/usr/bin/perl -w
    use strict;
    use XML::GDOME;

    my $fname = '/path/to/your.xml';
    my $doc = XML::GDOME->createDocFromURI($fname, GDOME_LOAD_SUBSTITUTE_ENTITIES); # or whatever gdome options float your boat
    my @nodes = $doc->findnodes('/xpath/to/required/element');
    foreach my $node (@nodes) {
        if (needToUpdateAttribute($node)) {
            my $attributeName = getRequiredAttrName(...);
            my $newValue = getNewAttrValue(...);
            $node->setAttribute($attributeName, $newValue);
        }
    }

    ...reality must take precedence over public relations, for nature cannot be fooled. - R P Feynman

      XML::GDOME is actually a wrapper around libgdome which is a wrapper around libxml2, so it’s no surprise that it supports the same things. libgdome adds full DOM Level 2 support on top of libxml2 (including the Events stuff and such); unless you need that (which few people probably do), you can just use XML::LibXML. The code you’ll write is identical in both cases.
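
To make the comparison concrete, here is a hedged XML::LibXML sketch of the same load, update, and save round trip. A temporary file stands in for '/path/to/your.xml', and the `items`/`item`/`status` names and the update rule are hypothetical stand-ins for the placeholder helpers in the post above.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempfile);
use XML::LibXML;

# Stand-in for a real file: write a small hypothetical document to a temp file.
my ($fh, $fname) = tempfile(SUFFIX => '.xml', UNLINK => 1);
print {$fh} '<items><item id="1" status="old"/><item id="2" status="new"/></items>';
close $fh;

my $doc = XML::LibXML->load_xml(location => $fname);

foreach my $node ($doc->findnodes('/items/item')) {
    # Hypothetical rule in place of needToUpdateAttribute().
    next unless $node->getAttribute('status') eq 'old';
    $node->setAttribute(status => 'archived');
}

# Write the modified tree back over the original file.
$doc->toFile($fname);

my $reloaded = XML::LibXML->load_xml(location => $fname);
print $reloaded->toString;
```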

      Makeshifts last the longest.

        Thanks guys ... the help is really appreciated :-)

Node Type: perlquestion [id://504886]
Approved by GrandFather