Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:


I'm looking for XML module recommendations. A brief check on CPAN shows there's no shortage but I'm having trouble deciding which one to use. XML isn't really my main interest and I don't particularly want to learn all of them. I'd like to be able to learn one module interface and be able to apply it in other languages as well. Simplicity doesn't matter too much as long as the knowledge is fairly language/module independent.

Based on the little I know of XML, I think a DOM module would be appropriate. I want to be able to access any part of the document at any given time. Resource use doesn't matter much and I won't be dealing with very large files (nothing over 10 megs).

Any suggestions? Sorry for the rather vague criteria, I'm not all that knowledgeable on the subject. Overviews of the different modules would be useful as well.

Re: XML Module Recommendations
by mirod (Canon) on Jan 26, 2003 at 05:19 UTC

    In terms of general resources, you can have a look at the Perl and XML FAQ and Kip Hampton's column. The Module Reviews on this site also include quite a few nodes about XML modules.

    More specifically, I don't think there is any module that will satisfy all your criteria, but let's list the main candidates among the tree-based modules:

    • XML::Parser: available everywhere (comes standard with ActiveState Perl, as it is used by PPM, but you need to install expat separately on *nix), low-level (usually used to build more convenient modules), has a Tree style that gives access to the whole document at once but no one seems to like it (or even use it),
    • XML::Simple: available through PPM, based on XML::Parser, can be used only for data-oriented XML (no mixed content), loads the XML into a Perl structure,
    • XML::Twig: based on XML::Parser, no PPM available (see the FAQ for instructions about installing it on Windows), mixed event/tree mode, I like it (but I also wrote it ;--),
    • XML::DOM: based on XML::Parser; my only take on it is that the DOM is NOT appropriate for general purpose XML transformation, it gives you plenty of rope... avoid it,
    • XML::LibXML: based on libxml2, which needs to be installed, but a really nice module, which gives you SAX, DOM and XPath (the addition of XPath makes the DOM usable).

    Those are the main tree-based modules; all of the SAX modules are event-based. BTW, XML::SAX::PurePerl would probably be too slow for a 10 MB file, so you might not be able to use a pure Perl solution.

    In the end I would think that XML::Twig (surprise ;--) or XML::LibXML are the best choices, unless you can use XML::Simple. It also depends on the kind of XML you are dealing with (data or document).

      Hi, thanks for the excellent reply :)

      A few questions...

      You said DOM isn't appropriate for general purpose XML transformation - what if I'm just extracting data into a different structure, not necessarily translating it to XHTML or whatever? Also - the LibXML documentation says "This module is an interface to the gnome libxml2 DOM parser (no SAX parser support yet), and the DOM tree." So is it still acceptable in your opinion?

      One of the problems I've had in the past is extracting data from a document in which different elements have identical names, for example...

      <website> <name>Perlmonks</name> <rating>10/10</rating> <people> <name>Anonymous Monk</name> </people> </website>

      How would I differentiate between the name inside the people tag and the website name? More of an XML question, but I'm also looking for a module that makes this really easy.

      Another thing I'd like to do easily: go through the XML file and pick out certain fields and compare them between multiple entries. For example, get the name and rating of each website so I can pick out everyone with a 10. This seems like it should be trivial (as it is with SQL) but the examples I've seen so far don't always seem so simple.

      Also, are there XML::Twig-like interfaces for other languages? Thanks :)

        The DOM is still dangerous when extracting information, unless you are very cautious. The main problem is with navigation methods like getFirstChild: you just cannot use them without wrapping them in your own method. The first child of an element can be a lot of unexpected things: the line return after the element start tag, a comment, a processing instruction... and maybe even the next element. The addition of XPath in XML::LibXML makes it much safer by letting you write $elt->findnodes( 'people'), which gives you the list of people elements that are children of $elt.
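        A minimal sketch of that pitfall, assuming a reasonably recent XML::LibXML (one that provides load_xml) is installed. The sample document is hypothetical, modelled on the website example above, and pretty-printed so that it contains whitespace text nodes. Note that XML::LibXML spells the navigation method firstChild rather than getFirstChild:

        ```perl
        use strict;
        use warnings;
        use XML::LibXML;

        # Hypothetical sample document; the indentation creates whitespace text nodes.
        my $xml = q{<website>
          <name>Perlmonks</name>
          <rating>10/10</rating>
          <people>
            <name>Anonymous Monk</name>
          </people>
        </website>};

        my $doc  = XML::LibXML->load_xml(string => $xml);
        my $site = $doc->documentElement;

        # Naive DOM navigation: the first child is the line return after <website>,
        # i.e. a text node (node type 3), not the <name> element.
        my $first = $site->firstChild;
        printf "first child: %s (node type %d)\n", $first->nodeName, $first->nodeType;

        # XPath relative to $site: only <people> elements that are children of $site.
        my @people = $site->findnodes('people');
        printf "people elements: %d, first person: %s\n",
            scalar(@people), $people[0]->findvalue('name');
        ```

        With the default parser settings the whitespace is kept, so firstChild hands back a text node, while findnodes('people') can only ever return people elements.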

        As for differentiating between tags with the same name but in different contexts, XML modules give you access to the context stack, so it will not be a problem. For example, in XML::Twig you can have handlers on website/name or on people/name; in XML::LibXML you would similarly use XPath to get the elements you want.
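        A minimal XML::Twig sketch of the handler approach, assuming XML::Twig is installed; the sample document and the @site_names/@person_names variables are illustrative only. A trigger such as 'website/name' fires for a name element whose parent is website, so the two kinds of name never get mixed up:

        ```perl
        use strict;
        use warnings;
        use XML::Twig;

        # Hypothetical sample document based on the example in this thread.
        my $xml = q{<website>
          <name>Perlmonks</name>
          <rating>10/10</rating>
          <people>
            <name>Anonymous Monk</name>
          </people>
        </website>};

        my (@site_names, @person_names);

        my $twig = XML::Twig->new(
            twig_handlers => {
                # <name> directly inside <website>
                'website/name' => sub { my ($t, $elt) = @_; push @site_names,   $elt->text },
                # <name> directly inside <people>
                'people/name'  => sub { my ($t, $elt) = @_; push @person_names, $elt->text },
            },
        );
        $twig->parse($xml);

        print "site:   @site_names\n";
        print "person: @person_names\n";
        ```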

        In fact the XML equivalent of SQL is XPath (at least within a single document, XML Query deals with collections of documents). A nice resource for XML-related tutorials is, they have a good XPath tutorial.
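        For the "everyone with a 10" question earlier in the thread, a sketch of that SQL-like use of XPath with XML::LibXML. The websites wrapper element and the extra entries are made up for the example:

        ```perl
        use strict;
        use warnings;
        use XML::LibXML;

        # Hypothetical data: several websites, each with a name and a rating.
        my $xml = q{<websites>
          <website><name>Perlmonks</name><rating>10/10</rating></website>
          <website><name>Example A</name><rating>7/10</rating></website>
          <website><name>Example B</name><rating>10/10</rating></website>
        </websites>};

        my $doc = XML::LibXML->load_xml(string => $xml);

        # Roughly: SELECT name FROM website WHERE rating = '10/10'
        my @top = map { $_->textContent }
                  $doc->findnodes('/websites/website[rating = "10/10"]/name');

        print "$_\n" for @top;
        ```

        The predicate in square brackets plays the role of the WHERE clause, and the final step (/name) plays the role of the column list.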

        XML::Twig is purely perl. Note that if you don't want to use Perl you can always use XSLT, there are plenty of XSLT processors around, some of which can even be called from Perl.
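        For instance, a sketch of calling an XSLT processor from Perl, assuming the XML::LibXSLT module (an XSLT processor built on libxslt; this post does not name a specific one) is installed alongside XML::LibXML. The stylesheet is a hypothetical one that turns each website name into an HTML list item:

        ```perl
        use strict;
        use warnings;
        use XML::LibXML;
        use XML::LibXSLT;

        # Hypothetical source document.
        my $source = XML::LibXML->load_xml(string => q{<websites>
          <website><name>Perlmonks</name><rating>10/10</rating></website>
        </websites>});

        # Hypothetical stylesheet: one <li> per website name.
        my $style_doc = XML::LibXML->load_xml(string => q{<xsl:stylesheet version="1.0"
            xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
          <xsl:output method="html"/>
          <xsl:template match="/websites">
            <ul><xsl:for-each select="website">
              <li><xsl:value-of select="name"/></li>
            </xsl:for-each></ul>
          </xsl:template>
        </xsl:stylesheet>});

        my $xslt       = XML::LibXSLT->new;
        my $stylesheet = $xslt->parse_stylesheet($style_doc);
        my $results    = $stylesheet->transform($source);
        my $html       = $stylesheet->output_string($results);

        print $html;
        ```

        The same stylesheet could be run unchanged by any other XSLT processor, which is the language-independence argument for XSLT.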

        One last question, especially in light of a recent thread: it seems to me that you are dealing with data, and doing the kind of processing that a database does very well. So why are you using XML at all? Couldn't you just model your data into tables and use a DB? There are several portable alternatives that support the kind of processing you seem to be looking for.

        The LibXML documentation says "This module is an interface to the gnome libxml2 DOM parser (no SAX parser support yet), and the DOM tree." So is it still acceptable in your opinion?

        It doesn't matter whether anyone else finds this restriction acceptable or not. You need to determine whether this may cause problems for you or not and compare those problems to the benefits you gain from using this code. It depends on your situation.

        Any modern Unix should run libxml2, and some come with it installed or as part of their package system. If you run Windows, PPMs exist for ActivePerl. If you use other versions of Perl on Windows, Windows binaries of libxml2 exist.

        So, code that uses XML::LibXML should run on Unix, including Mac OS X, and Windows. If you need to port your code to other platforms, investigate each platform and see if XML::LibXML runs on it.

Re: (nrd) XML Module Recommendations
by newrisedesigns (Curate) on Jan 26, 2003 at 03:39 UTC

    Take a look at XML::SAX::PurePerl. It should be operating system independent, and it uses the SAX API, which I believe has some tie-in with Java and other languages, but as I am also learning more about XML (and programming in general), I can't give you a definite answer.

    Best of luck to you.

    Oh, and while you're here, why don't you sign up for a PerlMonks account?

    John J Reiser

      I'm going to second this suggestion. I use SAX for most of the XML work I do in Perl because you can use any parsers that are registered without having to worry about which parsers implement what. All you do is pass your requirements for supported features (such as Namespaces) to ParserFactory and it creates an appropriate instance for you.
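      A minimal sketch of that workflow, assuming XML::SAX is installed with at least one registered parser that supports the namespaces feature. NameCollector is a hypothetical handler class that simply collects the text of every name element; because SAX is event-based and keeps no tree, telling website/name from people/name would mean tracking the open elements yourself:

      ```perl
      use strict;
      use warnings;

      # Hypothetical SAX handler: collects the text of every <name> element.
      package NameCollector;
      use base 'XML::SAX::Base';

      sub start_document {
          my ($self) = @_;
          $self->{names}   = [];
          $self->{in_name} = 0;
      }

      sub start_element {
          my ($self, $el) = @_;
          if ($el->{Name} eq 'name') {
              $self->{in_name} = 1;
              $self->{buf}     = '';
          }
      }

      sub characters {
          my ($self, $data) = @_;
          # Text may arrive in several chunks, so accumulate it.
          $self->{buf} .= $data->{Data} if $self->{in_name};
      }

      sub end_element {
          my ($self, $el) = @_;
          if ($el->{Name} eq 'name') {
              push @{ $self->{names} }, $self->{buf};
              $self->{in_name} = 0;
          }
      }

      package main;
      use XML::SAX::ParserFactory;

      # Ask the factory for any registered parser that supports namespaces.
      my $factory = XML::SAX::ParserFactory->new;
      $factory->require_feature('http://xml.org/sax/features/namespaces');

      my $handler = NameCollector->new;
      my $parser  = $factory->parser(Handler => $handler);
      $parser->parse_string(
          '<website><name>Perlmonks</name><people><name>Anonymous Monk</name></people></website>'
      );

      print "$_\n" for @{ $handler->{names} };
      ```

      The handler code stays the same whichever parser the factory picks, which is the point made above.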

      I would also suggest a look at Perl and XML from O'Reilly. I've found it very useful.

      Grant me the wisdom to shut my mouth when I don't know what I'm talking about.

        I would also suggest a look at Perl and XML from O'Reilly. I've found it very useful.

        Bought it, read it, didn't find it helpful at all. I thought it covered things far too vaguely. Maybe I'll give it another look though.

Re: XML Module Recommendations
by Aristotle (Chancellor) on Jan 26, 2003 at 05:46 UTC
    I second mirod's suggestion to use XML::Twig. I was reluctant to do any XML mangling before I came across it. I tried to get my feet wet with the other modules, but even XML::Simple isn't simple. With Twig, I was up and running in 20 minutes on my first XML attempt, which was a very simple task. It's just like it should be: the easy things easy and the hard things possible.

    Makeshifts last the longest.

Re: XML Module Recommendations
by bronto (Priest) on Jan 26, 2003 at 19:22 UTC
    I want to be able to access any part of the document at any given time.

    This seems to be a task for XPath; if you take a little time to learn the basics, you could easily use XML::XPath, which should do exactly what you need.
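    A minimal sketch, assuming XML::XPath is installed; the document is a cut-down, hypothetical version of the website example from earlier in the thread. Each path can be queried in any order, which matches the "any part of the document at any given time" requirement:

    ```perl
    use strict;
    use warnings;
    use XML::XPath;

    my $xp = XML::XPath->new(xml => q{<website>
      <name>Perlmonks</name>
      <rating>10/10</rating>
      <people><name>Anonymous Monk</name></people>
    </website>});

    # findvalue returns an object whose stringification is overloaded;
    # appending '' turns it into a plain Perl string.
    my $person = $xp->findvalue('/website/people/name') . '';
    my $site   = $xp->findvalue('/website/name')        . '';
    my $rating = $xp->findvalue('/website/rating')      . '';

    print "website: $site, rating: $rating, person: $person\n";
    ```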


    # Another Perl edition of a song:
    # The End, by The Beatles
    END {
      $you->take($love) eq $you->make($love) ;
    }

      You might want to benchmark it, though; in my experience XML::XPath is pretty slow and might have a problem with a 10 MB file (see Ways to Rome for a simple benchmark of the various modules). matts might have a different opinion, obviously ;--)

Re: XML Module Recommendations
by Anonymous Monk on Jan 26, 2003 at 02:41 UTC

    Forgot to mention: operating system independence would be a big plus. I'd like to avoid relying on the GNOME or other XML libraries. Thanks.

      If platform independence is a priority, it might be an idea to stick to using PPM-enabled modules so Windoze users have an easier time installing them.

      I haven't used them myself, but these include XML-Element and XML-Parser.

      I don't believe there are any XML modules included in the standard distribution package, unfortunately.

      "Every program has at least one bug and can be shortened by at least one instruction -- from which, by induction, one can deduce that every program can be reduced to one instruction which doesn't work." -- (Author Unknown)

Re: XML Module Recommendations
by CountZero (Bishop) on Jan 26, 2003 at 10:22 UTC

    If you are thinking of using XML and XSLT in a web-server environment, perhaps you can check out AxKit and Sablotron. AxKit is built upon mod_perl in the Apache web server. Sablotron is written in C but also has a Perl interface.

    I have used it to transform Perl-generated XML into HTML.


    "If you have four groups working on a compiler, you'll get a 4-pass compiler." - Conway's Law