PerlMonks  

Kindly suggest a good starting point for XML Parsing.

by perl514 (Pilgrim)
on Dec 29, 2011 at 11:40 UTC
perl514 has asked for the wisdom of the Perl Monks concerning the following question:

Venerated Monks,

I am looking for a website or a book that can get me started on parsing XML using Perl. What I want to do is this: given an XML file that contains information such as disk sizes, RAID type, etc., I need to parse it so that I can obtain, for example, a count of devices by size. Kindly note that this is just one example; there are many other things I want to achieve.

I will be searching on Google, but since this forum has come across as a place of genuine advice that is far better than what I might find on Google, kindly let me know how I should go about this. I currently know the basics of Perl. Should I get familiar with more advanced concepts first and then try my hand at XML parsing? Is there any book you would suggest? Please help me.

Perlpetually Indebted To PerlMonks

Re: Kindly suggest a good starting point for XML Parsing.
by moritz (Cardinal) on Dec 29, 2011 at 11:58 UTC

    There are basically two approaches to parsing XML: event-based and DOM-tree.

    The first one reads an XML file and calls user-defined subroutines whenever something interesting (an opening or closing tag, an attribute, text etc.) is encountered. This has the advantage of not having to keep the whole XML file in memory. A low-level event-based parser is XML::Parser; a higher-level one is XML::Twig.
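    A minimal sketch of the event-style approach with XML::Twig, applied to the disk-counting example from the question. The inventory document and its element and attribute names (inventory, disk, size, raid) are invented for illustration; a real file will look different.

```perl
use strict;
use warnings;
use XML::Twig;

# Hypothetical inventory; the element and attribute names are assumptions.
my $xml = <<'XML';
<inventory>
  <disk size="500" raid="RAID5"/>
  <disk size="1000" raid="RAID1"/>
  <disk size="500" raid="RAID5"/>
</inventory>
XML

my %count_by_size;

my $twig = XML::Twig->new(
    twig_handlers => {
        # Called each time a complete <disk> element has been parsed.
        disk => sub {
            my ($t, $disk) = @_;
            $count_by_size{ $disk->att('size') }++;
            $t->purge;    # free what has been parsed so far
        },
    },
);
$twig->parse($xml);

# Print one line per disk size, smallest first.
printf "size %s: %d disk(s)\n", $_, $count_by_size{$_}
    for sort { $a <=> $b } keys %count_by_size;
```

    The purge call is what keeps memory use flat on large files; for a small file it could be left out.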

    The second one constructs a document tree, where each tag is a node that can contain further tags, attributes and text. This one requires you to keep the whole document in memory, but makes some types of processing much easier. The most well-known and mature Perl module that does this type of processing is XML::LibXML (which allows querying through XPath). A newer one is Mojo::DOM, which can be queried with CSS selectors.
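    A minimal sketch of the tree-based approach with XML::LibXML and XPath. The inventory document and its element and attribute names (inventory, disk, size, raid) are made up for illustration and are not taken from any real file.

```perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical inventory; the element and attribute names are assumptions.
my $xml = <<'XML';
<inventory>
  <disk size="500" raid="RAID5"/>
  <disk size="1000" raid="RAID1"/>
  <disk size="500" raid="RAID5"/>
</inventory>
XML

# The whole document is parsed into a tree held in memory.
my $dom = XML::LibXML->load_xml(string => $xml);

# Walk the <disk> nodes selected by an XPath expression.
my %count_by_raid;
for my $disk ($dom->findnodes('/inventory/disk')) {
    $count_by_raid{ $disk->getAttribute('raid') }++;
}

# XPath can also filter and count on its own.
my $big_disks = $dom->findvalue('count(/inventory/disk[@size >= 1000])');

print "$_: $count_by_raid{$_}\n" for sort keys %count_by_raid;
print "disks of size 1000 or more: $big_disks\n";
```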

    Reading the documentation of these modules, and maybe some examples here on PerlMonks and elsewhere on the Internet, should give you a good idea of what each is capable of.

      hi Moritz,

      Thank you sir.

       

      Perlpetually Indebted To PerlMonks

Re: Kindly suggest a good starting point for XML Parsing.
by mrguy123 (Hermit) on Dec 29, 2011 at 15:13 UTC
    All the modules in the above post are great, but if you need to parse a simple XML file then XML::Simple is a very simple and effective module to know.
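    A small sketch of what XML::Simple usage can look like, including the shape change between one and several child elements that tends to surprise people. The inventory documents are invented for illustration, and the ForceArray and KeyAttr settings shown are one common way to make the result predictable, not the only one.

```perl
use strict;
use warnings;
use XML::Simple qw(XMLin);

# Two hypothetical documents: one <disk>, then two <disk> elements.
my $one = '<inventory><disk size="500"/></inventory>';
my $two = '<inventory><disk size="500"/><disk size="1000"/></inventory>';

# With default options a single child becomes a hash reference,
# while repeated children become an array reference.
my $default_one = XMLin($one);
my $default_two = XMLin($two);
print "one disk:  ", ref $default_one->{disk}, "\n";
print "two disks: ", ref $default_two->{disk}, "\n";

# Forcing <disk> into an array and disabling key folding gives
# the same shape no matter how many disks the file contains.
my %opts = (ForceArray => ['disk'], KeyAttr => []);
my @all_sizes;
for my $xml ($one, $two) {
    my $data  = XMLin($xml, %opts);
    my @sizes = map { $_->{size} } @{ $data->{disk} };
    push @all_sizes, \@sizes;
    print "sizes: @sizes\n";
}
```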
    Regarding books, the O'Reilly books are usually pretty good.
    Good Luck
    MrGuy

      For what it's worth, XML::Simple has “left me standing at the altar” a few more times than I personally care for.   XML-related tasks that start out as “simple” just don’t stay that way for long, and it is rather annoying to run into the limits of your tool before you run into the limits of your project.

        I agree with you that XML::Simple is sometimes a bit, um, simple, and it is highly recommended to learn the more powerful XML parsers mentioned above.
        Still, for those rare cases where a very easy-to-use and "simple" tool can be useful, it's a nice thing to know.

        it is rather annoying to run into the limits of your tool before you run into the limits of your project.

        Yes it is. I'm currently in the planning phase of rewriting my big project's XML config parser, which, incidentally, is XML::Simple based. While it works quite nicely, it takes some, uh, not-so-nice workaround thinking when writing the config files themselves. I must admit, when I started this project, it was the first time I had used XML for configuration files. And it really was plug-and-play, and saved me quite some time getting started.

        Still, I'll probably keep XML::Simple around for all those small convert-this-into-that tools. For the typical ten-settings-and-five-comments config files for these kinds of tools it is just ideal.

        So, XML::Simple is a double-edged sword. It depends heavily on your requirements.

        BREW /very/strong/coffee HTTP/1.1
        Host: goodmorning.example.com
        
        418 I'm a teapot

      See Simpler than XML::Simple for a discussion of some of the problems with XML::Simple and a solution.

      Jenda
      Enoch was right!
      Enjoy the last years of Rome.

Re: Kindly suggest a good starting point for XML Parsing.
by toolic (Chancellor) on Dec 29, 2011 at 15:24 UTC
    I recommend XML::Twig because it has good documentation (including a tutorial), there are many examples of usage here and at other forums, and the support is superb.

      Hi Toolic,

      Thank you for your reply. I did browse through the documentation. The site is currently being updated, but from what I read in the intro, an understanding of the object-oriented part of Perl will be needed. I only know some basic stuff, with which I wrote a script (node 944083). Should I dive straight into XML parsing, or should I spend more time getting familiar with some advanced concepts in Perl first? Kindly let me know.

       

      Perlpetually Indebted To PerlMonks

        Just dive right in. Figure out the OO pieces as you go.
Re: Kindly suggest a good starting point for XML Parsing.
by CountZero (Bishop) on Dec 29, 2011 at 16:48 UTC
    To learn about XML and its associated technology, I find W3 Schools a nice point to start.

    A combination of XML, XPath and XSLT will probably already bring you far.

    Personally, I try to do as much "work" on XML-structures inside of the XML technology and only as a last resort apply Perl to it. Of course, Perl and XML work seamlessly together. So when you have extracted (XPath) the info from your XML-file and transformed it into another XML-format (through XSLT), that --if it is not already final-- should be easy to understand by your Perl-program for a final polishing.

    CountZero

    "A program should be light and agile, its subroutines connected like a string of pearls. The spirit and intent of the program should be retained throughout. There should be neither too little nor too much, neither needless loops nor useless variables, neither lack of structure nor overwhelming rigidity." - The Tao of Programming, 4.1 - Geoffrey James

      XSLT is an absolutely horrible programming experience. I tend to regard XSLT as a solution of last resort.

      It's right at the bottom of my list of solutions for programming problems, just below taking my own life.

        I think C and its minions are horrible programming languages, but that is just my feeling.

        Many years ago, long before Catalyst, Dancer or even Mason were thought of, the first big web project I did used Perl to extract data from a database and export it as an XML file, which was then transformed "server-side" into HTML through XSLT. XSLT was acting as a kind of proto-templating framework. If I remember well, it was early versions of AxKit and Sablotron all the way then, and although the set-up was very difficult, once it ran inside the Apache server it worked perfectly.

        CountZero

        "A program should be light and agile, its subroutines connected like a string of pearls. The spirit and intent of the program should be retained throughout. There should be neither too little nor too much, neither needless loops nor useless variables, neither lack of structure nor overwhelming rigidity." - The Tao of Programming, 4.1 - Geoffrey James

Re: Kindly suggest a good starting point for XML Parsing.
by pileofrogs (Priest) on Dec 30, 2011 at 00:07 UTC

    This might be a moot point, but if possible you might want to consider moving away from XML to something like YAML or JSON. I hate XML and I know a lot of other people do too.

    Here's my problem with XML:
    <foo name=bobby><name>robert</name></foo></p>

    How do you represent that in data? How do you take the resulting data and make it turn back into XML that looks the same? I know you can, but there's no obvious 1 to 1 mapping. You basically have to make one up and you can avoid that whole problem with JSON or YAML.

    --Pileofrogs

      Can you be more specific? Are you talking about escaping control characters? Is your example correct?
      use XML::LibXML;
      $s = "&ltfoo name=bobby>&ltname&gtrobert</name></foo></p>";
      print "before: $s.\n";
      $dom = XML::LibXML->load_xml(string => "<root/>");
      $dom->findnodes("/root")->[0]->appendText($s);
      print $dom->serialize;
      $s2 = $dom->findnodes("//text()")->[0]->data;
      print "after: $s2.\n";

      Output:

      before: &ltfoo name=bobby>&ltname&gtrobert</name></foo></p>.
      <?xml version="1.0"?>
      <root>&amp;ltfoo name=bobby&gt;&amp;ltname&amp;gtrobert&lt;/name&gt;&lt;/foo&gt;&lt;/p&gt;</root>
      after: &ltfoo name=bobby>&ltname&gtrobert</name></foo></p>.

        I probably should have put that in code tags instead of using all those &lt, &gt. Sorry.

        <foo name="bobby"><name>robert</name></foo>

        My point is, your foo has two names.

        <foo name="bobby"><age>27</age></foo>

        Also sucks. It's no different from

        <foo age="27"><name>bobby</name></foo>

        Obviously it's different, but not in a truly meaningful way. You could just as easily say

        <foo age="27" name="bobby"></foo>

        or

        <foo><age>27</age><name>bobby</name></foo>
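        For what it's worth, here is a small sketch of how the "two names" case looks from XML::LibXML, with the attribute values quoted so the parser accepts the document. The attribute and the child element are separate nodes with separate accessors. The flat hash at the end, with an '@' prefix for attributes, is just one made-up convention, which is rather the point: the mapping has to be invented.

```perl
use strict;
use warnings;
use XML::LibXML;

# The example from this thread, with the attribute value quoted.
my $dom = XML::LibXML->load_xml(
    string => '<foo name="bobby"><name>robert</name></foo>',
);
my $foo = $dom->documentElement;

# The attribute and the child element are distinct nodes.
my $attr_name  = $foo->getAttribute('name');
my $child_name = $foo->findvalue('./name');

print "attribute name: $attr_name\n";
print "element name:   $child_name\n";

# One possible (assumed) mapping to a Perl hash: prefix attribute
# keys with '@' so they cannot collide with child element names.
my %data = ('@name' => $attr_name, name => $child_name);
print "$_ => $data{$_}\n" for sort keys %data;
```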

      Meh. How do you represent this structure in JSON?

      use JSON;
      my $a = [];
      my $b = { a => $a };
      push @$a, $b;
      print to_json($a); # ???

      Real life needs you to be able to deal with structures like that. XML and JSON are both poor serialisations for real life data. YAML is somewhat better in this regard but has its own shortcomings.
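      A sketch of what happens when such a self-referencing structure is handed to an encoder, using the core JSON::PP module rather than the to_json call above. As far as I know the encoder gives up once its nesting limit is exceeded instead of looping forever, so the call is wrapped in eval. The variable names are my own.

```perl
use strict;
use warnings;
use JSON::PP;

# Build a structure that contains itself: list -> node -> list -> ...
my $list = [];
my $node = { a => $list };
push @$list, $node;

# JSON has no notation for references, so the encoder can only keep
# descending; eval traps the error raised when it stops.
my $json  = eval { JSON::PP->new->encode($list) };
my $error = $@;

print defined $json
    ? "encoded: $json\n"
    : "encoding failed: $error";
```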
