
Getting started with XML

by pileofrogs (Priest)
on Oct 21, 2009 at 20:04 UTC (#802520)
pileofrogs has asked for the wisdom of the Perl Monks concerning the following question:

Greetings monks, ye great and powerful...

Every now and again I have to deal with some XML (as much as I hate the stuff) and I pretty much always kludge it. Sometimes I'll use XML::Simple, and other times I'll just use some regexes. I think it's about time I learned how to handle it properly. I'll describe my present problem, but please keep in mind that I really want general XML guidance.

I'm working with Google's contacts API, and when you edit a contact you not only need to parse the blob of XML, you also have to edit it and return the whole thing, or the contact gets screwed up; i.e., it overwrites the existing contact with the contents of the returned XML blob. So I need to be able to read some XML, turn it into useful data structures, and then turn it back into the same XML with a few values changed. (If you've never worked with XML, that's not easily done, because XML does not map directly to data structures the way that, say, JSON and YAML do. So you can't convert an arbitrary data structure into XML and assume another XML implementation would produce the same XML.) XML does have things like a DTD (which I've never used).

To summarize, I need to be able to convert XML into data and back to XML while preserving the XML format (not sure if format is the word here... layout?). I think DTDs might be involved, but I don't know how to use them. I'm working with a Google API which uses Atom feeds, but I really want more general advice that will apply to XML that isn't a Google API Atom feed. What I'd really like is recommendations of Perl modules that can turn XML into data and back again without breaking the XML, and links to web pages that tell me why it works the way it does.

Thank you all for wasting your time reading this and possibly even helping me.


Replies are listed 'Best First'.
Re: Getting started with XML
by GrandFather (Sage) on Oct 21, 2009 at 20:21 UTC

    XML::Twig - what more do you need to know? ;)
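
A minimal sketch of what an XML::Twig round trip could look like for the question above. The sample entry is made up and far smaller than a real Google Contacts entry, and the new values are placeholders. It relies on XML::Twig's default behaviour of treating a prefixed name such as gd:email as a plain tag name.

```perl
use strict;
use warnings;
use XML::Twig;

# Hypothetical, trimmed-down Atom entry (illustration only).
my $xml = <<'XML';
<entry xmlns="http://www.w3.org/2005/Atom" xmlns:gd="http://schemas.google.com/g/2005">
  <title>Jane Doe</title>
  <gd:email rel="http://schemas.google.com/g/2005#work" address="jane@old.example"/>
</entry>
XML

my $twig = XML::Twig->new(
    keep_spaces   => 1,    # leave the original whitespace alone
    twig_handlers => {
        # Called once each matching element has been fully parsed.
        'gd:email' => sub {
            my ($t, $elt) = @_;
            $elt->set_att(address => 'jane@new.example');
        },
        'title' => sub {
            my ($t, $elt) = @_;
            $elt->set_text('Jane Q. Doe');
        },
    },
);

$twig->parse($xml);

# Serialize the whole document, including everything that was not touched.
my $out = $twig->sprint;
print $out;
```

Only the two handlers change anything; every other element and attribute passes through as parsed, which is the property the original question asks for.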

    True laziness is hard work
Re: Getting started with XML
by SilasTheMonk (Chaplain) on Oct 21, 2009 at 22:27 UTC
    If you are getting serious about things I think you should look at XML::LibXML. I believe it is built on a C library, so it is a little faster if nothing else. There was another thread you may find useful: Question on XML::LibXML.... DTDs are the old (and still very prevalent) way of validating XML (the newer one being schemas). I don't see why you should need to use either from what you have described. You may want to look on CPAN for the Atom-specific stuff. However, when I tried that (more for RSS than Atom) I ended up falling back on XML::LibXML. Another approach might be to try converting to a Perl structure that you feel familiar with via, say, XML::XML2JSON.
      I totally agree with using XML::LibXML, and you should learn to use XPath, which is a query language for the XML tree. XML::Simple is really slow, and it is way too easy to write code that breaks when the structure of the XML changes.
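
As a rough illustration of the XML::LibXML plus XPath approach, the sketch below edits a document in place and writes it back out. The entry is a made-up, cut-down example rather than real API output; the prefixes atom and gd registered on the XPath context are arbitrary local names chosen for this sketch.

```perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical, trimmed-down Atom entry (illustration only).
my $xml = <<'XML';
<?xml version="1.0" encoding="UTF-8"?>
<entry xmlns="http://www.w3.org/2005/Atom" xmlns:gd="http://schemas.google.com/g/2005">
  <title>Jane Doe</title>
  <gd:email rel="http://schemas.google.com/g/2005#work" address="jane@old.example"/>
  <gd:phoneNumber rel="http://schemas.google.com/g/2005#mobile">555-0100</gd:phoneNumber>
</entry>
XML

my $doc = XML::LibXML->load_xml(string => $xml);

# XPath needs a prefix for every namespace, including the default one.
my $xpc = XML::LibXML::XPathContext->new($doc);
$xpc->registerNs(atom => 'http://www.w3.org/2005/Atom');
$xpc->registerNs(gd   => 'http://schemas.google.com/g/2005');

# Change an attribute value.
for my $email ($xpc->findnodes('/atom:entry/gd:email')) {
    $email->setAttribute(address => 'jane@new.example');
}

# Replace the text content of an element.
for my $title ($xpc->findnodes('/atom:entry/atom:title')) {
    $title->removeChildNodes;
    $title->appendText('Jane Q. Doe');
}

# Everything not touched above is serialized as it was parsed.
my $out = $doc->toString;
print $out;
```

The point of the design is that the document is never converted into plain hashes and arrays: the tree stays an XML tree, so elements the script does not know about (here gd:phoneNumber) survive the round trip.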

        It's only really slow once the XML gets big and can't fit in memory.

        When the structure of the XML changes, all bets are off. Each module, and especially each way you use that module, will allow some changes to go unnoticed, some to break the script and some to cause incorrect results. That includes, of course, XML::LibXML and XPath. Some types of changes are more likely to force you to change something if you use one module, while you can get away with the old script if you use another, but if the XML changes you should ALWAYS review the change and your script and make sure it still works and still returns the right data.

        Enoch was right!
        Enjoy the last years of Rome.

Re: Getting started with XML
by Jenda (Abbot) on Oct 22, 2009 at 07:39 UTC

    To turn the XML into useful data structures (that is, data structures that are easy to use, not ones that exactly match the layout of the XML and give you a hard-to-navigate maze of generic objects) you might use XML::Rules. Have a look at Simpler than XML::Simple for a comparison with XML::Simple.

    Turning that structure back into XML in the original format is a wee bit harder. I'm not sure what exactly you need, but you might use XML::Rules' filter mode and have the filter replace the (few) values you want to change, though it's not going to be very generic. I started work on a template-based way to turn those tweaked data structures into XML in a specific format, but that's far from completion :-(

    If instead of a DTD Google provides an XML Schema, you may want to have a look at XML::Compile. That should be a way to do what you are after.

    Enoch was right!
    Enjoy the last years of Rome.

Re: Getting started with XML
by samwyse (Scribe) on Oct 21, 2009 at 20:43 UTC
    I generally use XML::Simple. It takes some work to get the options correct, but once you do it will produce XML that matches the input.
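
One possible set of options, as a sketch only: the sample document and its element names are invented, and the options shown (ForceArray, KeyAttr, RootName) are one combination that keeps attributes as attributes and child elements as elements. The order of sibling elements in the output is not guaranteed to match the input, because the intermediate structure is a hash.

```perl
use strict;
use warnings;
use XML::Simple;

# Hypothetical document without namespaces (illustration only).
my $xml = <<'XML';
<contact id="42">
  <title>Jane Doe</title>
  <email rel="work" address="jane@old.example"/>
</contact>
XML

# ForceArray => 1 : every child element becomes an array ref, so XMLout
#                   can tell child elements (arrays) from attributes (scalars).
# KeyAttr => []   : never fold lists into hashes keyed on id/name/key.
my $data = XMLin($xml, ForceArray => 1, KeyAttr => []);

# Change one value in the plain Perl structure.
$data->{email}[0]{address} = 'jane@new.example';

# XMLin dropped the root element name, so it has to be supplied again.
my $out = XMLout($data, RootName => 'contact', KeyAttr => []);
print $out;
```

Whether this counts as "matching the input" depends on what the consumer cares about: the same elements, attributes and text come back, but indentation and sibling order may differ.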
Re: Getting started with XML
by pajout (Curate) on Oct 22, 2009 at 12:22 UTC
    Perhaps XML::Trivial could help you, though it is a read-only representation of an XML document. This module parses documents exactly, and then you can walk through the result and output whatever you need.

Node Type: perlquestion [id://802520]
Approved by Corion
Front-paged by Arunbear