PerlMonks  

RFC: XML::Pastor v0.52 is released - A revolutionary way to deal with XML

by aulusoy (Scribe)
on Jun 29, 2008 at 19:18 UTC ( [id://694627]=perlmeditation )

Hello all,

Having just released the first available version (v0.52) of XML::Pastor, I thought this section would be the best place to talk about it.

Now you don't need to write code in an ugly language like Java in order to be able to get native XML support. Because now, there is XML::Pastor.

XML::Pastor is a revolutionary new way to handle XML documents in Perl.

In fact, if you are familiar with Java's XML CASTOR, this module (XML::Pastor) will be very familiar to you. XML::Pastor is very similar to Java's Castor; however, as usual with Perl, it's more flexible. On the other hand, full XSD support is not achieved yet (a lot is already supported, see below).

XML::Pastor will actually generate Perl code starting with one or more W3C Schema(s) (XSD). The generated code is as easy, if not easier, to use as XML::Simple. YET (and this is the tricky part), you can also easily read and write to and from an XML document ('instance' of the schema) abiding by the rules of the schema. You can even validate against the original schema before writing the XML document.

However, you don't need the original schema at run-time (unless you are doing run-time code generation). Everything is translated into Perl at code generation time. Your generated classes 'know' about how to read, write, and validate XML data against the rules of the original schema without the actual schema at hand.

Attributes and child elements can be accessed through auto-generated accessors or hash items. You don't need to know or worry about whether or not a child element appears once or multiple times in the XML document. This is automagically taken care of for you. If you access the child element as an array (with a subscript), it is OK. But you don't need to. You might as well access the first such element directly, without needing to know that there are others.

Code can be generated at 'make' time onto disk (the so-called 'offline' mode), or it can be generated and 'eval'ed for you as one big chunk at run-time. Or you can get it as a string, ready to be evaled. In 'offline' mode, you can choose the 'single' module style (where all code is in one big chunk) or the 'multiple' style, where each class is written to disk in a separate module.
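To make the 'string' and 'eval' modes more concrete, here is a minimal sketch of the general technique in core Perl. It is not XML::Pastor's generator: the helper generate_class_source, the package My::Country, and the field list are all hypothetical, and a real generator would derive them from the schema rather than from a hand-written list.

```perl
use strict;
use warnings;

# Hypothetical helper: build the source of a class with one accessor
# per field. A real generator would derive the fields from an XSD.
sub generate_class_source {
    my ($package, @fields) = @_;
    my $src = "package $package;\nuse strict;\nuse warnings;\n"
            . "sub new { my (\$class, %args) = \@_; my \$self = { %args }; "
            . "return bless \$self, \$class }\n";
    for my $field (@fields) {
        # Combined getter/setter: stores a value when an argument is given.
        $src .= "sub ${field} { my \$self = shift; "
              . "\$self->{${field}} = shift if \@_; "
              . "return \$self->{${field}} }\n";
    }
    return $src . "1;\n";
}

# 'String' mode: the caller gets the code back and decides what to do
# with it (write it to disk for an offline build, or eval it right away).
my $source = generate_class_source('My::Country', qw(name population));

# 'Eval' mode: compile the generated chunk at run-time.
eval $source or die "generated code failed to compile: $@";

my $country = My::Country->new(name => 'Turkey');
$country->population(70_000_000);
print $country->name, ' has ', $country->population, " inhabitants\n";
```

Writing $source to a .pm file instead of passing it to eval would correspond to the 'offline' case; the generated code itself is the same either way.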

There is also a command-line utility called 'pastorize' that helps you do this from within 'make' files.

Gone is the multiplicity problem of XML::Simple. Gone is the complexity of dealing with XML as XML. Now you can deal with XML data as real Perl objects (with accessors for child elements and attributes). But you can still get XML back out of it at the end of the day.

W3C SCHEMA SUPPORT

Most of the W3C XSD Schema structures are supported. The notable exception is substitution groups. Namespace support is currently a bit shaky (at most one namespace per generation), which is why schema 'import' is not yet supported. However, schema 'include' and 'redefine' are supported.

All W3C schema builtin types have Perl counterparts with validation. This includes the dateTime and date types (you can get/set these with CPAN's DateTime objects as well).

Internally, XML::Pastor uses XML::LibXML for the actual reading and writing of XML (but not for validation). However, you don't need to know anything about XML::LibXML to use XML::Pastor.

Note: It's already on CPAN. Just search for XML::Pastor on search.cpan.org.

Cheers,

Ayhan Ulusoy

Re: RFC: XML::Pastor v0.52 is released - A REVOLUTIONARY way to deal with XML
by Tanktalus (Canon) on Jun 29, 2008 at 20:52 UTC

    First, I'm going to disagree with the anonymonk - it's not spam. It's entirely on-topic. If you don't want to read perl-related news, find another website.

    Second, what I'm used to seeing with new modules in existing spaces is a pro/con list. Can you compare your module to other XML parsing modules that already exist? In my case, I mostly want to know how it stacks up against XML::Twig, but comparisons with others are useful, too (XML::Simple, XML::Writer, just off the top of my head).

    Consider users that may be evaluating XML modules to use for their project as well as users that may already be using an XML module - why should they switch (or should they switch)?

    I must admit, there's nothing here that would make me give up XML::Twig - convince me. By doing a comparison. And if you give XML::Twig no "pros", I'm going to be very suspicious.

      Yes, I agree that a comparison chart would be useful. However, it will take some time to gather the information in an exhaustive way.

      I will try to be brief and to the point here at this time. But I will try to post a more comprehensive comparison chart sometime soon.

      PROS

      XML::Twig comparison

      XML::Twig is obviously an excellent module. The main difference from XML::Pastor is that, when working with XML::Twig, the code needs to know about the 'xml'ness of the data, whereas with XML::Pastor the program needs to know almost nothing about the xmlness of the data. For the user code, XML elements and attributes are just native Perl objects with accessors and/or regular hash and array access.

      Another difference with XML::Twig is the ability to validate against the original W3C Schema in XML::Pastor, and this without needing the schema at run time. By the way, schema validation at run time is stunningly fast, because there is no schema parsing necessary.
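      The "no schema at run time" point can be illustrated with a small sketch in core Perl. The idea is that the facets of a schema type are baked into the generated class as plain Perl data, so validating a value is just a few comparisons. The package name My::Type::CountryCode, the facet table, and the method name is_valid are hypothetical; they are not what XML::Pastor actually generates.

```perl
use strict;
use warnings;

package My::Type::CountryCode;

# Facets copied out of a (hypothetical) XSD simple type at generation
# time; nothing below ever opens or parses a schema file.
my %FACETS = (
    pattern   => qr/\A[A-Z]{2}\z/,
    maxLength => 2,
);

sub new {
    my ($class, $value) = @_;
    my $self = { value => $value };
    return bless $self, $class;
}

# Returns 1 when the stored value satisfies every facet, 0 otherwise.
sub is_valid {
    my $self  = shift;
    my $value = $self->{value};
    return 0 unless defined $value;
    return 0 if length($value) > $FACETS{maxLength};
    return $value =~ $FACETS{pattern} ? 1 : 0;
}

package main;

for my $code (qw(TR FR tur)) {
    my $ok = My::Type::CountryCode->new($code)->is_valid;
    printf "%-3s %s\n", $code, $ok ? 'valid' : 'invalid';
}
```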

      XML::Simple comparison

      In many respects, XML::Pastor shares much of the philosophy of XML::Simple, with one important difference: XML::Pastor has full round-trip binding with XML. This means you can both read and write XML with those native Perl objects. With XML::Simple you can do that in principle, but if you have tried anything beyond the most trivial cases, you know you really can't. Really, XML::Simple is useful for reading in a simple config file in XML maybe, but nothing more.

      Another difference is that XML::Simple produces one big deep data structure upon parsing the XML file. In contrast, XML::Pastor will produce native Perl objects for each element and attribute, with names and methods (accessors and much more) that correspond to the actual data. Nothing stops the monk from coding additional methods for those objects (using the same package names that resulted from code generation), hence building logic around the native object.

      Another drawback of XML::Simple is the multiplicity of child elements. In default mode, XML::Simple will produce a hash for a single occurrence of a given child node, but an array of hashes if that node appears multiple times. This is very annoying. In another mode, it is possible to instruct XML::Simple to produce only arrays for child elements, but then the whole thing becomes quite convoluted. XML::Pastor alleviates this problem by counting on the schema to produce the expected result. In fact, XML::Pastor will always produce a special kind of array for this situation, XML::Pastor::NodeArray, which has magical properties. If you access it like a hash or with a method call, it will pass the access on to the first element. If you access it like an array, that works too. So, you never need to know.
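      As a rough illustration of how an array with such properties can be built in plain Perl, here is a minimal sketch using the core overload pragma and AUTOLOAD. It is not the actual XML::Pastor::NodeArray code; the packages My::NodeArray and My::City are hypothetical stand-ins.

```perl
use strict;
use warnings;

package My::NodeArray;

# The object is a blessed array of nodes. Dereferencing it as a hash is
# overloaded to hand back the first node, so $list->{key} reads node 0.
use overload
    '%{}'    => sub { my $self = shift; return $self->[0] },
    fallback => 1;

sub new {
    my ($class, @nodes) = @_;
    return bless [@nodes], $class;
}

# Any method this class does not define is forwarded to the first node.
our $AUTOLOAD;
sub AUTOLOAD {
    my $self = shift;
    (my $method = $AUTOLOAD) =~ s/.*:://;
    return if $method eq 'DESTROY';
    return $self->[0]->$method(@_);
}

package My::City;

sub new  { my ($class, %args) = @_; my $self = {%args}; return bless $self, $class }
sub name { my $self = shift; return $self->{name} }

package main;

my $cities = My::NodeArray->new(
    My::City->new(name => 'Ankara'),
    My::City->new(name => 'Istanbul'),
);

print $cities->name, "\n";        # method call: forwarded to the first city
print $cities->{name}, "\n";      # hash access: also reaches the first city
print $cities->[1]->name, "\n";   # array access: pick any city explicitly
print scalar(@$cities), "\n";     # the full list is still there
```

      The calling code is the same whether the document holds one city or many, which is the property described above.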

      Another difference with XML::Simple is the XSD schema validation in XML::Pastor. You can't do that with XML::Simple.

      CONS

      Apart from some pending limitations, the cons of XML::Pastor are:

    • XML::Pastor requires a W3C XSD Schema to work with.
    • XML::Pastor performs code generation, which could be considered clumsy by some. Note that this could be done at run-time, too, albeit paying a slight performance penalty at code start-up (the generated code will run equally fast, though).
    • Updated: XML::Pastor will slurp the entire XML into memory, much like XML::Simple. This is probably not what you want for huge XML documents. You would be better off with XML::Twig or, better yet, SAX in that case.
    • Currently, XML::Pastor is only good for working with DATA style XML (without mixed markup). This basically means that an element either contains only text or only child elements, not both mixed. So, XML::Pastor is not good for working with a document written in a markup language such as XHTML, for example.

      Cheers,

      Ayhan (trinculo)

        Looks nice, but are you planning to support other XML schema languages like Schematron or Relax NG? I'm asking because XSD is so freaking ugly that I always put my digigloves on before touching it ;-)

        A side note for psini who considered the OP being a duplicate: It's not. The version in the code section comes with additional code examples.


        holli, /regexed monk/

        So it's similar to XML::Smart which also lets you access xml as overloaded-tied hash-array-scalars?

      First, I'm going to disagree with the anonymonk - it's not spam. It's entirely on-topic. If you don't want to read perl-related news, find another website.

      SPAM as a style of advertisement, the same kind employed here.

Re: RFC: XML::Pastor v0.52 is released - A revolutionary way to deal with XML
by sundialsvc4 (Abbot) on Jun 30, 2008 at 02:47 UTC

    Here's a little food-for-thought...

    What if I know absolutely nothing about “Java's XML CASTOR”? :-/

    Nothing. Zero. Zip. Nada.

    Abruptly (and, perhaps, un-planned by you...) I therefore have absolutely zero idea why you say that this thing-of-yours is, and I quote, “a revolutionary(!!) new way to handle XML documents in Perl.”

    Now, mind you, I know absolutely nothing about Java's XML CASTOR, so... I'm actually not disputing your wisdom. I'm merely “off the bus.” I have no idea what you are talking about.

    So I flip over to a few things that I do know something about. “Sure, XML::Simple is ‘simple,’ as advertised, but how does your product stand up against, for example, XML::Twig?”

    But, y'know, I'm still “grabbin‘ at Twigs here,” because when all is said and done, I know nothing-at-all about what Java might be up to, and without that very-critical bit of information I simply do not have the basis to understand why ... with good reason, I am quite(!) sure ... you call this thing-o-yours “revolutionary.”

      I must admit you have a point there. I probably took a shortcut in order to get a point across in the fewest words. For those who do know about CASTOR, a mere sentence would ring enough bells to get the meaning across. For those who don't, there is now enough information in this thread, I think.

      'Revolutionary' doesn't necessarily mean excellent or good. That's not for me to judge anyway. Revolutionary just means that there is an abrupt change in the way things work.

      In this respect, I tend to stick to my idea. The reason is that XML::Pastor introduces a whole new way of dealing with XML by generating native Perl classes starting from a W3C XSD schema. The resulting objects are even easier to manipulate than what XML::Simple produces. Furthermore, writing back to XML that conforms to the original schema is taken care of.

      If you require more information, I would suggest that you check out the documentation of XML::Pastor or even download it and play with it.

      By the way, without being too critical, I would like to say that I try to keep myself up to date on what's going on out there even when it's not related to Perl. It doesn't mean I like Java per se, but it means I would like to be open-minded to new ideas.

        A curious parenthetical comment in that last paragraph... I doubt that anyone who has been in this business for any length of time whatever “knows only about Perl.” It seems rather odd even to suggest it. Bloop! Off my (duck's) back it went. Splash.

        One thing that, I think, keeps bumping against my head on this one is that you compare to XML::Simple ... which is, even on a good day, “just that.” If you are doing heavy-lifting XML work in Perl, you probably are not using Simple. How does that affect the self-assessment you present? Does it?

        The approach that you describe is what is sometimes called “pragmatic programming.” It's “machine generated software,” and it's not that new. It may well be the fact that you describe it in such glowing terms, and yet against one of the weakest XML-support libraries in CPAN (good though it is...), that makes me squint a little bit over my new bifocals “variable focus lenses.”

Re: RFC: XML::Pastor v0.52 is released - A revolutionary way to deal with XML
by Corion (Patriarch) on Jun 30, 2008 at 06:46 UTC

    I'm not familiar with Castor, but maybe you can compare your module to XML::Compile, which also generates Perl code to parse XML according to some schema. As far as I've understood, the main focus of XML::Compile is parsing WSDL descriptions, but the only things I use are XML::LibXML, and XML::Twig when the data is too large for the DOM approach.

      That's a very valid concern ... how much memory does XML::Pastor use compared to XML::LibXML's DOM structure? Is there any way to work with XMLs too large to fit conveniently in memory?

      In either case I think it's a nice addition to Perl's toolset. And if I need to work with a complex XML and will get the XSD, I'll most likely consider the module.

        I agree. When working with large XML files, memory can become a concern.

        Right off, I must admit that XML::Pastor is not as savvy as XML::Twig in this area. As you know, XML::Twig is capable of lazy/selective population of its structures, whereas XML::Pastor will first parse the entire XML into an XML::LibXML DOM tree and then convert the data into native Perl data objects (with hashes, arrays, scalars). On the other hand, the DOM is immediately thrown away, so there is no double jeopardy.

        In terms of memory usage, XML::Pastor probably compares very much to XML::Simple, because the tree structures are similar except that in the case of XML::Pastor, the references are blessed objects.

        Compared to XML::LibXML DOM tree, XML::Pastor probably weighs less in memory as there is quite a bit of book-keeping in the DOM.

        One caveat though => currently XML::Pastor is only good for working with DATA style XML (without mixed mark up). This basically means that an element either contains only text or only child elements, not both mixed. So, XML::Pastor is not good for working with a markup language such as XHTML, for example.
