PerlMonks  

generating XML with data structures and Perl classes - the XML::Element::Tolol approach

by metaperl (Curate)
on Jul 11, 2011 at 15:56 UTC ( [id://913713]=perlmeditation )

XML::Element::Tolol is a tool for compiling XML into the arrayref-of-arrayrefs (aka loltree) structure used by the new_from_lol method of HTML::Element. The main values of this tool are:
  1. To create Perl classes for XML generation, thereby allowing the particular XML for various cases to use all the means of extension and refinement available in Perl: subclassing, method calls and data structures (also see the earlier node).
  2. To build XML from a hashref for simple cases while still maintaining XML element order. The difficulties with more complex cases are also discussed.
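
To make the loltree shape concrete, here is a minimal sketch in core Perl. It follows the convention HTML::Element documents for new_from_lol (the first item is the tag name, hashrefs are attributes, plain strings are text, nested arrayrefs are child elements). The helpers `xml_escape` and `lol_to_xml` are hypothetical and written only for illustration; they are not part of XML::Element::Tolol or HTML::Element, which do the real work through new_from_lol and as_XML.

```perl
use strict;
use warnings;

# Hypothetical helper: escape the characters that are special in XML.
sub xml_escape {
    my ($text) = @_;
    $text =~ s/&/&amp;/g;
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    $text =~ s/"/&quot;/g;
    return $text;
}

# Hypothetical helper: render a loltree as an XML string.
# Assumed convention: [ tag, items... ] where a hashref item adds attributes,
# an arrayref item is a child element and a plain scalar is text content.
sub lol_to_xml {
    my ($node) = @_;
    my ( $tag, @items ) = @$node;
    my %attr;
    my $content = '';
    for my $item (@items) {
        if    ( ref $item eq 'HASH' )  { %attr = ( %attr, %$item ) }
        elsif ( ref $item eq 'ARRAY' ) { $content .= lol_to_xml($item) }
        else                           { $content .= xml_escape($item) }
    }
    my $attrs = join '',
      map { sprintf ' %s="%s"', $_, xml_escape( $attr{$_} ) } sort keys %attr;
    return "<$tag$attrs>$content</$tag>";
}

my $lol = [
    note => { id => 1 },
    [ to   => 'Bob' ],
    [ from => 'Alice' ],
    [ body => 'Shhh! ', [ em => 'Really' ], ' quiet.' ],    # mixed content
];

my $xml = lol_to_xml($lol);
print "$xml\n";
```

Because the tree is an array, child order is exactly the order written, and text can be interleaved with child elements.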

Related things

There are many complete formats for XML generation from data structures as seen here. In reflecting on that thread, I decided to work with the HTML::Element approach since I'm very familiar with it, it is robust, and has been in development for a long time.

XML::Simple

There are a few reasons not to use XML::Simple:
  1. No support for XML generation requiring specific element sequencing
  2. No ability to generate mixed content
  3. In XML::Simple::FAQ, the author himself says XML::Simple is only suited to certain cases of XML generation

XML::Toolkit

Another similar module is XML::Toolkit by perigrin. It compiles a single XML file into a series of Moose classes. I experimented with this module early on. Personally, I think a series of methods in a single class might be more appropriate for programmatic control of a single XML file. perigrin somewhat agrees with me, because we have both compared DBIx::Class, which is object-based, with all other ORMs, which are limited by being class-based. (The ORM talk is relevant because perigrin and I both agree that XML::Toolkit is the DBIx::Class::Loader of XML.) Also, I found it to be quite verbose for even a simple XML example. Perhaps a compiler-compiler could have allowed for simpler usage. For instance, to generate this XML:

<note>
  <to>Bob</to>
  <from>Alice</from>
  <heading>Secret</heading>
  <body>Shhh!</body>
</note>
you need this XML::Toolkit:
my $document = MyApp::Note->new(
    to_collection   => [ MyApp::To->new( text => 'Bob' ) ],
    from_collection => [ MyApp::From->new( text => 'Alice' ) ],
    headings        => [ MyApp::Heading->new( text => 'Secret' ) ],
    body_collection => [ MyApp::Body->new( text => 'Shhh!' ) ],
);
but only this much XML::Element::Tolol:
my %data = (
    to      => 'Bob',
    from    => 'Alice',
    heading => 'Secret',
    body    => 'Shhh!',
);
MyApp::Note->new( data => \%data )->tree->as_XML;
In other words: one data definition, one constructor call and one method call, versus no data definition and five constructor calls.
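
One way to picture how a single hashref can drive generation while element order is preserved: the order lives in the class (as if captured from the sample XML), and the hashref only supplies values. The sketch below is an assumption about the general idea, not the code XML::Element::Tolol actually generates; the package name `Sketch::Note` and its `lol` method are hypothetical.

```perl
use strict;
use warnings;

package Sketch::Note;

# Element order is fixed here, as if captured from a sample XML file.
my @ORDER = qw(to from heading body);

sub new {
    my ( $class, %args ) = @_;
    return bless { data => $args{data} || {} }, $class;
}

# Build the loltree: children follow @ORDER, not the hash's key order.
sub lol {
    my ($self) = @_;
    my @children;
    for my $tag (@ORDER) {
        push @children, [ $tag => $self->{data}{$tag} ];
    }
    return [ note => @children ];
}

package main;

# Keys are deliberately given out of document order.
my %data = ( body => 'Shhh!', heading => 'Secret', from => 'Alice', to => 'Bob' );
my $lol  = Sketch::Note->new( data => \%data )->lol;

my @tags = map { $_->[0] } @{$lol}[ 1 .. $#{$lol} ];
print join( ', ', @tags ), "\n";
```

The caller never has to think about sequencing; that knowledge stays in the class, which a subclass can override or extend.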

limitations / the future

attributes

The current compiler does not have any support for regenerating XML attributes from the supplied hashref of data; only elements and content can be regenerated. This is unacceptable in general, but perfectly fine for my immediate need, which was to simplify XML generation for calling QuickBooks. I've thought of a few ways of representing attributes, but am not sure which is best:

arrayref

One possibility is, when the value of a hash entry is an arrayref, to use its first element as the attributes and its second element as the content:
my %data = ( george => [ { age => 45} , 'some content' ] );

separate hashref for attributes

This is just brainstorming, so here's another idea: one hash for content and another for the attributes of that content:
my %data = ( george => 'some content' );
my %attr = ( george => { age => 45 } );
I think I like the former approach better.
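
Both proposals can be normalized to the same loltree node, since a new_from_lol-style node accepts a hashref of attributes among its items. The following sketch shows one possible translation for each. The helper names `node_from_arrayref_style` and `node_from_split_style` are hypothetical, and neither translation is implemented in XML::Element::Tolol.

```perl
use strict;
use warnings;

# Proposal 1: the hash value is [ \%attributes, $content ].
sub node_from_arrayref_style {
    my ( $tag, $value ) = @_;
    return [ $tag => $value ] unless ref $value eq 'ARRAY';    # plain content
    my ( $attr, $content ) = @$value;
    return [ $tag => $attr, $content ];
}

# Proposal 2: content and attributes live in two parallel hashes.
sub node_from_split_style {
    my ( $tag, $data, $attr ) = @_;
    my @node = ($tag);
    push @node, $attr->{$tag} if exists $attr->{$tag};
    push @node, $data->{$tag};
    return \@node;
}

my %combined = ( george => [ { age => 45 }, 'some content' ] );
my %data     = ( george => 'some content' );
my %attr     = ( george => { age => 45 } );

my $first  = node_from_arrayref_style( george => $combined{george} );
my $second = node_from_split_style( 'george', \%data, \%attr );

# Both nodes have the shape [ 'george', { age => 45 }, 'some content' ].
print "$first->[0] age=$first->[1]{age}: $first->[2]\n";
print "$second->[0] age=$second->[1]{age}: $second->[2]\n";
```

The first form keeps an element's attributes next to its content, at the cost of making arrayref values ambiguous if arrayrefs are later wanted for repeated elements.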

iterated data

There is no support for automatically "unrolling" a section of XML which needs to be repeated. Right now, I'm using splice, Data::Rmap and List::MoreUtils to do arrayref mangling:
# call the superclass to render the simple data in the hashref
my $lol = do {
    local $self->{data} = \%tmpdata;
    super();
};

# now rewrite part of the loltree with repetition
my @newlol;
for my $invoice_line ( @{$array_rows_for_single_invoice} ) {
    my $aref = [
        InvoiceLineAdd =>
          [ ItemRef => [ ListID => $invoice_line->{product_listid} ] ],
        [ Amount => $invoice_line->{amount} ],
    ];
    push @newlol, $aref;
}

my ($dump) = rmap_array {
    if ( $_->[0] eq 'InvoiceAdd' ) {
        use List::MoreUtils qw(first_index);
        my $i = first_index { ref $_ and $_->[0] eq 'SetCredit' } @$_;
        splice @$_, ++$i, 1, @newlol;

        # No need to drill down any further
        cut($_);
    }
    else {
        $_;
    }
} $lol;
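
The splice step can be tried in isolation with core modules only. This is a simplified sketch: the loltree, the row data and the placeholder element are hypothetical stand-ins, and it assumes InvoiceAdd is the root node so that no recursive walk (the job Data::Rmap does in the code above) is needed.

```perl
use strict;
use warnings;
use List::Util qw(first);

# Hypothetical loltree as it might look after rendering the simple data:
# a single placeholder InvoiceLineAdd follows SetCredit.
my $lol = [
    InvoiceAdd =>
      [ CustomerRef    => [ ListID => 'C-1' ] ],
    [ SetCredit      => 'none' ],
    [ InvoiceLineAdd => 'placeholder' ],
    [ Memo           => 'thanks' ],
];

# Hypothetical rows; in the post these come from the application.
my @rows = (
    { product_listid => 'P-1', amount => '10.00' },
    { product_listid => 'P-2', amount => '25.50' },
);

# One InvoiceLineAdd node per row.
my @lines = map {
    [
        InvoiceLineAdd =>
          [ ItemRef => [ ListID => $_->{product_listid} ] ],
        [ Amount => $_->{amount} ],
    ]
} @rows;

# Index of the SetCredit child within the InvoiceAdd node.
my $i = first { ref( $lol->[$_] ) && $lol->[$_][0] eq 'SetCredit' }
  0 .. $#{$lol};
die "no SetCredit child found" unless defined $i;

# Replace the single element that follows SetCredit with all line nodes.
splice @$lol, $i + 1, 1, @lines;

print join( ' ', map { ref($_) ? $_->[0] : $_ } @$lol ), "\n";
```

As in the original code, the splice removes exactly one element after SetCredit, so the sample XML must contain exactly one placeholder line there.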

schema by sample

Just as with XML::Toolkit, a complete sample XML file is required for compiling into an XML generator class. This is in contrast to XML::Compile by Mark Overmeer, which uses XML schemas.


The mantra of every experienced web application developer is the same: thou shalt separate business logic from display. Ironically, almost all template engines allow violation of this separation principle, which is the very impetus for HTML template engine development.

-- Terence Parr, "Enforcing Strict Model View Separation in Template Engines"


Replies are listed 'Best First'.
Re: generating XML with data structures and Perl classes - the XML::Element::Tolol approach
by Anonymous Monk on Jul 11, 2011 at 16:19 UTC

    Many of the issues you point out here with XML::Toolkit are a symptom of its level of maturity and the conscious decision to punt on handling ambiguous cases. Take, for example, the collection interface you illustrate:

    my $document = MyApp::Note->new(
        to_collection   => [ MyApp::To->new( text => 'Bob' ) ],
        from_collection => [ MyApp::From->new( text => 'Alice' ) ],
        headings        => [ MyApp::Heading->new( text => 'Secret' ) ],
        body_collection => [ MyApp::Body->new( text => 'Shhh!' ) ],
    );

    XML::Toolkit has no way to know the number of to child elements, so it defaults to assuming "many". If you know that you're only ever going to have one to/from/heading/body then you can re-write the class to change these from ArrayRef types to Object types.

    The second object invocation can be eliminated by adding a coercion.

    coerce 'MyApp::To'      => from 'Str' => via { MyApp::To->new( text => $_ ) };
    coerce 'MyApp::From'    => from 'Str' => via { MyApp::From->new( text => $_ ) };
    coerce 'MyApp::Heading' => from 'Str' => via { MyApp::Heading->new( text => $_ ) };
    coerce 'MyApp::Body'    => from 'Str' => via { MyApp::Body->new( text => $_ ) };

    Combining both of these should get you syntax identical to your XML::Element::Tolol example. Unfortunately, this is currently a manual post-processing step on the files that XML::Toolkit generates. Conceptually, it should be possible to build the coercions for Text nodes from Str programmatically during compilation and then generate a Type Library that is included in the generated classes. I simply have never had this need in the projects I've used XML::Toolkit on.

    Long term, your compiler-compiler idea is similar to what I envision XML::Toolkit becoming. I think it should be possible to generate a Schema from XML::Toolkit-generated Moose classes. This would allow you to bootstrap your way to the kind of clarity you want: you could generate a Schema from an example document, modify the Schema to correctly reflect anything not properly reflected in the example document, then generate the final classes from the Schema.

    The problem is I don't have a billable project to implement this. Is this something people would be willing to fund a grant for?

      All that work and I forgot to be logged in.

Node Type: perlmeditation [id://913713]
Approved by GrandFather