PerlMonks  

A data structure for XML generation

by metaperl (Curate)
on Aug 11, 2009 at 12:55 UTC (#787605=perlmeditation)

What do you think of this data structure for specifying how to produce XML:
[ tag_name, \%attr, $inner ]
where $inner is
  1. plain text - representing content
  2. an array ref, representing one or more child tags
This does not allow mixed content. But for data-oriented XML production, that does not matter, does it?

example

<family name="Kawasaki">
  <father>Yasuhisa</father>
  <mother>Chizuko</mother>
  <children>
    <girl>Shiori</girl>
    <boy>Yusuke</boy>
    <boy>Kairi</boy>
  </children>
</family>

[ family => { name => 'Kawasaki' }, [
    [father => {} => 'Yasushisa' ],
    [mother => {} => 'Chizuko' ],
    [children [
        [girl => {} => 'Shiori'],
        [boy => {} => 'Yasuke'],
        [boy => {} => 'Kairi']
    ] ]
] ]
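To make the proposal concrete, here is a minimal sketch of a serializer for the [ tag_name, \%attr, $inner ] format, assuming $inner is either a plain string (text content) or an array ref of child tags, as specified above. The names to_xml and xml_escape are hypothetical, the output is compact (no indentation), and the sample data uses the spellings from the XML above with the missing => after children added.

```perl
use strict;
use warnings;

# Escape the characters that are special in XML text and attribute values.
sub xml_escape {
    my ($s) = @_;
    $s =~ s/&/&amp;/g;    # first, so the entities below are not re-escaped
    $s =~ s/</&lt;/g;
    $s =~ s/>/&gt;/g;
    $s =~ s/"/&quot;/g;
    return $s;
}

# Serialize one node of the form [ tag_name, \%attr, $inner ]:
#   plain string $inner -> escaped text content
#   array ref $inner    -> child nodes, emitted in array order
sub to_xml {
    my ($node) = @_;
    my ($tag, $attr, $inner) = @$node;

    my $attrs = '';
    for my $name (sort keys %$attr) {    # sorted for deterministic output
        $attrs .= sprintf ' %s="%s"', $name, xml_escape($attr->{$name});
    }

    my $body = ref($inner) eq 'ARRAY'
        ? join('', map { to_xml($_) } @$inner)
        : xml_escape($inner // '');
    return "<$tag$attrs>$body</$tag>";
}

my $family = [ family => { name => 'Kawasaki' }, [
    [ father   => {} => 'Yasuhisa' ],
    [ mother   => {} => 'Chizuko' ],
    [ children => {} => [
        [ girl => {} => 'Shiori' ],
        [ boy  => {} => 'Yusuke' ],
        [ boy  => {} => 'Kairi' ],
    ] ],
] ];

my $xml = to_xml($family);
print "$xml\n";
```

Because siblings live in an array, they come out in exactly the order they appear in the data; the trade-off, as noted, is no mixed content.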

Why another data-to-XML producer?

Here are the reasons I can think of:
  1. It is not easy (or even possible) to specify whether something is an attribute or an element using XML::Simple. Ditto with XML::Smart.
  2. It is cumbersome to specify the exact ordering of XML elements with XML::TreePP.
  3. The data structure in XML::TreePP changes depending on whether an element has only attributes, or attributes and text.

Re: A data structure for XML generation
by ctilmes (Priest) on Aug 11, 2009 at 13:16 UTC
    If you've defined two explicit cases for $inner (scalar or array ref), you can always differentiate it from a hash ref, so why not just make the \%attr optional and just omit the empty hashes {}?

    [ family => { name => 'Kawasaki' }, [
        [father => 'Yasushisa' ],
        [mother => 'Chizuko' ],
        [children => [
            [girl => 'Shiori'],
            [boy => 'Yasuke'],
            [boy => 'Kairi']
        ] ]
    ] ]

    I also think your example had a couple of minor errors (an extra array ref, and no punctuation between 'children' and its children).
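A sketch of ctilmes's shorthand, assuming the same two cases for $inner: since $inner is never a hash ref, a hash ref in the second slot can only be the attribute hash, so it can be made optional. The function names are hypothetical.

```perl
use strict;
use warnings;

sub xml_escape {
    my ($s) = @_;
    $s =~ s/&/&amp;/g;
    $s =~ s/</&lt;/g;
    $s =~ s/>/&gt;/g;
    $s =~ s/"/&quot;/g;
    return $s;
}

# Accepts [ tag_name, \%attr, $inner ] or the shorthand [ tag_name, $inner ].
# Because $inner is only ever a string or an array ref, a hash ref in
# position 1 can only be the attribute hash.
sub to_xml {
    my ($tag, @rest) = @{ $_[0] };
    my $attr  = ref($rest[0]) eq 'HASH' ? shift(@rest) : {};
    my $inner = $rest[0];

    my $attrs = '';
    for my $name (sort keys %$attr) {
        $attrs .= sprintf ' %s="%s"', $name, xml_escape($attr->{$name});
    }
    my $body = ref($inner) eq 'ARRAY'
        ? join('', map { to_xml($_) } @$inner)
        : xml_escape($inner // '');
    return "<$tag$attrs>$body</$tag>";
}

my $xml = to_xml([ family => { name => 'Kawasaki' }, [
    [ father   => 'Yasuhisa' ],
    [ mother   => 'Chizuko' ],
    [ children => [ [ girl => 'Shiori' ], [ boy => 'Yusuke' ], [ boy => 'Kairi' ] ] ],
] ]);
print "$xml\n";
```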

      If you've defined two explicit cases for $inner (scalar or array ref), you can always differentiate it from a hash ref, so why not just make the \%attr optional and just omit the empty hashes {}?
      Yes, I certainly thought of that. And you will often simply provide content without attributes, so I think that is a nice shorthand.
        You could use something like the lol (list of lists) structure accepted by the new_from_lol method in HTML::Element; I think it is very compact and easy to handle.
Re: A data structure for XML generation
by BioLion (Curate) on Aug 11, 2009 at 16:10 UTC

    The way you have laid it out - With ctilmes' comments included - would seem to be an ideal case for a lightweight OO approach?

    Just a something something...
      The way you have laid it out - With ctilmes' comments included - would seem to be an ideal case for a lightweight OO approach?
      Right now, it is using the OO heavyweight - Moose. (grin).

      The first few test cases pass in my git repo for the source.

      More late-breaking hacker news as it surfaces!
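For comparison with the Moose version, a lightweight OO sketch in plain Perl (a blessed hash, no Moose). The class name Tag and its methods new, add and as_xml are hypothetical, not taken from metaperl's repo. Children can be Tag objects or plain strings, so unlike the array format this one also allows mixed content.

```perl
use strict;
use warnings;

package Tag;    # hypothetical class name; plain blessed hash, no Moose

sub new {
    my ($class, $name, %attr) = @_;
    return bless { name => $name, attr => \%attr, children => [] }, $class;
}

# Append child Tag objects and/or text strings; returns $self so calls chain.
sub add {
    my ($self, @kids) = @_;
    push @{ $self->{children} }, @kids;
    return $self;
}

sub _escape {
    my ($s) = @_;
    $s =~ s/&/&amp;/g;
    $s =~ s/</&lt;/g;
    $s =~ s/>/&gt;/g;
    $s =~ s/"/&quot;/g;
    return $s;
}

sub as_xml {
    my ($self) = @_;
    my $attrs = '';
    for my $k (sort keys %{ $self->{attr} }) {
        $attrs .= sprintf ' %s="%s"', $k, _escape($self->{attr}{$k});
    }
    # children may be Tag objects (serialized recursively) or text strings
    my $body = join '', map { ref($_) ? $_->as_xml : _escape($_) } @{ $self->{children} };
    return "<$self->{name}$attrs>$body</$self->{name}>";
}

package main;

my $family = Tag->new(family => name => 'Kawasaki')->add(
    Tag->new('father')->add('Yasuhisa'),
    Tag->new('mother')->add('Chizuko'),
    Tag->new('children')->add(
        Tag->new('girl')->add('Shiori'),
        Tag->new('boy')->add('Yusuke'),
        Tag->new('boy')->add('Kairi'),
    ),
);
my $xml = $family->as_xml;
print "$xml\n";
```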

Re: A data structure for XML generation
by snoopy (Deacon) on Aug 11, 2009 at 23:48 UTC
    Also note the XML::Ximple data representation. An example from its source code:
    $ximple_tree = {
        tag_name => "unicycle",
        attrib   => { color => "chrom", height => 3, brand => "foo" },
        content  => [
            "The content of a ximple tree\n",
            "is heterogenious just like xml",
            "itself. For example this is how\n",
            "i would make the word",
            { tag_name => "bold", attrib => {}, content => ["cheese"] },
            "appear in a bold tag"
        ]
    };
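A sketch of how a tree shaped like the XML::Ximple example could be serialized. This is a hypothetical helper written for illustration, not XML::Ximple's own API; it assumes every node is a hash ref with tag_name, attrib and content keys, and treats every non-hash item in content as text, which is what makes mixed content possible.

```perl
use strict;
use warnings;

sub xml_escape {
    my ($s) = @_;
    $s =~ s/&/&amp;/g;
    $s =~ s/</&lt;/g;
    $s =~ s/>/&gt;/g;
    $s =~ s/"/&quot;/g;
    return $s;
}

# Hypothetical serializer for { tag_name => ..., attrib => {...}, content => [...] }
sub ximple_to_xml {
    my ($node)  = @_;
    my $tag     = $node->{tag_name};
    my $attrib  = $node->{attrib} || {};
    my $attrs   = '';
    for my $k (sort keys %$attrib) {
        $attrs .= sprintf ' %s="%s"', $k, xml_escape($attrib->{$k});
    }
    # content is heterogeneous: hash refs are child nodes, anything else is text
    my $body = join '', map { ref($_) eq 'HASH' ? ximple_to_xml($_) : xml_escape($_) }
                            @{ $node->{content} || [] };
    return "<$tag$attrs>$body</$tag>";
}

my $xml = ximple_to_xml({
    tag_name => 'p',
    attrib   => {},
    content  => [ 'make the word ', { tag_name => 'bold', attrib => {}, content => ['cheese'] }, ' bold' ],
});
print "$xml\n";
```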
Re: A data structure for XML generation
by dolmen (Sexton) on Aug 12, 2009 at 12:18 UTC
    See also Template::Declare. Here is an example (with T::D 0.40 this is affected by bug #48642; as a workaround, put T::D::TS::Family in a separate .pm file):
    {
        package Template::Declare::TagSet::Family;
        use base 'Template::Declare::TagSet';
        sub get_tag_list {
            return [qw(family father mother children girl boy)]
        }
    }
    {
        package Family::Templates;
        use base 'Template::Declare';
        use Template::Declare::Tags 'Family';
        template Kawasaki => sub {
            family {
                attr { name => 'Kawasaki' }
                father { 'Yasushisa' }
                mother { 'Chizuko' }
                children {
                    girl { 'Shiori' }
                    boy { 'Yasuka' }
                    boy { 'Kairi' }
                }
            }
        };
    }
    use Template::Declare;
    Template::Declare->init(roots => ['Family::Templates']);
    print Template::Declare->show('Kawasaki');
Re: A data structure for XML generation
by ctilmes (Priest) on Aug 12, 2009 at 17:45 UTC
    It would complicate your parsing a bit, but I think you could have an even simpler format by dropping the extra layer of array around each sibling node (use array refs only to hold children). The structure would still translate unambiguously.

    [ family => { name => 'Kawasaki' }, [
        father => 'Yasushisa',
        mother => { hair => 'short' } => 'Chizuko',
        children => [
            girl => 'Shiori',
            boy => { hair => 'black' } => 'Yasuke',
            boy => 'Kairi'
        ]
    ] ]

    <family name="Kawasaki">
      <father>Yasuhisa</father>
      <mother hair="short">Chizuko</mother>
      <children>
        <girl>Shiori</girl>
        <boy hair="black">Yusuke</boy>
        <boy>Kairi</boy>
      </children>
    </family>
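A sketch of a serializer for ctilmes's flat format, assuming each sibling is a name, an optional attribute hash ref, and then a value that is either a string or an array ref holding the child list. The function name flat_to_xml is hypothetical. The parser has to consume the list two or three items at a time, which is the extra parsing complexity mentioned above.

```perl
use strict;
use warnings;

sub xml_escape {
    my ($s) = @_;
    $s =~ s/&/&amp;/g;
    $s =~ s/</&lt;/g;
    $s =~ s/>/&gt;/g;
    $s =~ s/"/&quot;/g;
    return $s;
}

# Each sibling in the flat list is: name, optional { attributes }, value.
# A string value is text content; an array ref value is the child list.
sub flat_to_xml {
    my ($list) = @_;
    my @items  = @$list;
    my $out    = '';
    while (@items) {
        my $tag   = shift @items;
        my $attr  = ref($items[0]) eq 'HASH' ? shift(@items) : {};
        my $value = shift @items;

        my $attrs = '';
        for my $k (sort keys %$attr) {
            $attrs .= sprintf ' %s="%s"', $k, xml_escape($attr->{$k});
        }
        my $body = ref($value) eq 'ARRAY' ? flat_to_xml($value) : xml_escape($value // '');
        $out .= "<$tag$attrs>$body</$tag>";
    }
    return $out;
}

my $xml = flat_to_xml([ family => { name => 'Kawasaki' }, [
    father   => 'Yasuhisa',
    mother   => { hair => 'short' } => 'Chizuko',
    children => [
        girl => 'Shiori',
        boy  => { hair => 'black' } => 'Yusuke',
        boy  => 'Kairi',
    ],
] ]);
print "$xml\n";
```

One consequence of this layout is that every element needs a value; an empty element would be written as name => ''.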
Re: A data structure for XML generation
by holli (Monsignor) on Aug 12, 2009 at 17:59 UTC
    I learned very early on that all these nifty XML-writing modules are not worth the bytes needed to store them when it comes to writing anything beyond trivial XML (XSL-FO, for example). See, the M in XML means "markup". What do we use to create markup? Yes. Templates. Go with TT or similar and you are covered for all your needs.

    Note that I'm not talking about parsing.


    holli

    You can lead your users to water, but alas, you cannot drown them.
Re: A data structure for XML generation
by ELISHEVA (Prior) on Aug 12, 2009 at 23:24 UTC

    If you require each array to have the following format, you can completely eliminate the array around the child tags. The position alone is enough to determine whether or not an array element is a nested tag. Eliminating the extra square brackets around each tag set would help keep the indentation from going halfway across the page or more.

    [0] tag name
    [1] attribute hash reference
    [2] data
    [3] ...: one array ref for each nested tag

    Your sample data would then look like this:

    [ family => { name => 'Kawasaki' } => undef,
        [father => {} => 'Yasushisa' ],
        [mother => {} => 'Chizuko' ],
        [children => {} => undef,
            [girl => {} => 'Shiori'],
            [boy => {} => 'Yasuke'],
            [boy => {} => 'Kairi']
        ]
    ]

    Best, beth
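A sketch of beth's positional layout, assuming slot 0 is the tag name, slot 1 the attribute hash ref, slot 2 the text (or undef), and every remaining slot a nested tag. The name pos_to_xml is hypothetical; where a tag has both text and children, this sketch emits the text first.

```perl
use strict;
use warnings;

sub xml_escape {
    my ($s) = @_;
    $s =~ s/&/&amp;/g;
    $s =~ s/</&lt;/g;
    $s =~ s/>/&gt;/g;
    $s =~ s/"/&quot;/g;
    return $s;
}

# [ tag, \%attr, $data_or_undef, @child_tags ]
sub pos_to_xml {
    my ($tag, $attr, $data, @children) = @{ $_[0] };
    my $attrs = '';
    for my $k (sort keys %{ $attr || {} }) {
        $attrs .= sprintf ' %s="%s"', $k, xml_escape($attr->{$k});
    }
    # text (if any) goes before the nested tags
    my $body = defined $data ? xml_escape($data) : '';
    $body .= pos_to_xml($_) for @children;
    return "<$tag$attrs>$body</$tag>";
}

my $xml = pos_to_xml([ family => { name => 'Kawasaki' } => undef,
    [ father => {} => 'Yasuhisa' ],
    [ mother => {} => 'Chizuko' ],
    [ children => {} => undef,
        [ girl => {} => 'Shiori' ],
        [ boy  => {} => 'Yusuke' ],
        [ boy  => {} => 'Kairi' ],
    ],
]);
print "$xml\n";
```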

      The attribute hash ref can also be optional: if ref($tag->[1]) ne 'HASH', the tag has no attributes and $tag->[1] is the first child node. So position is only relative and undefs are not needed.
      [ family => { name => 'Kawasaki' },
          [father => 'Yasushisa' ],
          [mother => 'Chizuko' ],
          [children => [girl => 'Shiori' ], [boy => 'Yasuke' ], [boy => 'Kairi' ] ]
      ]
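A sketch of the variant above, assuming the attribute hash ref is optional and every remaining element is either a plain string (text) or an array ref (a nested tag). Because text and tags can be interleaved, this also supports mixed content. It is close in spirit to the HTML::Element lol structure mentioned earlier in the thread, but lol_to_xml is a hypothetical name, not that module's API.

```perl
use strict;
use warnings;

sub xml_escape {
    my ($s) = @_;
    $s =~ s/&/&amp;/g;
    $s =~ s/</&lt;/g;
    $s =~ s/>/&gt;/g;
    $s =~ s/"/&quot;/g;
    return $s;
}

# [ tag, optional \%attr, item, item, ... ] where each item is text or a tag
sub lol_to_xml {
    my ($tag, @rest) = @{ $_[0] };
    my $attr = ref($rest[0]) eq 'HASH' ? shift(@rest) : {};
    my $attrs = '';
    for my $k (sort keys %$attr) {
        $attrs .= sprintf ' %s="%s"', $k, xml_escape($attr->{$k});
    }
    # array refs are nested tags; plain scalars are text, kept in order
    my $body = join '', map { ref($_) eq 'ARRAY' ? lol_to_xml($_) : xml_escape($_) } @rest;
    return "<$tag$attrs>$body</$tag>";
}

my $xml = lol_to_xml([ family => { name => 'Kawasaki' },
    [ father => 'Yasuhisa' ],
    [ mother => 'Chizuko' ],
    [ children => [ girl => 'Shiori' ], [ boy => 'Yusuke' ], [ boy => 'Kairi' ] ],
]);
print "$xml\n";
```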
Re: A data structure for XML generation
by grantm (Parson) on Aug 14, 2009 at 00:14 UTC
    not easy/possible to specify when something is an attribute versus an element using XML::Simple

    Actually XML::Simple has a very simple rule for determining whether to output something as an attribute vs a nested element. If the value is a plain scalar, it will be output as an attribute, otherwise it will be a nested element. E.g.:

    print XMLout({person => { id => 123, name => ['John Doe']}})

    gives:

    <opt>
      <person id="123">
        <name>John Doe</name>
      </person>
    </opt>
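To make the rule concrete, here is a toy re-implementation of it. It is an illustration only, not XML::Simple's code: it ignores options such as NoAttr and KeyAttr, does no indentation, and sorts keys for repeatable output. The name simple_out is hypothetical.

```perl
use strict;
use warnings;

sub xml_escape {
    my ($s) = @_;
    $s =~ s/&/&amp;/g;
    $s =~ s/</&lt;/g;
    $s =~ s/>/&gt;/g;
    $s =~ s/"/&quot;/g;
    return $s;
}

# Plain scalars become attributes, hash refs become nested elements,
# and array refs become one element per item.
sub simple_out {
    my ($name, $hash) = @_;
    my ($attrs, $body) = ('', '');
    for my $k (sort keys %$hash) {
        my $v = $hash->{$k};
        if (!ref($v)) {
            $attrs .= sprintf ' %s="%s"', $k, xml_escape($v);
        }
        elsif (ref($v) eq 'HASH') {
            $body .= simple_out($k, $v);
        }
        elsif (ref($v) eq 'ARRAY') {
            for my $item (@$v) {
                $body .= ref($item) eq 'HASH'
                    ? simple_out($k, $item)
                    : "<$k>" . xml_escape($item) . "</$k>";
            }
        }
    }
    return "<$name$attrs>$body</$name>";
}

my $xml = simple_out(opt => { person => { id => 123, name => ['John Doe'] } });
print "$xml\n";
```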

    Having said that, I'd encourage the use of a different module for anything non-trivial.

      Actually XML::Simple has a very simple rule for determining whether to output something as an attribute vs a nested element. If the value is a plain scalar, it will be output as an attribute, otherwise it will be a nested element.
      Indeed... Just FYI, a co-worker and I were evaluating XML::Simple for producing XML, and he had the impression that there was no way to do this. If you glance at the SYNOPSIS and see this:
      'gobi' => { 'osversion' => '6.5', 'osname' => 'irix', 'address' => '10.0.0.102' },
      you get the impression that there is no way to force address to be an element.

      I knew about ForceArray but for some reason, things didn't click for me until your post here.

      regexps as "non-plain scalars"

      I wanted to ease my typing. The standard "non-plain scalar" is this:
      mother => ['Mary']
      but I was hoping to get away with this:
      mother => qr/Mary/
      because it is easier to type and more readable (IMHO).

      Any chance of converting a node whose value is a regexp into an element as opposed to an attribute?
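One workaround that does not depend on any XML::Simple option is to preprocess the data, rewriting every qr// value into a one-element array ref before calling XMLout. A sketch, assuming Perl 5.10 or later for the core re module's regexp_pattern (which, in list context, returns the pattern text and its modifiers); regexps_to_elements is a hypothetical name.

```perl
use strict;
use warnings;
use re qw(regexp_pattern);

# Recursively copy a data structure, turning each qr/.../ value into a
# one-element array ref holding the pattern text. XML::Simple outputs
# array refs as nested elements instead of attributes.
sub regexps_to_elements {
    my ($data) = @_;
    my $type = ref $data;

    if ($type eq 'Regexp') {
        my ($pattern) = regexp_pattern($data);    # list context: (pattern, modifiers)
        return [$pattern];
    }
    if ($type eq 'HASH') {
        my %copy;
        $copy{$_} = regexps_to_elements($data->{$_}) for keys %$data;
        return \%copy;
    }
    if ($type eq 'ARRAY') {
        return [ map { regexps_to_elements($_) } @$data ];
    }
    return $data;    # plain scalars pass through and stay attributes
}

my $converted = regexps_to_elements({
    family => { name => 'Kawasaki', mother => qr/Mary/ },
});
```

You would then call something like XMLout(regexps_to_elements(\%data)). The catch is that the text must be a valid regex, so a value such as qr/Mary (Jr./ would fail to compile.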

      Order of sibling XML elements

      Now, one other question. We are producing XML based on an XSD, and I am wondering whether the XML has ordering requirements on siblings. For example, I produced the sample XML from the original post in this thread with XML::Simple:
      use strict;
      use warnings;
      use XML::Simple;
      use Tie::IxHash;

      # Only %struct itself is tied; the hash under 'family' is an ordinary
      # anonymous hash, so its key order is not preserved.
      tie(my %struct, 'Tie::IxHash',
          family => {
              name     => 'Kawasaki',
              father   => ['Yasuhisa'],
              mother   => ['Chizuko'],
              children => {
                  girl => ['Shiori'],
                  boy  => ['Kairi'],
              }
          }
      );
      warn XMLout(\%struct);
      but while each level of the XML tree was accurate, the sibling elements father, mother, children were in a different order than I wanted them to be.
        The short answer is that XML::Simple is absolutely not the right module for generating XML. I don't generate XML very often and when I do I tend to use a templating tool like TT or Mason.
Re: A data structure for XML generation
by moritz (Cardinal) on Aug 17, 2009 at 06:50 UTC
    Over the weekend I used Carl Mäsak's SVG module (Perl 6), which, despite its name, is a general XML serializer.

    It uses Pair objects for both attributes and child tags, distinguished by the type of the value: if the value is a List, the pair becomes a child tag; if it is a scalar, it becomes an attribute.

    The documentation uses this example:

    my $svg = :svg[            # root tag
        :width(200),           # attribute
        :height(200),
        circle => [            # child tag
            :cx(100), :cy(100), :r(50)   # child tag attributes
        ],
        text => [              # child tag
            :x(10), :y(20), "hello"      # child tag attributes
        ]
    ];

    I used this data structure to generate bar charts in SVG, and found it very handy.

Re: A data structure for XML generation
by Jenda (Abbot) on Aug 18, 2009 at 16:53 UTC

    Mkay,

    use XML::Rules; # at least 1.08
    my $parser = XML::Rules->new(rules => []);
    print $parser->ToXML(
        [ family => { name => 'Kawasaki' }, [
            [father => 'Yasushisa' ],
            [mother => 'Chizuko' ],
            [children => [
                [girl => 'Shiori'],
                [boy => 'Yasuke'],
                [boy => 'Kairi']
            ] ]
        ] ],
        0, ' ', ''
    );

    Jenda
    Enoch was right!
    Enjoy the last years of Rome.

      This doesn't work for me. When I run it I get

      <ARRAY(0x84b3a0)>0

        What version of XML::Rules do you have? Are you sure you have 1.08?

        Jenda

Node Type: perlmeditation [id://787605]
Approved by ww
Front-paged by ww